Are exchanged or reciprocal links okay with Google?
Etmagnis dis parturient montes, nascetur ridiculus mus. Donec lorem ipsum dolor sit amet, et consectetuer adipiscing elit. Aenean commodo ligula eget consyect etur dolor.

Contact Info

(+888)-123-4587

121 King St, Melbourne VIC 3000, Australia

info@example.com

Folow us on social

How to use Google Sheets for web scraping and campaign building

How to use Google Sheets for web scraping and campaign building

We have all been in a situation where at some point we had to extract data from a website.

When working on a new account or campaign, you have e.g. Data or information may not be available for creating the ads.

In an ideal world, we would have got all the content, landing pages and relevant information we need in an easy to import format, e.g. A CSV, Excel spreadsheet or Google Sheet. (Or at least, provided we need as tabbed data that can be imported into one of the aforementioned formats.)

But this is not always the case.

Those who lack the tools for web scraping – or the coding knowledge to use something like Python to help with the task – may have had to resort to the tedious job of manually copying and pasting possibly hundreds or thousands of records.

In a recent job, my team was asked to:

Advertising

Continue reading below

Go to the customer’s website. Download more than 150 new products on 15 different pages. Copy and paste the product name and landing page URL for each product into a spreadsheet.

Now you can imagine how long the task would have been if we had done just that and manually performed the task.

Not only is it time consuming, but with someone manually reviewing so many items and pages and physically having to copy and paste the data product by product, the chances of making a mistake or two are quite high.

It would then require even more time to review the document and ensure that it was error-free.

There must be a better way.

Good news: There is! Let me show you how we did it.

What is IMPORTXML?

Enter Google Sheets. I would like you to meet the IMPORTXML function.

According to Google’s support page, IMPORTXML “imports data from various structured data types, including XML, HTML, CSV, TSV and RSS and ATOM XML feeds.”

Advertising

Continue reading below

In essence, IMPORTXML is a feature that allows you to scrape structured data from web pages – no coding knowledge is required.

For example, it is quick and easy to extract data such as page titles, descriptions or links, but also more complex information.

How can IMPORTXML help scrape elements of a web page?

The function itself is quite simple and requires only two values:

We intend to extract or scrape the URL of the web page from the information. And XPath for the item where the data is contained.

XPath stands for XML Path Language and can be used to navigate through elements and attributes in an XML document.

For example, to extract the page title from https://en.wikipedia.org/wiki/Moon_landing we would use:

= IMPORTXML (“https://en.wikipedia.org/wiki/Moon_landing”, “// title”)

This will return the value: Moon landing – Wikipedia.

Or if we are looking for the page description, you can try this:

IMPORTXML (“https://www.Marketing-Ideas.org/”, “// meta[@name=’description’]/@contents”)

Here is a shortlist of some of the most common and useful XPath queries:

Page title: // titlePage meta description: // meta[@name=’description’]/ @ contentPage H1: // h1Side links: // @ href

See IMPORTXML in action

Since the discovery of IMPORTXML in Google Sheets, it has truly become one of our secret weapons in the automation of many of our daily tasks, from campaign and ads to content research and more.

In addition, the feature combined with other formulas and add-ons can be used for more advanced tasks that would otherwise require sophisticated solutions and development, such as tools built into Python.

But in this case, we will look at IMPORTXML in its most basic form: to scrape data from a web page.

Let’s look at a practical example.

Imagine that we have been asked to create a Marketing-Ideas campaign.

They would like us to announce the last 30 articles that have been published under the PPC section of the site.

Advertising

Continue reading below

A fairly simple task, one might say.

Unfortunately, the editorial staff is not able to send us the data and has kindly asked us to refer to the website to get the information needed to create the campaign.

As mentioned at the beginning of our article, one way to do this would be to open two browser windows – one with the site and the other with Google Sheets or Excel. We would then start copying and pasting the information, article by article and link after link.

However, using IMPORTXML in Google Sheets, we can achieve the same output with little or no risk of making a mistake in a fraction of the time.

Here’s how you do it.

Step 1: Start with a new Google Sheet

First, we open a new, blank Google Sheets document:

Step 2: Add the content you need to scrape

Add the URL of the page (or pages) we want to scrape the information from.

Advertising

Continue reading below

In our case, we start with https://www.Marketing-Ideas.org/category/pay-per-click/:

Step 3: Find XPath

We find XPath for the item we want to import the contents of into our datasheet.

In our example, let’s start with the titles of the last 30 articles.

Go to Chrome. When hovering over the title of one of the articles, right-click and select Inspector.

This opens the Chrome Dev Tools window:

Make sure the article title is still selected and highlighted, then right-click again and choose Copy> Copy XPath.

Advertising

Continue reading below

Step 4: Extract the data in Google Sheets

Back in your Google Sheets document, introduce the IMPORTXML feature as follows:

= IMPORTXML (B1, ”// *[starts-with(@id, ‘title’)]”)

A few things to note:

First, in our formula, we have replaced the URL of the page with the reference to the cell where the URL is stored (B1).

Second, when you copy XPath from Chrome, this will always be enclosed in double quotes.

(// *[@id=”title_1″])

To ensure that it does not break the formula, the double quotation mark must be changed to the individual quotation mark.

(// *[@id=’title_1’])

Note that in this case, because the page ID title changes for each article (title_1, title_2, etc.), we need to change the query slightly and use “starter-med” to capture all elements on the page with an ID that contains ‘title.’

This is what the Google Sheets document looks like:

And in a few moments, the results look like this after the query has loaded the data into the spreadsheet:

As you can see, the list returns all the articles found on the page that we just scraped (including my previous piece on automation and how I use ad customization to improve Google Ads campaign performance).

Advertising

Continue reading below

You can also use this to scrape all the other information needed to create your ad campaign.

Let’s add the landing page URLs, the selected snippet of each article, and the author’s name to our Sheets document.

For landing page URLs, we need to adjust the query to indicate that we are after the HREF element associated with the article title.

Therefore, our query will look like this:

= IMPORTXML (B1, ”// *[starts-with(@id, ‘title’)]/ @ href ”)

Now add ‘/ @ href’ to the end of Xpath.

Voila! Immediately we have the URLs of the landing pages:

You can do the same for the selected excerpts and author names:

Troubleshooting

One thing to keep in mind is that in order to expand and populate the spreadsheet with all the data returned by the query, the column where the data is populated must have enough cells free and no other data in the way.

Advertising

Continue reading below

This works in the same way as when we use an ARRAY FORMULA, in order for the formula to be extended, there must be no other data in the same column.

Conclusion

And there you have a fully automated, error-free way to scrape data from (potentially) any web page, whether you need content and product descriptions or e-commerce data such as product price or shipping costs.

At a time when information and data can be the benefit required to deliver better-than-average results, the ability to scrape web pages and structured content easily and quickly can be invaluable. In addition, as we have seen above, IMPORTXML can help reduce execution times and reduce the chances of errors.

In addition, the feature is not just a great tool that can be used exclusively for PPC tasks, but can instead be really useful across many different projects that require web scraping, including SEO and content tasks.

More resources:

Advertising

Continue reading below

Image credits

All screenshots taken by the author, August 2021

    Leave Your Comment

    Your email address will not be published.*