How to use machine learning for SEO competitor research

With the ever-increasing appetite of SEO professionals to learn Python, there has never been a better or more exciting time to take advantage of machine learning (ML) capabilities and apply these to SEO.

This is especially true in your competitor research.

In this column, you will learn how machine learning helps solve common challenges in SEO competitor research, how to set up and train your ML model, how to automate your analysis and more.

Let’s do it!

Why we need machine learning in SEO competitor research

Most, if not all, SEO professionals working in competitive markets analyze the SERPs and their business competitors to find out what those sites are doing to achieve a higher rank.

Back in 2003, we used spreadsheets to collect data from SERPs with columns representing different aspects of the competition, such as the number of links to the website, number of pages, and so on.

Ultimately, the idea was right, but the execution was hopeless because Excel could not perform a statistically robust analysis in the short time available.


And if the limitations of spreadsheets were not enough, the landscape has moved on quite a bit since then, as we now have:

Mobile SERPs.
Social media.
A much more sophisticated Google search.
Page speed.
Personalized search.
Schema.
JavaScript frameworks and other new web technologies.

The above is by no means an exhaustive list of trends, but serves to illustrate the ever-increasing range of factors that may explain the advantage of your higher-ranked competitors in Google.

Machine learning in the SEO context

Fortunately, with tools like Python / R, we are no longer subject to spreadsheet limits. Python / R can handle millions to billions of data rows.

If anything, the limit is the quality of the data you can feed into your ML model and the intelligence of the questions you ask of your data.

As an SEO professional, you can make the crucial difference to your SEO campaign by cutting through noise and applying machine learning to competitor data to discover:


Which ranking factors can best explain the differences in rankings between sites.
What the winning benchmark is.
How much a unit change in a factor is worth in terms of rank.

Like any (data) scientific effort, there are a number of questions that need to be answered before we can begin coding.

What type of ML problem is competitor analysis?

ML solves a number of problems, whether it is categorizing things (classification) or predicting a continuous number (regression).

In our particular case, since the quality of a competitor’s SEO is denoted by its rank in Google, and this rank is a continuous number, then the ML problem is regression.

Outcome metric

Given that we know that the ML problem is a regression, the outcome metric is rank. This makes sense for a number of reasons:

Rank does not suffer from seasonality; an ice cream brand's rankings for searches on [ice cream] are not depressed because it is winter, unlike the "users" metric.
Competitor ranking is third-party data and is available from commercial SEO tools, whereas competitors' user traffic and conversions are not.

What are the features?

Once we know the outcome, we must determine the independent variables or model inputs, also known as features. The data types of the features will vary, for example:

First paint measured in seconds would be a numeric.
Sentiment with the categories positive, neutral, and negative would be a factor.
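The distinction between numeric and factor features can be made concrete with a small sketch in pandas. The column names and values below are hypothetical, and one-hot encoding is only one common way of turning a factor into model inputs; it is shown here as an assumption, not as the method used for the analysis in this article.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
serps_sample = pd.DataFrame({
    "first_paint_secs": [1.2, 0.8, 2.5],                  # numeric feature
    "sentiment": ["positive", "neutral", "negative"],     # factor feature
})

# Declare sentiment as a categorical (factor) column with a fixed set of levels.
serps_sample["sentiment"] = pd.Categorical(
    serps_sample["sentiment"],
    categories=["negative", "neutral", "positive"],
)

# One-hot encode the factor so that a model expecting numeric inputs can use it.
model_inputs = pd.get_dummies(serps_sample, columns=["sentiment"])
print(model_inputs.dtypes)
```

Some libraries can consume categorical columns directly, so the encoding step may not always be needed.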

Of course, you will want to cover as many meaningful features as possible, including technical, content / UX, and offsite, for the most comprehensive competitor research.

What is the math?

Considering that rankings are numerical and that we want to explain the difference in rank, then in mathematical terms:

rank ~ w_1 * feature_1 + w_2 * feature_2 + ... + w_n * feature_n

~ (known as "tilde") means "explained by"

n is the number of features, so feature_n is the nth feature

w is the weighting of the feature
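To illustrate what the weights in this formula mean, here is a minimal sketch that recovers them by ordinary least squares with NumPy. The data is synthetic and noise-free, the weights 3.0 and -1.5 and the intercept of 10 are made-up values, and the intercept term itself is an assumption added for the example (the formula above does not list one explicitly).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 pages described by 2 made-up features.
features = rng.normal(size=(200, 2))
true_weights = np.array([3.0, -1.5])            # assumed weights, for illustration
rank = features @ true_weights + 10.0           # noise-free rank with intercept 10

# Add a column of ones so that the fit can estimate an intercept.
design = np.column_stack([np.ones(len(features)), features])

# Ordinary least squares: find the weights that best explain rank.
coef, *_ = np.linalg.lstsq(design, rank, rcond=None)
intercept, weights = coef[0], coef[1:]

print("intercept:", intercept)
print("weights:", weights)
```

Each fitted weight reads as the change in rank associated with a one-unit increase in that feature, holding the other features fixed.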

Using machine learning to uncover the competitor’s secrets

With the answers to these questions in hand, we’re ready to see what secrets machine learning can reveal about your competition.

At this point, we assume that your data (known in this example as “serps_data”) has been collected, transformed, cleaned, and is now ready for modeling.


At a minimum, this data contains the Google rank and the feature data you want to test.

For example, your columns could include:

Google_rank.
Page_speed.
Sentiment.
Flesch_kincaid_reading_ease.
Amp_version_available.
Site_depth.
Internal_page_rank.
Referring_domains_count.
avg_domain_authority_backlinks.
title_keyword_string_distance.

Training your ML model

To train your model, we use XGBoost because it tends to deliver better results than other ML models.

Alternatives you may want to try in parallel are LightGBM (especially for much larger datasets), RandomForest, and AdaBoost.

Try using the following XGBoost Python code for your SERP dataset:

# import the libraries
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error

# load the SERP dataset
serps_data = pd.read_csv('serps_data.csv')

# set the model variables
# your SERP data with everything except the Google_rank column
serp_features = serps_data.drop(columns=['Google_rank'])
# your SERP data with only the Google_rank column
rank_actual = serps_data.Google_rank

# instantiate the model
# squared-error regression objective (named "reg:linear" in older XGBoost releases)
serps_model = xgb.XGBRegressor(objective='reg:squarederror', random_state=1231)

# fit the model
serps_model.fit(serp_features, rank_actual)

# generate model predictions
rank_pred = serps_model.predict(serp_features)

# assess model accuracy (mean squared error on the data the model was fitted on)
mse = mean_squared_error(rank_actual, rank_pred)

Note that the above is very basic. In a real client scenario, you would test a number of model algorithms on a training sample (approximately 80% of the data), evaluate them on the remaining 20%, and select the best model.
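The 80/20 workflow described above could be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn's RandomForest and AdaBoost regressors; the data, the feature count, and the choice of mean squared error as the selection metric are all assumptions made for the example. An XGBoost or LightGBM regressor could be added to the candidates dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for SERP data: 300 pages, 4 made-up features.
rng = np.random.default_rng(1231)
X = rng.normal(size=(300, 4))
y = 5 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1231
)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=1231),
    "adaboost": AdaBoostRegressor(random_state=1231),
}

# Fit each candidate on the training sample and score it on the held-out sample.
test_mse = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))

# Select the candidate with the lowest held-out error.
best_model_name = min(test_mse, key=test_mse.get)
print(test_mse, best_model_name)
```

Scoring on held-out rows matters because the error measured on the training data tends to flatter flexible models such as boosted trees.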


So what secrets can this machine learning model tell us?

The most predictive drivers of rank

The chart shows the most influential SERP features or ranking factors in descending order of importance.

In this particular case, the most important factor was "title_keyword_dist", which measures the string distance between the title tag and the target keyword. Think of this as the title tag's relevance to the keyword.
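The article does not say which string distance was used, so the following sketch assumes Levenshtein (edit) distance as one plausible choice. The function names and the lowercasing step are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def title_keyword_distance(title: str, keyword: str) -> int:
    """Case-insensitive edit distance between a title tag and a target keyword."""
    return levenshtein(title.lower(), keyword.lower())


print(title_keyword_distance("Best Ice Cream in Melbourne", "ice cream"))
```

A lower distance means the title is closer to the keyword; a distance of zero means the two match exactly, ignoring case.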


No surprise there for the SEO practitioner, but the value here is in providing empirical evidence to a non-expert business audience that does not understand the need to optimize title tags.

Other factors in this industry are:

no_cookies: The number of cookies.
dom_ready_time_ms: A measure of page speed.
no_template_words: The number of words outside the main body content section.
link_root_domains_links: The number of links to root domains.

Every market or industry is different, so the above is not a general result for SEO as a whole!
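A ranking of factors like the one described in this section can be produced from a fitted tree-based model. The sketch below is an assumption-laden illustration: it uses scikit-learn's RandomForestRegressor rather than the XGBoost model trained earlier, and the data is synthetic, built so that title_keyword_dist dominates by construction.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_pages = 400

# Synthetic features that borrow the factor names from the article.
data = pd.DataFrame({
    "title_keyword_dist": rng.uniform(0, 50, n_pages),
    "dom_ready_time_ms": rng.uniform(200, 3000, n_pages),
    "no_cookies": rng.integers(0, 30, n_pages).astype(float),
})

# Made-up relationship: rank is driven mostly by title_keyword_dist.
rank = (
    1
    + 0.8 * data["title_keyword_dist"]
    + 0.002 * data["dom_ready_time_ms"]
    + rng.normal(scale=1.0, size=n_pages)
)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(data, rank)

# Importances sum to 1; sort them in descending order for charting.
importances = pd.Series(
    model.feature_importances_, index=data.columns
).sort_values(ascending=False)
print(importances)
```

XGBoost regressors expose a feature_importances_ attribute as well, so the same sorting step would apply to the model trained earlier.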

How much rank a ranking factor is worth

In another market, we can also see how much rank each factor delivers.

In the chart above, we have a list of factors and the rank change for each one-unit increase in that factor.


For example, each one-character increase in meta description length corresponds to a decrease in Google rank of 0.1.

Taken out of context, this sounds ridiculous. However, since most meta descriptions are populated, this means that a unit change away from the average meta description length would lead to a decrease in Google rank.

The winning benchmark for a ranking factor

Below is a chart showing the average title tag length for a different industry from the one above, which also includes a line of best fit:

Despite the best practice SEO recommendation of using up to 70 characters for title tag length, the above data shows the actual optimal length in this industry to be 60 characters.


Thanks to machine learning, we are not only able to surface the most important factors, but when we take a deep dive, we can also see the winning benchmark.
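One simple way to locate a benchmark like this is to fit a curve to rank against the factor and read off its best point. The sketch below assumes a quadratic fit with NumPy; the data is synthetic and deliberately constructed with its best rank at 60 characters to mirror the figure quoted above, so it demonstrates the technique rather than reproducing the article's result.

```python
import numpy as np

# Synthetic data: rank is best (numerically lowest) near 60 characters.
title_length = np.arange(30, 91, 5, dtype=float)
rank = 3.0 + 0.01 * (title_length - 60.0) ** 2

# Fit rank = a * length^2 + b * length + c.
a, b, c = np.polyfit(title_length, rank, deg=2)

# For an upward-opening parabola (a > 0), the vertex gives the lowest rank.
optimal_length = -b / (2 * a)
print(round(optimal_length, 1))
```

With real SERP data the relationship is noisier, so the fitted optimum is better treated as a range to test than as an exact target.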

Automate your SEO competitor analysis with machine learning

The above application of machine learning is great for generating ideas to split A/B test and for improving the SEO program with evidence-driven change requests.

It is also important to recognize that this analysis becomes even more powerful when it is ongoing.

That is because a single ML analysis is only a snapshot of the SERPs at one point in time.

Having a continuous stream of data collection and analysis means you get a truer picture of what is really happening with the SERPs in your industry.

This is where purpose-built SEO data warehouse and dashboard systems come in handy, and these products are available today.

What these systems do is:

Ingest your data from your favorite SEO tools daily.
Combine the data.
Use ML to surface insights like those above in a front end of your choice, such as Google Data Studio.


To build your own automated system, you need to implement what is called ETL (extract, transform, and load) in a cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform (GCP).

To explain:

Extract: Calling your SEO tool APIs daily.
Transform: Cleaning and analyzing your data using ML as described above.
Load: Depositing the finished result in your data warehouse.

Your data collection, analysis and visualization are thus automated in one place.
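The three ETL steps above can be sketched in a few functions. Everything here is a stand-in: the extract step returns hard-coded, made-up rows instead of calling a real SEO tool API, an in-memory SQLite database plays the role of the data warehouse, and the table and column names are hypothetical. The ML analysis would slot into the transform step.

```python
import sqlite3
from datetime import date


def extract(run_date):
    """Stand-in for daily SEO tool API calls; returns made-up raw rows."""
    return [
        {"url": "https://example.com/a", "google_rank": "3", "page_speed": "1.8"},
        {"url": "https://example.com/b", "google_rank": "7", "page_speed": None},
    ]


def transform(rows, run_date):
    """Drop incomplete rows and cast the raw strings to numeric types."""
    cleaned = []
    for row in rows:
        if row["google_rank"] is None or row["page_speed"] is None:
            continue
        cleaned.append((
            run_date.isoformat(),
            row["url"],
            int(row["google_rank"]),
            float(row["page_speed"]),
        ))
    return cleaned


def load(connection, records):
    """Append the cleaned records to the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS serps_daily "
        "(run_date TEXT, url TEXT, google_rank INTEGER, page_speed REAL)"
    )
    connection.executemany("INSERT INTO serps_daily VALUES (?, ?, ?, ?)", records)
    connection.commit()


def run_daily_etl(connection, run_date):
    """Run extract, transform, and load once; return the number of rows loaded."""
    records = transform(extract(run_date), run_date)
    load(connection, records)
    return len(records)


connection = sqlite3.connect(":memory:")
rows_loaded = run_daily_etl(connection, date(2021, 6, 1))
print(rows_loaded)
```

In a cloud setup, a scheduler would call run_daily_etl once a day, and the dashboard would read from the warehouse table.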


Competitor research and analysis in SEO is hard because there are so many ranking factors to check.

Spreadsheet tools are not up to the job because of the amounts of data involved, let alone the statistical capabilities that data science languages like Python offer.

When performing SEO competitor analysis using machine learning, it is important to understand that this is a regression problem, that the target variable is Google rank, and that the hypotheses are the ranking factors.

Using ML on your competitors can tell you what the key drivers are, identify the winning benchmarks among them, and show how much of a boost in rank your optimizations can potentially deliver.


The analysis is only a snapshot, so to stay on top of the competition you need to automate this process using Extract, Transform, Load (ETL).

Image credits

All screenshots taken by the author, June 2021
