
How to automate search intent grouping of keywords


There is a lot to know about search intent, from using deep learning to infer it, to classifying text, to breaking down SERP titles with Natural Language Processing (NLP) techniques and clustering them by semantic relevance.

Not only do we know the benefits of deciphering search intent – we also have a range of techniques available for scaling and automation.

But often these involve building your own AI. What if you do not have the time or knowledge for it?

In this column you will learn a step-by-step process for automating keyword clusters by search intent using Python.



SERPs provide search insights

Some methods require you to retrieve all the copy from the titles of the ranking content for a given keyword and then feed it to a neural network model (which you then need to create and test), or you may use NLP to gather keywords.

There is another method that allows you to use Google’s very own AI to do the work for you without having to scrape all the SERP content and build an AI model.

Let’s assume that Google ranks URLs according to the likelihood that the content will satisfy the user query in descending order. It follows that if the intent of two keywords is the same, the SERPs are likely to be the same.
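The core idea can be sketched in a few lines of plain Python (the URLs here are made up for illustration): two keywords whose page-one URLs largely overlap are candidates for sharing the same intent.

```python
# Toy page-one SERPs for two hypothetical keywords
serp_a = ["site1.com/isa", "site2.com/rates", "site3.com/guide",
          "site4.com/best", "site5.com/faq"]
serp_b = ["site1.com/isa", "site2.com/rates", "site3.com/guide",
          "site9.com/news", "site5.com/faq"]

def serp_overlap(a, b):
    """Share of distinct URLs the two SERPs have in common (Jaccard)."""
    return len(set(a) & set(b)) / len(set(a) | set(b))

print(serp_overlap(serp_a, serp_b))  # 4 shared URLs out of 6 distinct
```

Note this simple overlap ignores ranking order; the similarity function later in this column also weights URL positions.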



For years, many SEO professionals have compared keyword SERP results to infer shared search intent and stay on top of core updates, so this is nothing new.

The added value here is the automation and scaling of this comparison, which offers both speed and greater precision.

How to group keywords by search intent at scale using Python (with code)

Start with your SERP results in a CSV download.

1. Import the list into your Python notebook.

import pandas as pd
import numpy as np

serps_input = pd.read_csv('data/sej_serps_input.csv')
del serps_input['Unnamed: 0']
serps_input

Below, the SERPs file is now imported into a Pandas data frame.

2. Filter data to page 1

We want to compare Page 1 results for each SERP between keywords.

Because we want to filter at the keyword level, we split the data frame into mini keyword data frames, run the filtering function, and then recombine them into a single data frame:

# Split
serps_grpby_keyword = serps_input.groupby("keyword")
k_urls = 15

# Apply
def filter_k_urls(group_df):
    filtered_df = group_df.loc[group_df['url'].notnull()]
    filtered_df = filtered_df.loc[filtered_df['rank'] <= k_urls]
    return filtered_df

filtered_serps = serps_grpby_keyword.apply(filter_k_urls)

# Combine
filtered_serps_df = pd.concat([filtered_serps], axis=0)
del filtered_serps_df['keyword']
filtered_serps_df = filtered_serps_df.reset_index()
del filtered_serps_df['level_1']
filtered_serps_df

3. Convert ranking URLs to a string

Since there are more SERP result URLs than keywords, we need to compress those URLs into a single string to represent each keyword's SERP.

Here’s how you do it:

# convert results to strings using Split Apply Combine
filtserps_grpby_keyword = filtered_serps_df.groupby("keyword")

def string_serps(df):
    df['serp_string'] = ' '.join(df['url'])
    return df

# Combine
strung_serps = filtserps_grpby_keyword.apply(string_serps)

# Concatenate with the initial data frame and clean
strung_serps = pd.concat([strung_serps], axis=0)
strung_serps = strung_serps[['keyword', 'serp_string']]
strung_serps = strung_serps.drop_duplicates()
strung_serps

Below, the SERP is compressed into a single string for each keyword.

4. Compare SERP similarity

To perform the comparison, we now need every combination of keyword SERP pairs:



# align serps
def serps_align(k, df):
    prime_df = df.loc[df.keyword == k]
    prime_df = prime_df.rename(columns={"serp_string": "serp_string_a", 'keyword': 'keyword_a'})
    comp_df = df.loc[df.keyword != k].reset_index(drop=True)
    prime_df = prime_df.loc[prime_df.index.repeat(len(comp_df.index))].reset_index(drop=True)
    prime_df = pd.concat([prime_df, comp_df], axis=1)
    prime_df = prime_df.rename(columns={"serp_string": "serp_string_b", 'keyword': 'keyword_b',
                                        "serp_string_a": "serp_string", 'keyword_a': 'keyword'})
    return prime_df

columns = ['keyword', 'serp_string', 'keyword_b', 'serp_string_b']
matched_serps = pd.DataFrame(columns=columns)
matched_serps = matched_serps.fillna(0)
queries = strung_serps.keyword.to_list()

for q in queries:
    temp_df = serps_align(q, strung_serps)
    matched_serps = matched_serps.append(temp_df)

The above shows all the keyword SERP pair combinations, making it ready for SERP string comparison.

There is no open-source library that compares list objects by order, so the function has been written for you below.



The 'serps_similarity' function compares the overlap of sites, and the order of those sites, between SERPs.

import py_stringmatching as sm
ws_tok = sm.WhitespaceTokenizer()

# Only compare the top k_urls results
def serps_similarity(serps_str1, serps_str2, k=15):
    denom = k + 1
    norm = sum([2 * (1/i - 1.0/denom) for i in range(1, denom)])

    ws_tok = sm.WhitespaceTokenizer()

    serps_1 = ws_tok.tokenize(serps_str1)[:k]
    serps_2 = ws_tok.tokenize(serps_str2)[:k]

    match = lambda a, b: [b.index(x) + 1 if x in b else None for x in a]

    pos_intersections = [(i + 1, j) for i, j in enumerate(match(serps_1, serps_2)) if j is not None]
    pos_in1_not_in2 = [i + 1 for i, j in enumerate(match(serps_1, serps_2)) if j is None]
    pos_in2_not_in1 = [i + 1 for i, j in enumerate(match(serps_2, serps_1)) if j is None]

    a_sum = sum([abs(1/i - 1/j) for i, j in pos_intersections])
    b_sum = sum([abs(1/i - 1/denom) for i in pos_in1_not_in2])
    c_sum = sum([abs(1/i - 1/denom) for i in pos_in2_not_in1])

    intent_prime = a_sum + b_sum + c_sum
    intent_dist = 1 - (intent_prime / norm)
    return intent_dist

# Apply the function
matched_serps['si_simi'] = matched_serps.apply(
    lambda x: serps_similarity(x.serp_string, x.serp_string_b), axis=1)
matched_serps[["keyword", "keyword_b", "si_simi"]]

Now that the comparisons are done, we can start grouping keywords.
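As a sanity check on the metric, here is a minimal re-implementation of the same position-weighted similarity, using str.split() in place of py_stringmatching's WhitespaceTokenizer and made-up URLs: identical SERP strings score exactly 1.0, and SERPs with fewer shared URLs score lower.

```python
def serps_similarity_lite(serps_str1, serps_str2, k=15):
    """Position-weighted SERP similarity; 1.0 means identical SERPs."""
    denom = k + 1
    norm = sum(2 * (1/i - 1.0/denom) for i in range(1, denom))
    serps_1 = serps_str1.split()[:k]
    serps_2 = serps_str2.split()[:k]
    match = lambda a, b: [b.index(x) + 1 if x in b else None for x in a]
    pos_intersections = [(i + 1, j) for i, j in enumerate(match(serps_1, serps_2)) if j is not None]
    pos_in1_not_in2 = [i + 1 for i, j in enumerate(match(serps_1, serps_2)) if j is None]
    pos_in2_not_in1 = [i + 1 for i, j in enumerate(match(serps_2, serps_1)) if j is None]
    a_sum = sum(abs(1/i - 1/j) for i, j in pos_intersections)
    b_sum = sum(abs(1/i - 1/denom) for i in pos_in1_not_in2)
    c_sum = sum(abs(1/i - 1/denom) for i in pos_in2_not_in1)
    return 1 - (a_sum + b_sum + c_sum) / norm

same = "a.com b.com c.com"
print(serps_similarity_lite(same, same))  # identical SERPs -> 1.0
```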



We process keywords that have a weighted similarity of 40% or more.

# group keywords by search intent
simi_lim = 0.4

# join search volume
keysv_df = serps_input[['keyword', 'search_volume']].drop_duplicates()
keysv_df.head()

# append topic volumes
keywords_crossed_vols = matched_serps.merge(keysv_df, on='keyword', how='left')
keywords_crossed_vols = keywords_crossed_vols.rename(
    columns={'keyword': 'topic', 'keyword_b': 'keyword', 'search_volume': 'topic_volume'})

# sort by topic volume
keywords_crossed_vols.sort_values('topic_volume', ascending=False)

# strip NaNs
keywords_filtered_nonnan = keywords_crossed_vols.dropna()
keywords_filtered_nonnan

We now have the potential topic name, keyword SERP similarity and search volumes for each.

You'll notice that keyword and keyword_b have been renamed to topic and keyword, respectively.



Now we need to iterate over the rows of the data frame using the lambda technique.

The lambda technique is an efficient way of iterating over rows in a Pandas data frame because it converts the rows to a list, as opposed to the .iterrows() function.
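A small illustration of the pattern, using a toy data frame and made-up column values: df.apply with a lambda over axis=1 replaces the explicit .iterrows() loop, which is typically slower on large frames.

```python
import pandas as pd

# Toy data frame standing in for the real similarity results
df = pd.DataFrame({'si_simi': [0.9, 0.2], 'keyword': ['isa rates', 'car loan']})

# lambda/apply: one pass over the rows
labels_apply = df.apply(lambda row: 'grouped' if row.si_simi >= 0.4 else 'solo', axis=1)

# equivalent .iterrows() loop, usually slower at scale
labels_iter = [('grouped' if row.si_simi >= 0.4 else 'solo') for _, row in df.iterrows()]

print(list(labels_apply))  # ['grouped', 'solo']
```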

Here goes:

queries_in_df = list(set(keywords_filtered_nonnan.topic.to_list()))
topic_groups_numbered = {}
topics_added = []

def find_topics(si, keyw, topc):
    # highest group number assigned so far (0 if no groups yet)
    i = max(topic_groups_numbered.keys()) if topic_groups_numbered else 0
    if (si >= simi_lim) and (not keyw in topics_added) and (not topc in topics_added):
        i += 1
        topics_added.append(keyw)
        topics_added.append(topc)
        topic_groups_numbered[i] = [keyw, topc]
    elif si >= simi_lim and (keyw in topics_added) and (not topc in topics_added):
        j = [key for key, value in topic_groups_numbered.items() if keyw in value]
        topics_added.append(topc)
        topic_groups_numbered[j[0]].append(topc)
    elif si >= simi_lim and (not keyw in topics_added) and (topc in topics_added):
        j = [key for key, value in topic_groups_numbered.items() if topc in value]
        topics_added.append(keyw)
        topic_groups_numbered[j[0]].append(keyw)

def apply_impl_ft(df):
    return df.apply(lambda row: find_topics(row.si_simi, row.keyword, row.topic), axis=1)

apply_impl_ft(keywords_filtered_nonnan)

topic_groups_numbered = {k: list(set(v)) for k, v in topic_groups_numbered.items()}
topic_groups_numbered

Below is a dictionary that contains all the keywords grouped by search intent into numbered groups:

{1: ['fixed rate isa',
     'isa rates',
     'isa interest rates',
     'best isa rates',
     'cash isa',
     'cash isa rates'],
 2: ['child savings account', 'kids savings account'],
 3: ['savings account',
     'savings account interest rate',
     'savings rates',
     'fixed rate savings',
     'easy access savings',
     'fixed rate bonds',
     'online savings account',
     'easy access savings account',
     'savings accounts uk'],
 4: ['isa account', 'isa', 'isa savings']}

Let’s keep it in a data frame:

topic_groups_lst = []

for k, l in topic_groups_numbered.items():
    for v in l:
        topic_groups_lst.append([k, v])

topic_groups_dictdf = pd.DataFrame(topic_groups_lst, columns=['topic_group_no', 'keyword'])
topic_groups_dictdf

The search intent groups above are a good approximation of what an SEO expert would likely arrive at manually.



Although we only used a small set of keywords, the method can of course be scaled to thousands (if not more).

Putting the output to work to make your search marketing better

Of course, the above could be taken further by using neural networks that process the ranking content to produce more accurate clusters and cluster group names, as some commercial products already do.

For now, with this output you can:

- Integrate this into your own SEO dashboard systems to make your trend and SEO reporting more meaningful.
- Build better paid search campaigns by structuring your Google Ads accounts by search intent to get a higher Quality Score.
- Identify redundant faceted ecommerce search URLs.
- Structure a shopping site's taxonomy according to search intent instead of a typical product catalog.



I'm sure there are other applications I haven't covered; feel free to comment on any important ones.

In any case, your SEO keyword research just got a little more scalable, accurate, and faster!


Image credits

Featured Image: Astibuag /
All screenshots taken by the author, July 2021

