In a Google SEO Office Hours hangout, Google’s John Mueller was asked why Google was not crawling enough web pages. The person asking explained that Google’s crawl rate was too slow to keep up with a very large site. Mueller gave two main reasons why Google might not be crawling enough pages.
What is the Google Crawl budget?
GoogleBot is the name of Google’s crawler, which goes from web page to web page and fetches them so they can be indexed and ranked.
But because the Internet is large, Google’s strategy is to index only higher-quality web pages and not low-quality ones.
According to Google’s developer documentation for large sites (those with millions of web pages):
“The amount of time and resources that Google devotes to crawling a site is commonly called the site’s crawl budget.
Note that not everything crawled on your site will necessarily be indexed; each page must be evaluated, consolidated, and assessed to determine whether it will be indexed after it has been crawled.
Crawl budget is determined by two main elements: crawl capacity limit and crawl demand.”
What determines the GoogleBot crawl budget?
The person asking the question had a website with hundreds of thousands of pages, but Google was crawling only about 2,000 of them a day, a rate too slow for such a large site.
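To see why that rate is a problem, a little arithmetic helps. The sketch below is only illustrative: the 2,000 pages a day comes from the question, but the 300,000-page site size is an assumed figure standing in for "hundreds of thousands," and the function name is hypothetical.

```python
import math


def days_for_full_crawl(total_pages: int, pages_per_day: int) -> int:
    """Days needed to fetch every page once at a constant crawl rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(total_pages / pages_per_day)


# 300_000 is an assumed site size; 2_000 is the daily rate from the question.
print(days_for_full_crawl(300_000, 2_000))  # 150
```

At that pace a single pass over the assumed site would take about five months, before counting any pages that need to be recrawled.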
The person who asked the question followed up with this:
“Do you have any other advice for getting insight into the current crawl budget?
Just because I feel like we’ve really been trying to make improvements, but haven’t seen a jump in pages crawled per day.”
Google’s Mueller asked the person how big the site is.
The person who asked the question replied:
“Our site is in the hundreds of thousands of pages.
And we’ve seen about 2,000 pages a day crawled, even though there is a backlog of up to 60,000 discovered but not yet indexed or crawled pages.”
Google’s John Mueller replied:
“So in practice, I see two main reasons why this is happening.
On the one hand, if the server is significantly slow, which is … the response time, I think you see that in the crawl stats report as well.
That’s one area where, if … if I were to give you a number, I would say aim for something below 300, 400 milliseconds, something like that on average.
Because it allows us to crawl pretty much as much as we need.
It is not the same as the page speed.
So that’s … one thing to watch out for.”
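One rough way to check a server against the figure Mueller mentions is to time a handful of requests and average them. The following is a minimal sketch, not what Google’s crawl stats report measures: it times requests from a single client, includes network latency, and every function name in it is hypothetical. The 400 ms threshold is simply the upper end of the range in the quote.

```python
import time
import urllib.request
from statistics import mean

TARGET_MS = 400.0  # upper end of the "300, 400 milliseconds" range in the quote


def time_to_response_ms(url: str, timeout: float = 10.0) -> float:
    """Milliseconds until the server returns response headers for one request.

    The figure includes DNS, connection, and network time from this machine,
    so it only approximates what a crawler would see.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0


def average_ms(samples: list[float]) -> float:
    """Average of the collected response times, in milliseconds."""
    if not samples:
        raise ValueError("need at least one sample")
    return mean(samples)


def is_within_target(samples: list[float], target_ms: float = TARGET_MS) -> bool:
    """True when the average response time is at or below the target."""
    return average_ms(samples) <= target_ms


# Example use (makes real network requests, so it is left commented out):
# samples = [time_to_response_ms("https://example.com/") for _ in range(5)]
# print(round(average_ms(samples)), is_within_target(samples))
```

Because this measures how quickly the server starts answering rather than how long a page takes to render, it is closer to the response time Mueller describes than a page speed score would be.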
Website quality can affect GoogleBot crawl budget
Google’s John Mueller then mentioned the issue of site quality.
Poor overall quality can lead GoogleBot to crawl fewer of a site’s pages.
Google’s John Mueller explained:
“The other big reason why we do not crawl a lot from websites is because we are not convinced about the quality in general.
So that’s something where, especially with newer sites, I see us sometimes struggle with that.
And I also see sometimes people say well, it’s technically possible to create a site with a million pages because we have a database and we just put it online.
And just by doing so, essentially from one day to the next, we will find many of these pages, but we will be like, we are not sure of the quality of these pages yet.
And we will be a little more cautious about crawling and indexing them until we are sure the quality is actually good.”
Factors that affect how many pages Google crawls
Other factors that Mueller did not mention can also affect how many pages Google crawls.
For example, a site hosted on a shared server may not be able to deliver pages to Google fast enough, because other sites on that server may be using too many resources and slowing it down for the thousands of other sites it hosts.
Another reason may be that the server is being hammered by junk bots, which slows the website down.
John Mueller’s advice to note how quickly the server is serving web pages is good. Be sure to check it after hours, at night, because many crawlers, including Google’s, crawl in the early morning hours. That is generally a less disruptive time to crawl, since sites have fewer visitors then.
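If the server logs response times, grouping them by hour makes it easier to compare the overnight hours with the rest of the day. The sketch below assumes a simplified, hypothetical log format of `ISO-timestamp path milliseconds` per line; real access logs differ and would need their own parsing. The sample lines are made-up values used only to show the output shape.

```python
from collections import defaultdict
from datetime import datetime


def hourly_average_ms(log_lines):
    """Average response time in milliseconds for each hour of the day (0-23).

    Expects lines shaped like "2024-05-01T03:15:02 /page 182" (an assumed
    format); lines that do not match are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of ms, request count]
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        timestamp, _path, millis = parts
        try:
            hour = datetime.fromisoformat(timestamp).hour
            value = float(millis)
        except ValueError:
            continue
        totals[hour][0] += value
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


# Made-up sample lines in the assumed format.
sample = [
    "2024-05-01T03:15:02 /a 180",
    "2024-05-01T03:40:10 /b 220",
    "2024-05-01T14:05:00 /a 650",
]
print(hourly_average_ms(sample))  # {3: 200.0, 14: 650.0}
```

A spike in the early morning averages would be worth investigating, since that is when much of the crawling described above takes place.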
Read Google’s developer documentation on crawl budget for large sites:
Large Site Owner’s Guide to Managing Your Crawl Budget
Watch Google’s John Mueller answer the question about GoogleBot not crawling enough web pages.
The answer starts at about the 25:46 mark: