Google’s John Mueller answered a question in a Google Office Hours hangout about a Search Console report in which URLs were listed as excluded, yet when each URL was inspected individually, the page was shown as indexed.
Google’s John Mueller said he has seen reports of this irregularity and that he had an idea of what it could be.
Why was the page crawled but not indexed?
One person asked a question about a problem where Google reports that pages are not indexed, but when examined, another report says they are indexed.
This issue makes it difficult for the person to track accurate crawling and indexing statistics for the site.
The person who asked the question explained the problem:
“We’re seeing a very large number of crawled, not indexed pages listed under Excluded.
But then when we click on them, most of these appear to be converted to indexed pages.
So we really are not able to track exactly how improvements on our site affect which pages are indexed.
And I was curious, I guess on the timeline of it.
We are concerned that this will affect our crawl budget.”
Effect on crawl budget
The person asking the question was concerned that the crawled but not indexed status was causing a problem with their crawl budget.
The crawl budget is the number of URLs that Google allots to crawling a site.
The crawl budget is calculated in part by the server’s ability to serve pages. This is called the crawl capacity limit.
If a server has difficulty serving pages, Google may limit how much it crawls so as not to affect the server’s ability to serve pages.
However, if a server responds quickly and easily to Googlebot’s many page requests, Google may decide to increase the crawl budget and crawl more pages.
The crawl budget is also affected by how often a site is updated.
A site that is rarely updated can be crawled less frequently than a site that is constantly updated.
What was going on, as the person later revealed, was that the site has hundreds of thousands of pages.
But Google indexed only about 2,000 per day, which means that quite a few pages were not crawled at all.
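To put that rate in perspective, here is a rough back-of-the-envelope sketch. The total of 300,000 pages is an assumption chosen purely for illustration (the person only said "hundreds of thousands"), and the calculation assumes a steady rate with no page being processed twice, which real crawling does not guarantee.

```python
import math


def days_to_cover(total_pages: int, pages_per_day: int) -> int:
    """Return the number of whole days needed to process every page once
    at a constant daily rate (a simplification for illustration)."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(total_pages / pages_per_day)


# Hypothetical site size; the article only says "hundreds of thousands".
ASSUMED_TOTAL_PAGES = 300_000
# Daily rate mentioned in the article.
PAGES_PER_DAY = 2_000

print(days_to_cover(ASSUMED_TOTAL_PAGES, PAGES_PER_DAY))  # prints 150
```

Under those assumptions, a single pass over the site would take about five months, which helps explain why the site owner was worried.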
The underlying concern, which had not yet been raised, was really about why the other pages were not being indexed and whether the crawled but not indexed issue had anything to do with the crawling problem.
But this question had not yet been asked.
So at this point John Mueller answered only the question that was asked of him, which was about the crawled but not indexed issue and whether it had an impact on the site’s crawl budget.
John Mueller addressed the crawl budget issue:
“I doubt it would affect your crawling budget … as a side note.”
Google crawled – currently not indexed
Mueller then explained why Search Console might show a page as crawled but not indexed when it was actually indexed.
“This is something where I’ve recently seen some threads like this also on Twitter, where people saw URLs that were marked as not indexed in the Search Console.
And when you check them individually, they are actually indexed.
I do not know exactly what is happening there yet.
My suspicion is that it’s more a matter of timing because we show them in the Search Console report and then they get indexed over time.
… So at some point they would fall out of the report again.
And for some reason it takes a little longer for them to fall out than it should.
That’s a bit of my guess there.”
Confirm index coverage issues
Mueller then suggested a way to check whether what was reported in Google Search Console was a real index coverage issue or just delayed reporting.
John Mueller suggested:
“One way to confirm this is to see if these pages actually show up for normal searches.
So take some words from the page, search for it.
And if they show up, I think there’s nothing you really need to do.
It’s just a report that is a little behind.”
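The manual check Mueller describes can be made a little less tedious with a small helper that turns a snippet of page text into a search URL to open in a browser. This is only a sketch: the function name is hypothetical, the optional `site:` restriction is an addition not mentioned by Mueller, and the code only builds a URL rather than querying Google.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_phrase_query_url(phrase: str, site: Optional[str] = None) -> str:
    """Build a Google search URL for an exact-phrase query.

    phrase: a few distinctive words copied from the page being checked.
    site:   optional domain to narrow results (an assumption, not part
            of Mueller's suggestion).
    """
    # Wrap the phrase in double quotes so it is searched as an exact phrase.
    query = f'"{phrase.strip()}"'
    if site:
        query += f" site:{site}"
    return "https://www.google.com/search?q=" + quote_plus(query)


# Example with a placeholder phrase and domain.
print(build_phrase_query_url("hello world", site="example.com"))
```

If the page shows up when the resulting URL is opened in a browser, that suggests the page is indexed and the report is simply behind.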
Delay in index coverage reporting
There appears to be a delay in the indexing report. Hopefully Google will look into the delay in the near future, because showing inaccurate information makes for a poor user experience.
Read Google’s developer page on Googlebot crawl budget:
Large Site Owner’s Guide to Managing Your Crawl Budget
Watch John Mueller answer the question about the Google Search Console indexing report that lags behind.
Watch it at the 22:43 minute mark: