In the latest installment of the Search Off the Record podcast, Google's Search Relations team says most websites don't have to worry about crawl budget.
Google's Gary Illyes discussed the issue at length, walking back the team's previous messaging somewhat by adding that an "essential segment" of the ecosystem does need to care about it.
Still, crawl budget shouldn't be an issue for most websites, Illyes explains:
"We've played down crawl budget in the past and usually told people that they don't have to worry about it. And I stand by that: I still say that most people don't have to worry about it. But we believe there is an essential part of the ecosystem that does need to care about it.
... But I still believe, and I'm trying to reinforce this here, that the vast majority of people don't need to worry."
To clarify its previous messaging, Google has recently been publishing more information about crawl budget.
For example, just last month Google dedicated an entire episode of its SEO Mythbusting YouTube series to the topic of crawl budget.
So who needs to care about crawl budget, and who doesn't?
When to worry about crawl budget, and when not to
SEOs usually want to hear a hard number when it comes to crawl budget – for example, that a website must have X number of pages before crawl budget becomes an issue.
But that's not how it works, says Illyes:
"… well, it's not quite like that. You can do stupid things on your website and then Googlebot will start crawling like crazy.
Or you can do other stupid things and then Googlebot will just stop crawling altogether."
When pressed to provide a number, Illyes says roughly one million URLs is the baseline before a site owner really has to worry about crawl budget.
Websites with fewer than one million URLs don't have to worry about crawl budget.
Factors that affect crawl budget
For websites with over a million URLs, these are some of the factors that can create, or indicate, crawl budget problems.
Factor 1: Pages that have never been crawled
"What would I see? Probably URLs that were never crawled. That's a good indicator of how well a website is discovered and how well it is crawled.
So I would look at URLs that were never crawled. For that you'll probably want to look at your server logs, as those give you the absolute truth."
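As a sketch of that log-based check: the snippet below, assuming Apache/Nginx combined-format access logs and a hypothetical list of sitemap paths, reports which URLs Googlebot has never requested. (A real audit should also verify the Googlebot user agent via reverse DNS, since the UA string can be spoofed.)

```python
import re

# Matches the request path in a combined-format access log line,
# e.g. ... "GET /page-a HTTP/1.1" 200 5316 ...
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')

def googlebot_paths(log_lines):
    """Set of paths Googlebot has requested at least once."""
    paths = set()
    for line in log_lines:
        if "Googlebot" not in line:  # crude UA check; verify via rDNS in practice
            continue
        match = REQUEST.search(line)
        if match:
            paths.add(match.group("path"))
    return paths

def never_crawled(sitemap_paths, log_lines):
    """Sitemap paths that never appear in the logs as Googlebot requests."""
    return sorted(set(sitemap_paths) - googlebot_paths(log_lines))
```

Fed your sitemap URLs and a log file's lines, `never_crawled` returns the pages Googlebot has not discovered, which is exactly the signal Illyes describes.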
Factor 2: Changes that aren't picked up for long periods
"Then I would also look at refresh rates. For example, if you notice that certain parts of the website haven't been refreshed for a long time, say months, even though you've made changes to the pages in those sections, then you probably want to think about your crawl budget."
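In the same spirit, a rough way to spot slow refresh rates is to compute each URL's most recent Googlebot hit from the access logs and flag anything older than a cutoff. This is only a sketch under the same assumptions (combined log format, hypothetical data):

```python
import re
from datetime import datetime, timedelta

TIMESTAMP = re.compile(r'\[(?P<ts>[^\]]+)\]')            # e.g. [01/Oct/2020:00:00:00 +0000]
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP')

def last_googlebot_hit(log_lines):
    """Map each path to the most recent time Googlebot requested it."""
    latest = {}
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        ts, req = TIMESTAMP.search(line), REQUEST.search(line)
        if ts and req:
            when = datetime.strptime(ts.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            path = req.group("path")
            if path not in latest or when > latest[path]:
                latest[path] = when
    return latest

def stale_paths(log_lines, now, max_age_days=90):
    """Paths whose last Googlebot crawl is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(p for p, t in last_googlebot_hit(log_lines).items() if t < cutoff)
```

If pages you recently edited keep showing up in `stale_paths`, that's the mismatch between your change rate and Google's refresh rate that Illyes is pointing at.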
Troubleshoot crawl budget issues
Illyes has two suggestions for resolving crawl budget issues.
First, try removing unnecessary pages. Every page Googlebot has to crawl uses up crawl budget that could go to other pages.
An excess of low-value "gibberish" content can therefore keep important content from being crawled.
"For example, if you remove content that is generally less useful to users, then Googlebot has time to focus on higher quality pages that are actually good for users."
The second suggestion from Illyes is to avoid sending "back-off" signals to Googlebot.
Back-off signals are specific HTTP status codes that tell Googlebot to slow down or temporarily stop crawling a website.
"If you send us back-off signals, it will affect Googlebot's crawling. If your servers can handle it, you'll want to make sure you're not sending us 429 or 50x status codes, and that your server responds quickly."
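To see whether you're sending such signals, you could again lean on the access logs. This hypothetical sketch, assuming the same combined log format, counts the 429 and 5xx responses served to Googlebot:

```python
import re
from collections import Counter

# The status code sits just after the quoted request in a combined log line,
# e.g. ... "GET /page-a HTTP/1.1" 503 0 ...
STATUS = re.compile(r'" (?P<status>\d{3}) ')

def backoff_responses(log_lines):
    """Count 429 and 5xx status codes served to Googlebot, keyed by code."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = STATUS.search(line)
        if not match:
            continue
        code = int(match.group("status"))
        if code == 429 or 500 <= code <= 599:
            counts[code] += 1
    return counts
```

A non-empty result means Googlebot is being told to back off, and crawling of the rest of the site will suffer accordingly.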
To learn more about the intricacies of crawl budget, listen to the full podcast episode below.