SEO: Understanding Crawl Budget and Indexing
An article published in January on Google's official blog put the spotlight on the notion of "crawl budget". While the post does not reveal anything revolutionary in itself, it has the merit of presenting a fundamental area of optimization in any SEO strategy.
How does the crawl work?
Crawling is essential to the proper indexing and ranking of your site.
Search engines rely on robots (also called bots, spiders, or crawlers) that browse the web from site to site and from link to link, recording the contents of web pages in the engine's index.
This content-discovery phase is called the crawl. On the SEO side, optimizing how content gets crawled and indexed is a fundamental task.
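To make the mechanics concrete, here is a minimal sketch of how such a robot works: it starts from a seed URL, downloads each page, extracts the links it finds, and queues them for later visits while keeping track of click depth. This is an illustrative toy built only with Python's standard library; a real crawler such as Googlebot is far more sophisticated (politeness rules, robots.txt handling, rendering, scheduling).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, max_pages=20):
    """Breadth-first crawl from a seed URL, staying on the same host."""
    host = urlparse(seed_url).netloc
    queue = deque([(seed_url, 0)])   # (url, click depth from the seed)
    seen = {seed_url}
    index = {}                       # url -> (depth, page size in bytes)

    while queue and len(index) < max_pages:
        url, depth = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue
        index[url] = (depth, len(html))

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))

    return index


if __name__ == "__main__":
    # "https://example.com" is a placeholder seed; use your own site instead.
    for url, (depth, size) in crawl("https://example.com").items():
        print(f"depth {depth} | {size:>8} bytes | {url}")
```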
What do robots do on your site?
A robot's job is to crawl every website it can reach. Googlebot browses your site to analyze all the data it finds. Note that the homepage is usually the page crawled most often, so it facilitates the discovery of the pages linked directly from it. Googlebot can also follow external links pointing to your site to discover its content, so the homepage is not the only entry point for robots.
Robots do not limit themselves to listing the visible elements of your pages, such as text or images; they also take an interest in information invisible to the naked eye, such as meta tags, in order to prioritize content and understand the architecture of your website by analyzing the links between your pages.
Robots scan each page and store a copy of it on Google's servers. To keep these copies up to date, robots return to your site again and again to take your content changes into account, within the limits of your crawl budget.
Why the crawl budget matters
The crawl budget is the amount of time Google allocates to exploring your site.
Even though Google relies on an army of robots that are not bound by a 35-hour work week, those robots still have only a limited amount of time to spend crawling each site.
One of the major goals of an organic SEO effort is therefore to increase the crawl budget. The higher your crawl budget, the faster your new content is taken into account and the more likely you are to climb back into the top results (even though there is no direct causal link between crawl budget and ranking). A crawl budget grows with compliance with technical criteria and the regular addition of fresh content, the two pillars of any SEO strategy.
The idea is therefore to have a technical and editorial structure that facilitates the crawl work of robots:
- Optimize crawl speed, starting with page load times (see the sketch after this list). Load time and the number of pages analyzed are closely related: when load time increases, the number of crawled pages decreases, and vice versa.
- Increase the frequency with which you update your pages. The more you create and refresh content, the more often robots come back to your site.
- Improve the structure and architecture of your site. A simple tree structure and optimized internal linking make navigation easier for robots, so they move faster.
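As a quick illustration of the first point, here is a small sketch that measures the server response time of a list of URLs with Python's standard library. The URLs are placeholders, and response time is only a rough proxy for the full load time a crawler experiences.

```python
import time
from urllib.request import urlopen

# Hypothetical pages to check; replace with URLs from your own site.
PAGES = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/products/",
]

def response_time(url, timeout=10):
    """Return the time in seconds needed to download a page, or None on error."""
    start = time.perf_counter()
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()
    except Exception:
        return None
    return time.perf_counter() - start

if __name__ == "__main__":
    for url in PAGES:
        elapsed = response_time(url)
        status = f"{elapsed:.2f} s" if elapsed is not None else "error"
        # Slow pages eat into the time robots can spend on the rest of the site.
        print(f"{status:>8}  {url}")
```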
In short, the easier you make life for crawl robots, the better they will reward you by returning to your site more often, which in turn helps your rankings!
How to analyze the crawl?
Analyzing the crawl in Google Search Console
Google Search Console gives you a quick overview of how Google perceives your site. You can visit the Google Index or Crawl sections to learn about crawl errors or 404 errors on your site.
On the other hand, it is difficult to conduct a complete analysis of your site's crawl performance from Search Console alone. Google provides a global report but no page-level breakdown of these statistics, so you cannot verify that your most strategic content is being crawled properly. That is why it is essential to rely on other sources of information.
Why log analysis is needed for better performance
The log file records every request received by your web server. Every access to your site is listed there: as soon as a visitor (or a robot) hits one of your pages, the details of that visit are written to this file.
Robots often spend too much time on non-strategic content or on elements that are problematic for the site publisher. For example, sites that rely on faceted navigation (deep pages, duplicate or very similar content, a very large number of pages …) work against crawl budget optimization, because robots may browse thousands of duplicate pages or pages with little SEO value.
By analyzing the logs, you can clearly identify the corrective actions needed to take better control of how your pages are crawled.
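As a starting point for this kind of analysis, here is a minimal sketch that parses an Apache/Nginx access log in the common combined format and counts Googlebot hits per URL. The log path and the way Googlebot is detected (a simple user-agent match, without reverse-DNS verification) are simplifying assumptions.

```python
import re
from collections import Counter

# Hypothetical path to your server's access log in combined format.
LOG_FILE = "access.log"

# Minimal pattern for the combined log format:
# IP - - [date] "METHOD /path HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_path):
    """Count how many times each URL was requested by a Googlebot user agent."""
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LINE_RE.match(line)
            if match and "Googlebot" in match.group("agent"):
                hits[match.group("path")] += 1
    return hits

if __name__ == "__main__":
    for path, count in googlebot_hits(LOG_FILE).most_common(20):
        # The URLs crawled most often are where your crawl budget actually goes.
        print(f"{count:>6}  {path}")
```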
How to improve the indexing of your web pages?
- Work on the architecture of your site (silos and thematic clusters) and on the depth level of key content. For reference, level N0 is the homepage. The closer strategic content is to N0 the better, since reducing the number of clicks makes browsing easier for users.
- Improve internal and external linking, focusing on your site's key content, in order to multiply the entry points to each page. Refine your linking mechanisms, including your site's pagination, to reduce content depth.
- Prioritize the crawling of high-value content and eliminate low-quality pages as much as possible (duplicate content, listing pages …).
- Think about your editorial strategy and keep your site regularly updated with varied content to feed and develop interest in it.
- Minimize the loading time of your pages to speed up the work of robots.
- Finally, it is possible to keep robots from following certain links. Bear in mind that robots.txt blocks crawling (via the Disallow directive) but not indexing, so to stop robots from following a specific link you need to obfuscate it or mark it with the rel="nofollow" attribute (see the sketch below).
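To illustrate the robots.txt point, here is a small sketch using Python's standard urllib.robotparser module to check which URLs a given user agent is allowed to crawl. The rules and URLs are placeholders, and remember that a disallowed URL can still end up indexed if other sites link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; in practice you would point set_url()
# at https://your-site.example/robots.txt and call read() instead.
RULES = """
User-agent: *
Disallow: /search
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Placeholder URLs to test against the rules above.
for url in (
    "https://example.com/blog/crawl-budget",
    "https://example.com/search?q=shoes",
    "https://example.com/cart",
):
    allowed = parser.can_fetch("Googlebot", url)
    # Disallow blocks crawling of the URL, but not necessarily its indexing.
    print(f"{'crawlable' if allowed else 'blocked':>9}  {url}")
```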
Crawling and indexing are therefore two closely related concepts at the heart of SEO. You can analyze your site yourself with dedicated tools, and to interpret the data collected on your site, you can also rely on log analysis tools such as SEO Log File Analyzer.