
Ahrefs’ Crawler and Index

Crawl Budget


Fresh index, good coverage, not get blocked. Choose two.
- Dmytro Gerasymenko, Founder & CEO of Ahrefs

When building a big index, there’s always a trade-off between freshness and good coverage.

Freshness means recrawling pages regularly to keep information up to date. Good coverage means crawling as many pages as possible. You can't run both at full capacity, though: the combined load would get the crawler blocked by webmasters and hosting companies.

The answer is to implement a crawl budget: the number of URLs a crawler can and wants to crawl.

Crawl budget is composed of two parts: crawl rate and crawl demand.
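
One way to read "can and wants" is as a cap: the crawler fetches no more URLs than the site can comfortably serve, and no more than the scheduler asks for. The sketch below works under that assumption. The function name, its inputs, and the min() rule are illustrative and are not Ahrefs' actual formula.

```python
def crawl_budget(rate_limit_urls: int, demand_urls: int) -> int:
    """URLs to fetch from one site in a crawl window.

    rate_limit_urls: how many URLs the site can serve without strain ("can").
    demand_urls: how many URLs the scheduler considers worth fetching ("wants").
    Both inputs and the min() rule are assumptions made for illustration.
    """
    if rate_limit_urls < 0 or demand_urls < 0:
        raise ValueError("inputs must be non-negative")
    return min(rate_limit_urls, demand_urls)


# A slow site with many important pages is limited by crawl rate...
print(crawl_budget(rate_limit_urls=500, demand_urls=20_000))
# ...while a fast site with few important pages is limited by crawl demand.
print(crawl_budget(rate_limit_urls=50_000, demand_urls=300))
```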


Crawl rate

Crawl rate refers to the number of requests a crawler can make to a site when crawling it.

Crawling a website too fast can add too much load to a server. Since this can lead to poor user experience or result in our crawler getting blocked, our crawl rate takes into account:

  • Page speed - Faster-loading pages are preferred to slower-loading ones.
  • Website size - Small websites with high-quality links will most likely be crawled in full, whereas larger websites with low-quality links might be only partially crawled.
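
The idea of crawling fast sites more readily and backing off from slow ones can be sketched as a per-host delay that scales with the observed response time. The class name, the multiplier, and the delay bounds below are hypothetical choices for illustration. The source does not describe how Ahrefs computes its crawl rate.

```python
from dataclasses import dataclass


@dataclass
class HostThrottle:
    """Tracks how long to wait between requests to one host (illustrative)."""

    min_delay: float = 1.0     # seconds; assumed lower bound
    max_delay: float = 60.0    # seconds; assumed upper bound
    multiplier: float = 10.0   # assumed: wait 10x the smoothed response time
    smoothing: float = 0.3     # weight given to the newest observation
    avg_response: float = 0.0  # smoothed response time in seconds

    def record(self, response_seconds: float) -> None:
        """Update the smoothed response time with a new observation."""
        if self.avg_response == 0.0:
            self.avg_response = response_seconds
        else:
            self.avg_response = (
                self.smoothing * response_seconds
                + (1 - self.smoothing) * self.avg_response
            )

    def delay(self) -> float:
        """Seconds to wait before the next request to this host."""
        wanted = self.avg_response * self.multiplier
        return max(self.min_delay, min(self.max_delay, wanted))

    def requests_per_minute(self) -> float:
        """Request rate implied by the current delay."""
        return 60.0 / self.delay()


fast, slow = HostThrottle(), HostThrottle()
fast.record(0.05)  # a host answering in 50 ms
slow.record(2.0)   # a host answering in 2 s
print(fast.delay(), slow.delay())
```

Clamping the delay between a floor and a ceiling keeps the crawler polite even toward very fast servers and prevents a single slow response from stalling a host indefinitely.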

Crawl demand

Crawl demand, or crawl priority, represents the level of importance attached to crawling and recrawling pages on a website.

Our scheduler determines crawl demand from:

  • URL popularity (URL Rating) - The higher the quality of the backlinks pointing to a page, the higher the priority.
  • Website popularity (Domain Rating) - The higher the strength of a website’s backlink profile, the higher the priority.
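
The two signals above can be combined into a single score that orders a crawl queue. The following is a minimal sketch, assuming both ratings sit on a 0-100 scale and are blended linearly. The weight, the function and class names, and the example URLs are hypothetical, and the real scheduler may work quite differently.

```python
import heapq
from typing import List, Tuple


def crawl_priority(url_rating: float, domain_rating: float,
                   ur_weight: float = 0.7) -> float:
    """Blend UR and DR (both assumed to be on a 0-100 scale) into one score.

    The linear blend and the 0.7 weight are illustrative assumptions.
    """
    return ur_weight * url_rating + (1 - ur_weight) * domain_rating


class CrawlQueue:
    """Max-priority queue of URLs; the highest score is popped first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker keeps insertion order for equal scores

    def push(self, url: str, url_rating: float, domain_rating: float) -> None:
        score = crawl_priority(url_rating, domain_rating)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = CrawlQueue()
queue.push("https://example.com/popular-guide", url_rating=60, domain_rating=80)
queue.push("https://example.com/old-tag-page", url_rating=5, domain_rating=80)
queue.push("https://example.org/new-post", url_rating=20, domain_rating=30)

order = [queue.pop() for _ in range(len(queue))]
print(order)
```

In this sketch a weak page on a strong domain still outranks a moderately linked page on a weak domain, which shows how the choice of weight shapes what gets recrawled first.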
