Is it possible to buy Ahrefs

How Ahrefs counts links and domains

Each backlink tool captures and stores different links.

When creating a web index, companies must make a number of decisions about crawling, parsing, and indexing data. While there is a lot of overlap between the indices, there are also some differences that depend on the decisions made by each company.

For reasons of transparency, we would like to tell you more about the Ahrefs link index.

Links take users from one page to another with a click of the mouse. There are many ways to create links; the most common is the classic <a> HTML element with an href attribute.

However, it is also possible to create links with other elements, e.g.:

  • onclick
  • button
  • ng-click
  • option value
  • and many more …

Which links are indexed?

In an ideal world, anything that acts as a link would be stored. Unfortunately, we don't live in an ideal world. Neither Ahrefs nor Google stores all types of links, because it would simply be inefficient to load every page and click every single element. But that's exactly what you'd have to do if you wanted to find every working link.

Instead, crawlers typically fetch web pages, render them if necessary, and then extract and save various types of links. Since every crawler works differently, we'll show you how we do it here at Ahrefs.
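As a rough illustration of the "fetch, then extract" step, here is a minimal Python sketch that pulls links out of HTML that has already been downloaded. It is not Ahrefs' crawler code. The class and function names (`AnchorLinkExtractor`, `extract_links`) are invented for this example, and it assumes that only <a> elements with an href attribute count as links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorLinkExtractor(HTMLParser):
    """Collects href values from <a> elements only (illustrative helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # onclick handlers, buttons, etc. are ignored in this sketch
        href = dict(attrs).get("href")
        if href:
            # Resolve relative URLs against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs of all <a href> links found in the HTML string."""
    parser = AnchorLinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

A button that navigates through an onclick handler would not be picked up by this sketch, which mirrors the trade-off described above: finding such links would require clicking every element on the page.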

Links that we save

We store the following types of links in our index:

External links

Links from one website to another, created with the typical <a> HTML element with an href attribute.

Internal links

These are links from one page on a website to another page on the same website. There are 22.21 trillion internal backlinks in our index. That is way more than the number of our external live links. We're the only SEO tool where you can access this data without crawling an individual website. We use the internal link data in the URL Rating (UR) calculation, similar to how Google would use it in its PageRank calculation.
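The distinction between internal and external links can be sketched in a few lines of Python. The function name `classify_link` is hypothetical, and the sketch assumes that "same website" simply means "same hostname". A real index may draw that line differently, for example when subdomains are involved.

```python
from urllib.parse import urlsplit


def classify_link(source_url, target_url):
    """Return "internal" if both URLs share a hostname, otherwise "external".

    Assumption for this sketch: a website is identified by its exact hostname.
    """
    source_host = urlsplit(source_url).hostname or ""
    target_host = urlsplit(target_url).hostname or ""
    return "internal" if source_host == target_host else "external"
```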

If you want to see when we first and last crawled a URL, you can check out the “Best by links” report in Site Explorer. There are tabs for both external and internal links.

Links that we may save

Below are all the links that we save under certain circumstances:

Links inserted with JavaScript

Since Google renders all pages, the search engine can count links that are inserted with JavaScript but are not in the HTML code. Large-scale rendering takes far more resources than just downloading the HTML of pages. At Ahrefs, we render around 80 million pages a day. That's why we have some of the links inserted with JavaScript, but not all of them. We're currently the only SEO tool that renders pages during regular web crawling, so we have some link data that other tools don't.

However, we only count links inserted using JavaScript if they are in the form of an <a> HTML element with an href attribute. These links are marked as “JS” in the backlinks report.

Links from pages with URL parameters

Parameters are additions to a URL such as ?tag=something. You may see some of these URLs in our index, but they usually carry parameters that indicate different content. In many cases, pages with parameters show the same content. We have put many systems in place to consolidate URLs into canonical versions and to provide additional protection against infinite crawl paths. Other tools may not make the same decisions or may not have the same protections in place. As a result, they may count the same link multiple times.
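To make the idea of consolidating parameterized URLs more concrete, here is a small Python sketch. It is not how Ahrefs does it. The list of ignored parameters and the function name `normalize_url` are assumptions made for illustration, and a production system would rely on many more signals, such as canonical tags.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters assumed not to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize_url(url):
    """Drop ignored parameters, sort the rest, and remove the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )
```

With this sketch, https://example.com/page?utm_source=x&tag=seo and https://example.com/page?tag=seo collapse into the same URL, so a link found on both would be counted once.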

Links that we try not to save

We try not to save the following links if possible:

Links from pages with URL parameters

As mentioned earlier, there are good and bad types of parameters. We try not to save those that are duplicated.

Links from pages in infinite crawl paths

These paths create an infinite number of possible URLs. Parameters are one way they can arise, but so are filters, dynamic content, and broken relative paths in links. As mentioned earlier, we have taken a lot of precautions for links on these types of pages to reduce the chances of them showing up in our reports. Respecting canonicalization and the way we prioritize page crawling are just two of those precautions. Every index has to deal with these infinite paths, and such sites can drive link counts up.

Links that we do not save

We never save the following links:

Links in PDFs or other documents

Google converts many file formats to HTML and indexes them like any other page. That means that they count the links in these documents. I don't think any SEO tool is currently indexing these links, but we probably should. I think we will one day, but I am equally concerned that it will not be worth the effort and resources required. According to John Mueller, Google Webmaster Trends Analyst, the links in PDFs have no practical effect on web searches.

Links in iframes

Iframes allow another page to be displayed within a page. Because the content belongs to a different page, we don't count the links in iframes. However, they are displayed to users, so other tools may count them even though the content technically belongs to a different page. Google may or may not count these links.

Links from non-indexed pages

We leave out these links. Google has made differing statements about whether it takes them into account. Other tools may make different decisions in this regard.

“something with noindex will never reach the serving index, but we will have the fetched copy for things like link graph calculation.” (Gary 鯨理/경리 Illyes, @methode, December 17, 2020)

Identical links from multiple IPs

A fun fact is that websites can serve the same page from multiple IP addresses. If this is the case, it can happen that a link index counts the same link multiple times. We don't do that. We associate the links with the pages on which they are located.

Multiple links pointing to the same page from a single page

We currently only track one version of a link on a page. If you put a link to a page in the website menu and then again in the body of the website, we will only count one of these links. We may change this in the future in order to make more data available to users, but that's the current status. Google counts all versions of links for the determination of the PageRank, but possibly only considers the anchor text of one version.
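The "one link per source page and target page" rule can be sketched as follows. This is an illustrative Python example with a hypothetical function name (`unique_links`). Which of the duplicate links is kept is an assumption of the sketch (here: the first one encountered), since the text above does not specify it.

```python
def unique_links(found_links):
    """Keep one link per (source page, target page) pair.

    found_links is an iterable of (source_url, target_url, anchor_text).
    Assumption: the first occurrence on the page wins.
    """
    seen = {}
    for source, target, anchor in found_links:
        seen.setdefault((source, target), anchor)
    return [(source, target, anchor) for (source, target), anchor in seen.items()]
```

Because the key is the pair of page URLs rather than anything tied to an IP address, the same page served from several IPs also yields a single link in this sketch.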

Other link-related aspects that affect the index

Understanding how we count links is one thing. But there are also many other factors that influence what is and what is not counted.

Number of links per page

I don't think we have a limit on the number of links that are counted per page. Still, we have a page size limit that may affect the number of links we can see and capture. Google recommends a maximum of a few thousand links per page.

Redirected or canonicalized

At Ahrefs, we trust all redirects and canonical tags, and consolidate the links where the websites tell us to. This is more complicated for Google because it has many canonicalization signals that determine which page is the leading one in a canonical cluster. We take a pragmatic approach here, because it is impossible to know how Google sees a given situation, and it would confuse our users if we treated canonical tags and redirects differently every time.
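One way to picture "consolidating links where the websites tell us to" is to follow redirect and canonical hints until a URL has none left. The following Python sketch works under stated assumptions: the `hints` mapping, the function name `resolve_final_url`, and the hop limit are all invented for illustration and do not describe Ahrefs' internals.

```python
def resolve_final_url(url, hints, max_hops=10):
    """Follow redirect/canonical hints and return (final_url, kinds_followed).

    hints maps a URL to a (kind, destination) tuple, where kind is
    "301", "302" or "Canonical". max_hops is an arbitrary safety limit.
    """
    followed = []
    visited = {url}
    for _ in range(max_hops):
        if url not in hints:
            break
        kind, destination = hints[url]
        if destination in visited:
            break  # guard against redirect loops and self-canonicals
        followed.append(kind)
        visited.add(destination)
        url = destination
    return url, followed
```

A link pointing at the starting URL would then be credited to the final URL, and the list of kinds followed records how it got there.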

These links are marked with “301”, “302” or “Canonical” in our reports.

Which domains are indexed?

In Ahrefs, the “Referring Domains” report displays all domains that link to a website or web page.

Referring Domains report in Ahrefs Site Explorer.

But how exactly do we count the domains?

You'd think this question would be easy to answer: it's just “domain.com”, right? In fact, it is a little more complicated, as there are many ways to count domains. One way is to treat each registered domain as a domain, which seems to be how Google handles it in Google Search Console. Another option is to treat each subdomain as a different domain. You could also aggregate some sections of a website and not others (as Google does), separate sections that run on different tech stacks, and so on. There are a variety of options.

At Ahrefs, we have around 175 million domains after a vetting process. This process includes removing spam domains and counting some subdomains separately where we have found that different users control different sections. We use our own list for this, but there is also a public list at https://publicsuffix.org/list/.

It is important to note that different domain definitions can result in large variations in referring domain counts. Here are some examples of items that other providers, but not Ahrefs, may count as separate domains:

  • Mobile subdomains (m.domain.com, mobile.domain.com, etc.)
  • Country / language subdomains (en.domain.com, fr.domain.com, de.domain.com, jp.domain.com, etc.). There may be exceptions in our index, such as wikipedia.org, but this is not common practice.
  • Any other subdomains (support.domain.com, images.domain.com, etc.)
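The effect of a suffix list on referring domain counts can be sketched in Python. The `SUFFIXES` set below is a tiny, hypothetical stand-in for a real list such as the one at publicsuffix.org. Ahrefs' own list is not public, and `referring_domain` is a name made up for this example.

```python
# Hypothetical, tiny stand-in for a suffix list. Entries such as
# "blogspot.com" mark places where each subdomain belongs to a different user.
SUFFIXES = {"com", "org", "co.uk", "blogspot.com"}


def referring_domain(hostname):
    """Collapse a hostname to the longest matching suffix plus one label."""
    labels = hostname.lower().rstrip(".").split(".")
    # Candidates run from longest to shortest, so the longest suffix wins.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return hostname.lower()  # no known suffix: leave the hostname as-is


def count_referring_domains(hostnames):
    """Count distinct referring domains among the linking hostnames."""
    return len({referring_domain(host) for host in hostnames})
```

Under these assumptions, m.domain.com and support.domain.com both collapse into domain.com, while user1.blogspot.com and user2.blogspot.com remain separate referring domains.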

Another decision that backlink tool vendors must make is whether to count some subfolders or subdirectories as different domains. For example, I think that most link indexes would count different blogs on popular platforms (e.g. user1.blogspot.com, user2.blogspot.com) as different domains because they are controlled by different users. But why not do the same for sites like medium.com/user1 or github.com/user1? We don't do this at Ahrefs right now, but there is a chance we will in the future as we assume different people will be in control of each of the subdirectories on a site.

The point here is that there are many ways of counting domains. This becomes clear when you look at the different numbers reported by companies that count websites on the Internet. According to Verisign, there were 370.7 million registered domains across all TLDs in the third quarter of 2020. According to Netcraft, as of November 2020 there were 1,229,948,224 websites on 263,787,870 unique domains, with 193.8 million active sites. According to Internet Live Stats, there are approximately 1.8 billion websites, of which fewer than 200 million are active. Clearly, every company uses a different methodology for counting domains.

To recap, at Ahrefs we collect all the websites we know of, remove a variety of spam and inactive domains, and then add some subdomains on websites like blogspot.com as separate domains. This brings us to our total of around 175 million domains. Other indexes may handle this differently and come up with a different number.

Why we can't capture all of the links

Since we find backlinks by crawling the web, we can only do so on websites that we are allowed to crawl. If website owners block AhrefsBot in their robots.txt file, we won't be able to crawl their website. For example, if you get a backlink from website.com and website.com blocks AhrefsBot, we won't be able to crawl the page and your backlink won't appear in Ahrefs. IP blocking, user-agent blocking at the server level (as opposed to robots.txt), server timeouts, bot protection and many other things can also affect the ability to crawl some websites. Crawling the web on a large scale is not easy.
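To see how a robots.txt rule affects a crawler, you can test it with Python's standard urllib.robotparser module. The robots.txt content below is a made-up example for website.com, and `can_crawl` is a helper name chosen for this sketch. The check only covers robots.txt rules, not IP blocks or server-side user-agent blocking.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that blocks AhrefsBot but allows every other crawler.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if the robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, can_crawl(ROBOTS_TXT, "AhrefsBot", "https://website.com/page") is False, so a backlink on that page would never be discovered by the blocked crawler.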

We have several link indexes

Every tool has to make decisions about how to store and retrieve data. At Ahrefs, we split our data across multiple indexes.

  • Live - Links that we can see are still active on the web. This best represents the current state of the web and is what many of our users find most useful.
  • Recent - Links we've seen on the web in the past 3-4 months.
  • Historical - All the links we've ever seen. This is the most comprehensive list, but it also includes many links that no longer exist.

You can switch back and forth between the indices in our backlink and referring domain reports.
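The relationship between the three indexes can be sketched as a simple classification. This is only an illustration: the 120-day window is an assumed approximation of "3-4 months", and the function name `index_membership` and its parameters are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "the past 3-4 months" is approximated as 120 days.
RECENT_WINDOW = timedelta(days=120)


def index_membership(last_seen, still_live, today):
    """Return the indexes a link would appear in under this sketch's rules."""
    indexes = ["historical"]  # every link ever seen
    if today - last_seen <= RECENT_WINDOW:
        indexes.append("recent")
    if still_live:
        indexes.append("live")
    return indexes
```

A link that was last seen a year ago and has since been removed would show up only in the historical index here, which is why a tool that reports everything it has ever seen shows larger numbers.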

Other indexes may choose to show all of the data they have ever seen. While this means that they display a large number of links, it also means that many of these links may no longer exist.

Conclusion

With this blog article, we want to give you more information about our index and how it works so that you can make better-informed decisions. We would also like to hear from you if you think we should change something, and why.

If you are currently comparing link indices and have questions about our data or if anything is unclear, you are of course welcome to contact us.