Google doesn't index my blog posts

Why is my page not on Google?

Have you invested a lot of work in your content, optimized your site, developed a good logical structure and earned plenty of relevant links? Your site should actually rank well on Google & Co., but unfortunately you cannot find any rankings, either via your own searches or via relevant SEO tools such as Google Search Console, Searchmetrics or Sistrix. In the following article I have put together the reasons why your pages cannot be found on Google, along with tools to check each of them.

Scenarios why your page does not appear on Google

There are actually two possibilities why your page cannot be found on Google:

  • New URL: The URL is new and unknown to Google
  • Old URL: The URL is old and known to Google, but it is not (or no longer) in the index

So what is actually going wrong? Below you will find some scenarios that explain why your page was not processed by Google and therefore cannot be found in the Google index. I am mainly referring to the scenario in which a single document, i.e. one URL, cannot be found on Google.

  • Missing link: The document (URL) is not linked, or not sufficiently linked (internally or externally)
  • Crawling: Your URL is new and has not yet been crawled by Google
  • Indexing: Your URL has been crawled by Google, but you have suppressed indexing
  • Robots.txt: You have not allowed the Google crawler to access your URL
  • Duplicate content: You have duplicate content and Google chose only one of the URLs
  • Canonicalization: The content exists, but you have set an incorrect canonical tag (rel="canonical")

How do you analyze whether Google knows your URL?

You can use simple on-board tools to analyze whether Google knows your site. To do this, you enter your URL into Google search combined with the "site:" operator, i.e. "site:yournewurl". Here's an example of what that might look like.
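
Suppose your new URL were https://www.example.com/blog/my-new-post/ (a placeholder, of course). You would then search Google for:

    site:www.example.com/blog/my-new-post/

If Google returns no result for this query, the URL is not in the index.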

#1 Google hasn't crawled and indexed your page yet

Google, too, needs time. So if you have only recently put your site live, it can happen that Google simply has not yet found and indexed it. Typically, the Google crawler follows known URLs and discovers new URLs along the way in this crawl process. The last known figure (from 2016) is that Google knows of more than 130 trillion individual pages.

And even though Google has one of the largest computing capacities in the world, it prioritizes its crawling and indexing. What options do you have to make Google aware of a new URL on your site?

  1. You link your new URL internally from important pages
  2. You try to get your new URL linked from an external (trustworthy) site
  3. You submit your new URL via an XML sitemap (see the example after this list)
  4. You use the URL inspection tool in the Google Search Console
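
To illustrate option 3: here is a minimal sketch of an XML sitemap, with a placeholder URL and date (example.com simply stands in for your own domain). You upload the file to your server and submit its address in the Search Console under "Sitemaps":

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/your-new-url/</loc>
        <lastmod>2024-01-15</lastmod>
      </url>
    </urlset>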

Here's how to use the URL inspection tool: As described, there are a number of ways to make Google aware of a new URL. One of them is the URL inspection tool in the new Google Search Console. You go to the Search Console and enter the URL you would like Google to crawl and index into the search bar at the top. In a first step, Google analyzes the URL and gives you feedback on it. In the screenshot below you can see what that might look like.

If you have a completely new URL, Google will give you the following feedback.

URL is not on Google: This page is not in the index, but not because of an error. Coverage: URL is unknown to Google. In the next step, click on "Request indexing" on the right-hand side. This process takes about 1-2 minutes, and you will then receive a message that your URL has been added to a priority crawl queue.

Your canonical tag points to a different URL

The canonical tag (rel="canonical") can be used to ensure that only one URL (the source) is used for indexing when several URLs carry the same or similar content. This avoids the disadvantages of duplicate content. If, for example, the canonical points to a different URL after a relaunch, the original URL cannot be indexed correctly. Here, too, the path leads you to the Google Search Console: you use the URL inspection tool and check the URL that you would like to find in the Google index. For this example I have taken a URL that we deliberately do not want to have in Google's index. You can see that here, too, Google says "URL is not on Google", but the reason is that Google assumes a different URL than the one you checked (the canonical URL declared by the user).
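
For reference, a canonical tag sits in the head section of a page and looks something like this (the URL here is just a placeholder):

    <link rel="canonical" href="https://www.example.com/preferred-url/" />

If the href points to a different page than the URL it sits on, Google treats that other page as the original and usually leaves the current URL out of the index.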


If you pointed Google to the other URL on purpose via rel="canonical", then everything is fine. But perhaps you entered the rel="canonical" in your content management system by accident. Then it can happen that Google crawls your page but does not index it. What do you do now? You go to your content management system and check in the settings for your URL whether you have set a rel="canonical" to another page. Most of the time, these settings are made via WordPress plugins. If you did not want that, remove the rel="canonical" and resubmit the URL to Google for crawling.

You locked out Google via noindex

The meta tag "robots" with the value "noindex" prevents Google from indexing your page. If you want to change that, remove the value "noindex". Most of the time, these settings are made via your content management system. Note: the noindex directive sits in the head section of your page and should look something like this:
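
    <meta name="robots" content="noindex" />

Depending on your CMS or SEO plugin, the value may also appear in combination, e.g. "noindex, follow".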


Browseo is a simple tool for identifying an unintended "noindex". In the screenshot below, we show what this could look like. In addition to a lot of data relevant to search engine optimization, such as title, description, etc., you will also find there whether your page is set to "index" or "noindex".

You have generally locked out crawlers via robots.txt

If there is a file named robots.txt in the root directory of your domain that contains the following lines, you have blocked Google's crawlers, and all other crawlers, from viewing your site:
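
    User-agent: *
    Disallow: /

The asterisk addresses every crawler, and "Disallow: /" blocks the entire site starting at the root directory.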

If you remove these two lines, your page can be crawled again and thus also indexed. You can read your robots.txt file quite easily: since it has to be freely accessible, you simply append /robots.txt to your domain or start page and check which disallow rules may be stored there. Here you can see what that could look like: https://www.121watt.de/robots.txt - and here you can also take a look at our robots.txt. Typically, for example, you exclude pages such as:

  • Pages with duplicate content
  • Pagination Pages
  • Dynamic product and service pages
  • Admin pages
  • Shopping cart pages
  • Chat pages
  • Thank you pages

You can find out more here about robots.txt, about which pages you should exclude, and about which ones should remain findable.


You have locked out only Google's crawler via robots.txt

With robots.txt it is also possible to lock out specific crawlers. In this case, the entry looks, for example, like this:
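
    User-agent: Googlebot
    Disallow: /

This entry blocks only Google's main crawler; all other bots may continue to crawl the site.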

If you remove this entry, Google can happily crawl and index your site again.

Your pagination is not set up correctly for SEO

If a subpage of your project cannot be reached, this may be because the pagination links are set to nofollow.
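
Such a pagination link might look like this (the path is a made-up example):

    <a href="/blog/page/2/" rel="nofollow">Next page</a>

Because of the nofollow attribute, Google may not follow this link and can therefore miss the paginated subpages.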

Individual links are set to nofollow

It is also possible that individual links cannot be followed because of nofollow. You can check this by searching the head of your page for the following sequence:
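
    <meta name="robots" content="nofollow" />

This page-wide tag tells Google not to follow any of the links on the page. A single link can also carry the attribute directly in its anchor tag, as in <a href="..." rel="nofollow">.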

You have too much duplicate content

If you provide content that can be found in exactly this or a very similar form on another page, Google will index the page, but probably not display it where you would like it to be. The reason: this content is not unique. This happens particularly often with product descriptions in online shops or with faceted navigation.

Tip: You can easily find out whether you are providing duplicate content by taking a sentence from the text in question and searching for it in quotation marks on Google, or by checking your URL on Copyscape for duplicate content.
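
For example, if your product description contains the sentence "This espresso machine grinds fresh beans for every cup" (a made-up sentence), you would search Google for:

    "This espresso machine grinds fresh beans for every cup"

If several domains show up with exactly this sentence, you are dealing with duplicate content.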

The URL was removed using the Google Search Console

Perhaps you intentionally or unintentionally removed a URL from the Google index using the Google Search Console, or someone else requested that your page be removed. To find out, log into the Google Search Console and click on Index -> Removals. If no URL appears there, this is in any case not the reason why your page cannot be found. Here's an example of how a removed URL appears in the Google Search Console.

You have become a victim of malware or hijacking

This is usually pointed out to you in the Search Console (formerly Webmaster Tools). So always take a look at the notifications at the top right, or check the security notifications in the navigation on the left. Alternatively, I recommend the anti-malware SaaS Sucuri: this service constantly monitors your website, informs you of problems and helps you solve them!

Is your site still not found?

Of course, it is possible that none of these errors is the cause of your indexing problem. However, the causes mentioned above cover the most common sources of error. In any case, you shouldn't panic if your project suddenly no longer appears in the SERPs. Have a cup of coffee and go through this article step by step. I am sure you will find the solution - if not, visit me in one of my Technical SEO seminars or ask me on Twitter & Co. ;-)

Carry out regular SEO audits!

Do mistakes happen in search engine optimization? Of course they do. That's why I have made it a habit to carry out an SEO audit on my site in regular routines:

  • Check the Coverage report in the Google Search Console - daily
  • Important pages, such as product pages - once a week in Screaming Frog
  • All pages - once a month in Screaming Frog
  • Development of impressions, clicks, CTR and position - once a week in the GSC

Here you can see a screenshot of my weekly SEO audit in Screaming Frog. I look especially at the HTTP status codes, the indexing status and the indexability. In the event of problems with redirects, incorrectly set noindex directives, exclusion via robots.txt, rel="canonical" etc., I then proceed as shown in this article.
