“Blocked by robots.txt” vs. “Indexed, though blocked by robots.txt”: Differences and How To Fix Them


“Indexed, though blocked by robots.txt” and “Blocked by robots.txt” are Google Search Console statuses. They indicate that the affected pages weren’t crawled because you blocked them in the robots.txt file.

However, the difference between these two statuses is that:

  • With “Blocked by robots.txt,” your URLs won’t appear on Google,
  • In turn, with “Indexed, though blocked by robots.txt,” you can see your affected URLs in the search results even though they’re blocked with a Disallow directive in your robots.txt. In other words, “Indexed, though blocked by robots.txt” means that Google didn’t crawl your URL but indexed it nonetheless.

As fixing these issues lies at the heart of creating a healthy crawling and indexing strategy for your website, let’s analyze when and how you should address them.

What does indexing have to do with robots.txt?

While the relationship between robots.txt and the indexing process may be confusing, let me help you understand the topic in depth. It will make grasping the final solution easier.

How do discovery, crawling, and indexing work?

Before a page gets indexed, search engine crawlers must first discover and crawl it.

At the discovery stage, the crawler learns that a given URL exists. While crawling, Googlebot visits that URL and collects information about its contents. Only then does the URL go to the index, where it can appear among the search results.

Psst. The process isn’t always that smooth, but you can learn how to support it by reading our other articles on crawling and indexing.

What is robots.txt?

Robots.txt is a file you can use to control how Googlebot crawls your website. Whenever you put a Disallow directive in it, Googlebot knows it can’t visit the pages to which that directive applies.
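For instance, a minimal robots.txt (the path here is hypothetical) that keeps Googlebot away from an internal search section could look like this:

```
User-agent: Googlebot
Disallow: /internal-search/
```

The file lives at the root of your domain (e.g., https://example.com/robots.txt), and each Disallow rule applies to every URL whose path starts with the given fragment.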

But robots.txt doesn’t control indexing.

For detailed instructions on modifying and managing the file, see our robots.txt guide.

What causes “Indexed, though blocked by robots.txt” in Google Search Console?

Sometimes Google decides to index a discovered page despite being unable to crawl it and understand its content.

How the standard indexing process looks when you use the Disallow directive in robots.txt, and when your page is indexed though blocked by robots.txt.

In this scenario, Google is usually motivated by many links leading to the page blocked by robots.txt.

Links translate into PageRank. Google calculates it to assess whether a given page is important. The PageRank algorithm takes into account both internal and external links.
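As a toy illustration of the idea (not Google’s actual implementation), here is a simple power-iteration sketch in Python over a hypothetical three-page graph. The disallowed page accumulates the highest score because the other pages link to it:

```python
# A toy illustration of the PageRank idea: pages gather score from
# the pages that link to them. The graph below is hypothetical.
links = {
    "home": ["blog", "disallowed-page"],
    "blog": ["disallowed-page"],
    "disallowed-page": ["home"],
}

damping = 0.85
scores = {page: 1.0 / len(links) for page in links}

# Simple power iteration: redistribute each page's score
# evenly across its outgoing links.
for _ in range(50):
    new_scores = {}
    for page in links:
        incoming = sum(
            scores[src] / len(targets)
            for src, targets in links.items()
            if page in targets
        )
        new_scores[page] = (1 - damping) / len(links) + damping * incoming
    scores = new_scores

# "disallowed-page" ends up with the highest score: every other page
# links to it, which is why Google may deem such a page worth indexing.
print(max(scores, key=scores.get))  # disallowed-page
```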

When there’s a mess in your links and Google sees that a disallowed page has a high PageRank value, it may conclude the page is significant enough to place it in the index.

However, the index will only store a bare URL with no content information, because the content hasn’t been crawled.

Why is “Indexed, though blocked by robots.txt” bad for SEO?

The “Indexed, though blocked by robots.txt” status is a serious problem. It may seem relatively benign, but it can sabotage your SEO in two significant ways.

Poor search appearance

If you blocked a given page by mistake, “Indexed, though blocked by robots.txt” doesn’t mean you got lucky and Google corrected your error.

Pages that get indexed without crawling won’t look attractive when shown in search results. Google won’t be able to display:

  • Title tag (instead, it will automatically generate a title from the URL or information provided by pages that link to your page),
  • Meta description,
  • Any additional information in the form of rich results.

Without these elements, users won’t know what to expect after entering the page and may choose competing websites, drastically lowering your CTR.

Here’s an example: one of Google’s own products:

No meta description in SERPs for the Google Jamboard website.

Google Jamboard is blocked from crawling, but with nearly 20,000 links from other websites (according to Ahrefs), Google still indexed it.

While the page ranks, it’s displayed without any additional information. That’s because Google couldn’t crawl it and collect any information to display. It only shows the URL and a basic title based on what Google found on the other websites that link to Jamboard.

To see if your page has the same problem and is “Indexed, though blocked by robots.txt,” go to your Google Search Console and check the URL in the URL Inspection tool.


Unwanted traffic

If you intentionally used the robots.txt Disallow directive for a given page, you don’t want users to find that page on Google. Let’s say, for example, you’re still working on that page’s content, and it’s not ready for public view.

But if the page gets indexed, users can find it, enter it, and form a negative opinion about your website.

How to fix “Indexed, though blocked by robots.txt?”

First, find the “Indexed, though blocked by robots.txt” status at the bottom of the Page indexing report in your Google Search Console.

There you may also see the “Improve page appearance” table.


After clicking on the status, you will see a list of affected URLs and a chart showing how their number has changed over time.


The list can be filtered by URL or URL path. If you have multiple URLs affected by this problem and only want to look at some parts of your website, use the pyramid symbol on the right side.

The filtering feature in Google Search Console lets you filter all the affected pages by URL.

Before you start troubleshooting, consider whether the URLs on the list really should be indexed. Do they contain content that may be of value to your visitors?

If you want the page indexed

If the page was disallowed in robots.txt by mistake, you need to modify the file.

After you remove the Disallow directive blocking the crawling of your URL, Googlebot will likely crawl it the next time it visits your website.

If you want the page deindexed

If the page contains information you don’t want to show to users visiting you via the search engine, you need to signal to Google that you don’t want the page indexed.

Robots.txt shouldn’t be used to control indexing. This file blocks Googlebot from crawling. Instead, use the noindex tag.

Google always respects noindex when it finds it on a page. By using it, you can ensure Google won’t show your page in the search results.
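As a minimal sketch, noindex can be delivered as a robots meta tag in the page’s HTML, or, for non-HTML resources, as an X-Robots-Tag HTTP response header:

```html
<!-- In the page’s <head>: asks crawlers not to index this page -->
<meta name="robots" content="noindex">

<!-- For non-HTML files (e.g., PDFs), the server can send the
     equivalent HTTP response header instead:
     X-Robots-Tag: noindex -->
```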

You can find detailed instructions on implementing it on your pages in our noindex tag guide.

Remember to let Google crawl your page to discover this HTML tag. It’s a part of the page’s content.

If you add the noindex tag but keep the page blocked in robots.txt, Google won’t discover the tag, and the page will remain “Indexed, though blocked by robots.txt.”

When Google crawls the page and sees the noindex tag, the page will be dropped from the index. Google Search Console will then display a different indexing status when you inspect that URL.

Keep in mind that if you want to keep any page away from Google and its users, the safest option is always to implement HTTP authentication on your server. That way, only users who log in can access the page. It’s necessary if you want to protect sensitive data, for example.
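As a sketch of what that can look like on an Apache server (the file names and paths here are hypothetical), basic HTTP authentication for a directory can be configured in an .htaccess file:

```apache
# Hypothetical .htaccess in the directory you want to protect
AuthType Basic
AuthName "Restricted area"
# The password file is created separately, e.g. with:
#   htpasswd -c /etc/apache2/.htpasswd yourusername
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```

Other servers (e.g., Nginx) have equivalent mechanisms.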

If you need a long-term solution

The above solutions will help you remedy the “Indexed, though blocked by robots.txt” problem for a while. It’s possible, however, that it will appear for other pages in the future.

Such a status indicates that your website may need thorough internal linking improvement or a backlink audit.

What does “Blocked by robots.txt” mean in Google Search Console?

“Blocked by robots.txt” means that Google didn’t crawl your URL because you blocked it with a Disallow directive in robots.txt. It also means that the URL wasn’t indexed.

Remember that it’s normal to prevent Googlebot from crawling some URLs, especially as your website gets bigger. Some of them aren’t relevant to search engines for various reasons.

Deciding which pages should and shouldn’t be crawled on your website is an essential step in creating a sound indexing strategy for your website.

How to fix “Blocked by robots.txt?”

First, head to the ‘Why pages aren’t indexed’ table below the chart in the Page indexing report to navigate your “Blocked by robots.txt” issues.


Addressing this issue requires a different approach based on whether you blocked your page by mistake or on purpose.

Let me guide you on how to act in these two situations:

If you used the Disallow directive by mistake

In this case, if you want to fix “Blocked by robots.txt,” remove the Disallow directive blocking the crawling of a given page.

Thanks to that, Googlebot will likely crawl your URL the next time it crawls your website. Without further issues with that URL, Google will also index it.

If you have many URLs affected by this issue, try filtering them in GSC. Click on the status and navigate to the inverted pyramid symbol above the URL list.

The filtering feature in Google Search Console lets you filter all the affected pages by URL.

You can filter all affected pages by URL (or only a part of a URL path) and by the last crawl date.

If you see “Blocked by robots.txt,” it may also indicate that you intentionally blocked a whole directory but unintentionally included a page you want crawled. To troubleshoot this:

  • Include as many URL path fragments in your Disallow directive as you can to avoid potential mistakes, or
  • Use the Allow directive to let bots crawl a specific URL within a disallowed directory.
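For example, to block a hypothetical /archive/ directory while still letting bots crawl one page inside it, you could combine both directives:

```
User-agent: *
Allow: /archive/important-page/
Disallow: /archive/
```

Google resolves conflicts between Allow and Disallow by using the most specific (longest) matching rule, so the Allow line wins for that one URL regardless of its position in the file.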

When modifying your robots.txt, I suggest validating your directives with the robots.txt Tester in Google Search Console. The tool downloads the robots.txt file from your website and helps you verify whether it correctly blocks access to given URLs.

The robots.txt Tester also lets you check how your directives affect a specific URL on the domain for a given user agent, e.g., Googlebot. Thanks to that, you can experiment with applying different directives and see whether the URL is blocked or allowed.
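If you prefer to script such checks locally, Python’s standard library includes urllib.robotparser. Note that it’s only a rough approximation of Google’s behavior: it resolves Allow/Disallow conflicts by rule order rather than by the longest match, so keep Allow lines first. The robots.txt content below is hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice you could fetch your
# live file with parser.set_url(...) followed by parser.read().
robots_txt = """\
User-agent: Googlebot
Allow: /private/public-page.html
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# The Allow rule is listed first, so this URL is crawlable:
print(parser.can_fetch("Googlebot", "https://example.com/private/public-page.html"))  # True

# Everything else under /private/ stays blocked:
print(parser.can_fetch("Googlebot", "https://example.com/private/secret.html"))  # False
```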

The robots.txt Tester lets you validate whether you’re correctly blocking access to a given URL for a given user agent, e.g., Googlebot.

However, you need to remember that the tool won’t automatically change your robots.txt file. Therefore, when you finish testing the directives, you need to implement all of the changes manually in your file.

Additionally, I recommend using the Robots Exclusion Checker extension in Google Chrome. When browsing any domain, the tool lets you discover pages blocked by robots.txt. It works in real time, so it will help you react quickly to check and work on the blocked URLs on your domain.

Check out my Twitter thread to see how I use this tool.

What if you keep blocking your valuable pages in robots.txt? You may significantly harm your visibility in search results.

If you used the Disallow directive on purpose

You can ignore the “Blocked by robots.txt” status in Google Search Console as long as you aren’t disallowing any valuable URLs in your robots.txt file.

Remember that blocking bots from crawling your low-quality or duplicate content is perfectly normal.

And deciding which pages bots should and shouldn’t crawl is crucial to:

  • Creating a crawling strategy for your website, and
  • Optimizing and saving your crawl budget.


Here’s what you can do now:

  1. Contact us.
  2. Receive a personalized plan from us to deal with your issues.
  3. Unlock your website’s crawling and indexing potential!

Still unsure about dropping us a line? Reach out for our crawl budget optimization services to improve the crawling of your website.

Key takeaways

  1. The Disallow directive in the robots.txt file blocks Google from crawling your page, but not from indexing it.
  2. Having pages that are both indexed and uncrawled is bad for your SEO.
  3. To fix “Indexed, though blocked by robots.txt,” you need to decide whether the affected pages should be visible on Search and then:
    • Modify your robots.txt file,
    • Use the noindex meta tag if necessary.
  4. Having “Blocked by robots.txt” pages on your website is normal if you don’t want bots to crawl those URLs and see their content.

Getting your crawling and indexing right is the foundation of SEO, and a well-organized robots.txt file is only one part of it. Contact us for a thorough technical SEO audit to navigate your issues.

