Google sends an email to the website owner whenever there are page indexing issues with sites submitted in a Google Search Console account. Robots.txt-related problems are among the most common page indexing issues, and “Indexed, though blocked by robots.txt” is one you will receive when there is a problem with your robots.txt entries. Do not worry if you have received an email from Google about this issue; here is how to check and fix it on your site.
Indexed Though Blocked by Robots.txt
The notification from Google for this issue looks like the one below. As you can see, it is categorized under “Top non-critical issues”, which means the affected page is still available in Google Search. Google says the notification is only meant to help you improve your site. However, this is a confusing point, because blocking a page with robots.txt is supposed to keep it out of search results.
About Robots.txt Blocking
Before explaining how to fix the issue, it is important to understand what a robots.txt file is. It is a simple text file at the root of your server that instructs search engine bots how to crawl your site. For example, say you have a page like https://mysite.com/personal-page.html that you do not want Google to show in search results. In this case, your robots.txt file should look like below (note that the Disallow directive takes a URL path, not the full URL).
User-agent: *
Disallow: /personal-page.html
The issue occurs because the listed pages are linked either from other sites (external links) or from other pages on your own site (internal links). Through these links, Google can still discover and index the pages even though they are blocked in your robots.txt file.
In addition to using a robots.txt file at the server level, you can also block individual pages using a meta robots tag in the head section, like below.
<head>
<meta name="robots" content="noindex">
</head>
However, in this case you will see the “Excluded by ‘noindex’ tag” issue in Google Search Console instead.
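If you are unsure which of your pages carry this tag, a quick offline check for the noindex meta tag can be sketched with Python’s standard library. The sample HTML below is an example head section, not your real page:

```python
# Detect a <meta name="robots" content="noindex"> tag using the stdlib HTML parser.
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Sets self.noindex to True if a robots meta tag with noindex is found."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "robots" and "noindex" in a.get("content", ""):
            self.noindex = True

# Example page head; replace with the HTML of the page you want to check.
sample = '<head><meta name="robots" content="noindex"></head>'
finder = NoindexFinder()
finder.feed(sample)
print(finder.noindex)  # True
```

Feeding the parser the full HTML of a page works the same way, since it simply scans every meta tag it encounters.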
Checking Page URLs Blocked with Robots.txt
Click the “Fix Page Indexing Issues” button in the notification email to view the exact details of the affected pages. Alternatively, go to the “Pages” section under the “Indexing” heading in your Google Search Console account and filter for the issue to see the affected pages.
Now that you know the affected page URL(s), it is time to check your robots.txt file and fix the issue if the pages are blocked by mistake. Remember, you do not need to take any action if you blocked the page intentionally.
1. Use Google robots.txt Tester Tool to Check Your Robots.txt File
Google offers a hidden robots.txt tester tool in the Google Search Console account (like the disavow links tool). Make sure you are logged into your Search Console account and go to the robots.txt tester tool.
Select your domain property from the dropdown and the tool will show your robots.txt file’s content in the box. If the page URL with the indexing issue is listed in your robots.txt file, remove it to fix the issue.
Remember, the tool does not necessarily show the latest version; you can see the date on which Google last indexed the robots.txt file. If you do not see the page URL in your robots.txt file, the content shown may simply be outdated.
2. Check Individual Page URL for Blocking
It can be difficult to read the robots.txt file’s content when the rules are generic (for example, wildcard patterns). In that case, enter the blocked page URL in the box at the bottom of the tool and click the “Test” button. If a rule matches, the button changes to “Blocked” and the matching rule is highlighted in the file’s content.
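If you prefer to test rules outside the browser, Python’s standard urllib.robotparser module can replicate this check offline. The rules and URLs below are examples; substitute the content of your own robots.txt file and the affected page URL:

```python
# Offline check mirroring the tester: does this robots.txt block a given URL?
from urllib.robotparser import RobotFileParser

# Example rules; paste the lines of your own robots.txt here.
rules = [
    "User-agent: *",
    "Disallow: /personal-page.html",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch returns False when the URL is blocked for the given user agent.
print(parser.can_fetch("Googlebot", "https://mysite.com/personal-page.html"))  # False
print(parser.can_fetch("Googlebot", "https://mysite.com/other-page.html"))     # True
```

This is handy when the file contains many wildcard rules and you want to check a whole list of affected URLs in one go.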
3. Check Live Robots.txt File
If the file date in the tool is outdated, click the “See live robots.txt” link. This will open your live robots.txt file in a new browser tab. Alternatively, open the URL yoursite.com/robots.txt in your browser to see the latest robots.txt file’s content.
4. Check with URL Inspection Tool
While checking the page indexing issues, hover over a URL and click the search lens icon that says “Inspect URL”. Google will then check the page’s content and show whether or not it is in Google Search.
Sometimes the result may show a different error than the original one, such as “Excluded by ‘noindex’ tag”. What you should check in the result is whether the affected page is indexed in Google Search or not.
Fixing and Resubmitting the URLs
If the page does not show up in the live robots.txt file, make sure to clear your site’s cache. For example, if you use the Cloudflare CDN, go to your account and purge the cache for your robots.txt file’s URL. This deletes the cached robots.txt file, and you can then test the live file again to see whether the URLs are there.
If the robots.txt file contains the blocked page URLs, follow the steps below:
- Go to your hosting account and open the File Manager app.
- Navigate to your site’s root directory and find the robots.txt file.
- Edit the file and remove the rules blocking the page URLs.
- Save the file and upload it back to the server.
- Clear the server or CDN cache, then check the robots.txt file in your browser to make sure the rules are removed.
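The edit in the steps above can be sketched as a small Python helper. The sample content and the blocked path are placeholders, not your real file; in practice you would read and write your actual robots.txt:

```python
# Sketch: drop a mistakenly added Disallow rule from robots.txt content.
sample = """User-agent: *
Disallow: /personal-page.html
Disallow: /private/
"""

def remove_disallow(content: str, path: str) -> str:
    """Return robots.txt content without the Disallow rule for `path`."""
    kept = [
        line for line in content.splitlines()
        if not (line.strip().lower().startswith("disallow:")
                and line.split(":", 1)[1].strip() == path)
    ]
    return "\n".join(kept) + "\n"

# Only the rule for the affected page is removed; other rules stay intact.
print(remove_disallow(sample, "/personal-page.html"))
```

Matching the exact path (rather than a substring) avoids accidentally deleting broader rules, such as a Disallow for a whole directory, that you may want to keep.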
After that, log in to your Google Search Console account and go to the “Indexed, though blocked by robots.txt” issue under the “Pages” section. Click the “Validate Fix” button and confirm the action. Google will check the pages and notify you by email when the issue is fixed. This may take anywhere from a couple of days to a week, depending on the number of affected URLs.
If you have only one or two affected pages and want them indexed quickly, paste the URL in the search box at the top. Press the Enter key for Google to inspect your URL and show the result. Click the “Request Indexing” link to resubmit the page for indexing.
Considerations for WordPress
If you did not modify the robots.txt file, you may be wondering how it got updated with blocked page URLs. Content management systems like WordPress have plugins that offer an option to edit the robots.txt file from the admin panel. It is possible that you or one of the administrators mistakenly added the page URL for blocking. In this case, you should remove the blocked URLs from the robots.txt file through your admin panel instead of editing it in File Manager. Otherwise, the plugin will regenerate the entries after you remove them from File Manager.