Google uses the Googlebot crawler to crawl and index webpages, in many cases revisiting a site at least once a day. Generally, crawling is based on the XML Sitemap you submit in Google Search Console, though the crawl frequency can be much higher for news websites than for regular content websites. Similarly, Bing crawls pages using its Bingbot crawler. On one hand, webmasters want Google and Bing to index their pages instantly; on the other hand, there are situations where you have to block these crawlers from crawling the entire site or certain pages. In this article, we will explain how to block Googlebot and Bingbot, what happens when you block these crawlers, and common reasons for crawling issues.
Blocking Googlebot and Bingbot
There are multiple ways to block your pages from Google and Bing, depending on how strictly you need to restrict access.
1. Blocking with Robots.txt
The most popular and common way to block crawlers is to use directives in your robots.txt file. For example, inserting the following lines will block Googlebot and Bingbot from crawling a page on your site.
User-agent: Googlebot
Disallow: /your-page-url
User-agent: Bingbot
Disallow: /your-page-url
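To block the entire site instead of a single page, disallow the root path for each crawler. A minimal sketch (adjust or remove any rules you already have in robots.txt):
User-agent: Googlebot
Disallow: /
User-agent: Bingbot
Disallow: /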
Though Google and Bing follow robots.txt directives, this only stops crawling; a blocked page can still be indexed if it is linked from another indexed page, either on your own website or on an external site you cannot control.
2. Using .htaccess to Block
Though uncommon, some people prefer to use .htaccess rules to block access. The following rules return a 403 Forbidden response for the mentioned page or directory, which cuts off access for Googlebot and Bingbot along with every other visitor.
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/your-page-url
RewriteRule ^(.*)$ - [F,L]
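If you want to block only the search crawlers while keeping the page available to regular visitors, you can match on the user agent instead. A minimal sketch (the path is a placeholder):
RewriteEngine On
# Return 403 Forbidden only when the request comes from Googlebot or Bingbot
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Bingbot) [NC]
RewriteCond %{REQUEST_URI} ^/your-page-url
RewriteRule ^(.*)$ - [F,L]
Note that user agent strings can be spoofed, so treat this as a convenience rather than a security measure.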
3. Blocking Googlebot and Bingbot IP Addresses
The problem with the above methods is that you need server access to edit the files, and you may also make mistakes while editing robots.txt or .htaccess. The alternate and effective option is to block the Googlebot and Bingbot IP addresses. Google and Bing publish updated IP address lists for their crawlers, which you can use for blocking purposes. These lists are in JSON format, from which you need to extract the IP ranges to use. Remember, they cover the Googlebot and Bingbot search crawlers only, not other crawlers such as the AdSense crawler or the Microsoft Advertising crawler.
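As a quick way to pull the ranges out of those JSON files, here is a short Python sketch. The two URLs below are where the lists were published at the time of writing and may change, and the schema (a "prefixes" array with "ipv4Prefix"/"ipv6Prefix" entries) is an assumption based on the current files, so verify both against the official documentation:
import json
import urllib.request

# Published crawler IP range lists (verify these URLs against the official docs).
SOURCES = {
    "Googlebot": "https://developers.google.com/static/search/apis/ipranges/googlebot.json",
    "Bingbot": "https://www.bing.com/toolbox/bingbot.json",
}

def fetch_cidrs(url):
    """Download a crawler IP range file and return the CIDR prefixes it lists."""
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # Assumed schema: a "prefixes" array whose entries contain either an
    # "ipv4Prefix" or an "ipv6Prefix" key.
    cidrs = [entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
             for entry in data.get("prefixes", [])]
    return [c for c in cidrs if c]

if __name__ == "__main__":
    for bot, url in SOURCES.items():
        for cidr in fetch_cidrs(url):
            print(bot, cidr)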
Using Hosting Panel
If you have server access, you can make use of the IP blocking tool available in your hosting panel. For example, HostGator offers an “IP Blocker” app called IP Deny Manager in their cPanel under the “Security” section.
You can find a similar tool with all cPanel hosting companies like Bluehost. Click on the IP Blocker app and provide the IP range of Googlebot or Bingbot to block access. For example, you can use one of the following methods to provide a Googlebot IP address:
- Use CIDR format as given in the JSON file like 66.249.64.0/27.
- Implied IP range like 66.249.66.0-255
- Wildcard range like 66.249.*.*
- Simply enter googlebot.com, as most of the Googlebot user agents resolve to this host name.
In general, blocking one or a few IP addresses is sufficient to block access. However, you can use a wildcard or the host name to block access entirely.
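If your host does not offer an IP blocking tool, the same ranges can also be denied directly in an Apache .htaccess file. A minimal sketch using the example CIDR range mentioned above (Apache 2.4 syntax):
# Deny requests from an example Googlebot IP range
<RequireAll>
    Require all granted
    Require not ip 66.249.64.0/27
</RequireAll>
Repeat the "Require not ip" line for each additional range you want to block.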
Using Security Plugins for WordPress
Otherwise, if you are using a content management system like WordPress, there are plenty of security plugins to block bots and IP addresses from the site’s administrator panel without going to your hosting account. For example, the SiteGround Security plugin allows you to monitor live traffic to your site. You can find Googlebot and Bingbot IP addresses based on the user agent name and block them with a few clicks right from your admin panel.
These methods are especially effective when you want to block Google and Bing from accessing your entire site.
4. Hiding Pages with Authorization
This is useful for restricting search engines’ access to pages by setting permissions. For example, banking and membership sites hide personalized content behind a login so that search engines cannot access it. Depending on the confidentiality of the content, you may also need to apply a firewall, block user profiles, and so on. It is strongly recommended to hire a developer and set up the restrictions properly at the required directory level so that Google will not crawl the prohibited section.
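As an illustration, one simple way to put a directory behind authorization on an Apache server is HTTP Basic Authentication. A minimal sketch for the protected directory’s .htaccess file, assuming you have already created an .htpasswd file (the path is a placeholder):
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /path/to/.htpasswd
Require valid-user
Crawlers cannot supply credentials, so they receive a 401 response and cannot read the content.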
Controlling Crawl Rate or Crawl Frequency
If you find that Googlebot and Bingbot are consuming high server resources, you can control the crawl rate or crawl frequency. Crawl rate is the number of requests per second made by Googlebot or Bingbot to fetch content from your site. For high-traffic websites, controlling the crawl rate of bots is crucial to manage server resources. Learn more on how to change the crawl rate for Bingbot in Bing Webmaster Tools.
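Bingbot also honors the Crawl-delay directive in robots.txt, so as an alternative to the Bing Webmaster Tools setting you can add something like the following, where a larger value tells Bingbot to crawl more slowly (Googlebot ignores this directive):
User-agent: Bingbot
Crawl-delay: 10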
However, Google automatically uses an optimized crawl rate for grabbing content from your site. You can view this from your Google Search Console account. If you are not happy with the current crawl frequency, you can raise a special request to Google. The new crawl rate will work for the next 90 days and will be reset back to the optimized settings after that period. Learn more on why you should control the Googlebot crawl rate.
What Happens When You Block Googlebot and Bingbot?
When you block a page or site URL, you will see different types of errors in Google Search Console and Bing Webmaster Tools respectively. Here are some of the common errors you will notice in Search Console account:
- URL blocked by robots.txt when you use robots.txt directives.
- Soft 404 with a message like “Submitted URL seems to be a soft 404”.
- Partially crawled or page has no content error.
If someone managing your website has wrongly blocked pages on your site, you can check the errors under the “Coverage” section in Google Search Console and fix them. However, you may not find issues there when blocking by IP or using the .htaccess method. The easy way is to use the URL Inspection tool in Google Search Console, Google PageSpeed Insights or the mobile-friendly testing tool to check whether the live page can be crawled. You will see an error and an empty rendered page when Googlebot is blocked from accessing that page.
Final Words
You can use one of the above methods to block Googlebot and Bingbot from crawling your site. However, make sure to avoid mistakes while blocking a specific page or section of your site. In particular, blocking IP addresses is the most drastic action and can remove your pages from Google Search completely. You may need to resubmit the pages and wait for reindexing, which can result in a drop in traffic and hence revenue. Therefore, if you are not sure how to block Googlebot and Bingbot, get in touch with your hosting company. Alternatively, hire a developer for custom development work like hiding confidential content behind authorization.