The primary objective of a webmaster is to ensure that all pages of a site are indexed by search engines. However, there are many reasons a website owner may need to hide pages from search engines, the most common being to keep internal or confidential pages out of public view. Though membership and authentication are the most reliable ways to protect confidential pages, you can also use a robots.txt file to instruct search engines. This file tells compliant search engine crawlers which pages of your site should be crawled and which should not. You can restrict all popular search engines like Google, Bing and Baidu using a single file.
Generating Robots.txt File from Baidu Webmaster Tools
You can generate a robots.txt file using any text editor or one of the free robots.txt generator tools available online. However, it is also easy to generate the file right from your Baidu Webmaster Tools account.
Baidu Webmaster Tools and Robots.txt File
Baidu Webmaster Tools offers two options under the Robots section:
- One is to check whether your site has an existing robots.txt file and display its details.
- The other is to create a free robots.txt file for your site.
The Robots option is available under the Web Analytics section of your Baidu Webmaster Tools account, as shown in the picture below. You must have already added and verified your website in the account in order to use the tool. Alternatively, you can access this option here without registering for a Webmaster Tools account.
Detect and Display Robots.txt
Select the “Detect Robots.txt” tab, enter your site URL and click the Detect button. If a robots.txt file is detected on your site, you will see a detailed analysis of it.
You need a Webmaster Tools account in order to update the robots.txt file on your site. Note that Baidu Webmaster Tools can only detect files up to 48 KB in size, so make sure your robots.txt file is not too large.
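If you maintain your robots.txt file locally, you can check it against the 48 KB limit before uploading. This is a minimal sketch using only the Python standard library; the `robots_within_limit` helper name and the default `robots.txt` path are illustrative assumptions, not part of Baidu's tooling:

```python
import os

# Baidu Webmaster Tools reads at most 48 KB of a robots.txt file (assumption
# based on the limit stated above; Baidu may change this value).
MAX_SIZE = 48 * 1024

def robots_within_limit(path="robots.txt"):
    """Return True if the file at `path` fits within the 48 KB limit."""
    return os.path.getsize(path) <= MAX_SIZE
```

Run this against your local copy before uploading; if it returns False, trim the file by consolidating rules.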
Generate Customized Robots.txt File
Select the “Generate robots.txt” tab, or click the “Generate” button from the previous step, to create your own robots.txt file. Enter the following details and click the Create button:
- User Agent – You can choose Baidu Spider, All or Other. If you choose Other, you need to enter the name of the user agent.
- Status – This indicates whether you want the pages to be crawled or not crawled.
- Path – Here you need to mention the directory path for the search engines to crawl or skip. The path must start with /, and a bare / indicates the root directory of your site. Baidu supports directory paths of up to 255 characters, so make sure your path is not too long.
The robots.txt file will be generated and shown in the box; you can then upload it to your site’s directory to control how search engines crawl your pages.
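For example, choosing Baidu Spider as the user agent with a disallowed path would produce a file along these lines (the /private/ directory here is an illustrative example, not generator output):

```
# Block Baiduspider from the /private/ directory
User-agent: Baiduspider
Disallow: /private/

# Allow all other crawlers to access everything
User-agent: *
Disallow:
```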
Uploading Robots.txt File on Your Site
Copy the content and save it as a plain text file in UTF-8 format named robots.txt. You have to upload this file to the root installation of your site. You can use either FTP or a File Manager app to upload the file to the server. Make sure you can access the file in a browser using a URL like yoursite.com/robots.txt. Even if there are no pages or directories to block on your site, it is still a good idea to upload an empty robots.txt file.
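Before (or after) uploading, you can sanity-check that your rules behave as intended using Python's standard-library robots.txt parser. This is a sketch assuming the example rules and the yoursite.com domain used above; substitute your own file content and URLs:

```python
from urllib.robotparser import RobotFileParser

# The same example rules shown earlier: block Baiduspider from /private/,
# allow everything else.
robots_lines = """User-agent: Baiduspider
Disallow: /private/

User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_lines)

# A blocked path should not be fetchable by Baiduspider.
print(parser.can_fetch("Baiduspider", "https://yoursite.com/private/page.html"))  # False

# Any other path should remain fetchable.
print(parser.can_fetch("Baiduspider", "https://yoursite.com/public/page.html"))   # True
```

If a rule does not behave the way you expect here, fix the file before relying on it in production, since crawlers will read it the same way.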