Robots.txt is a simple text file that tells web crawlers which content on your site may be crawled and indexed. The file must be uploaded to the root directory of your website (generally “/public_html/”), because search engines look in the site’s root directory for the robots.txt file. Refer to our separate article on robots.txt for complete details about the file.
Below are the values you can provide in the tool to generate Robots.txt entries:
Choose from the dropdown whether you want to allow or block all robots from crawling your site. Good bots like Google and Bing follow the directives set in the robots.txt file, but bad bots do not. Find such bad robots by checking your server’s logs and block them using .htaccess directives.
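As a rough illustration of what this choice produces, the sketch below builds the default rule in Python and checks it with the standard library's `urllib.robotparser`. The helper names `default_rule` and `is_allowed` are hypothetical and are not part of the tool, whose exact output may differ.

```python
from urllib.robotparser import RobotFileParser

def default_rule(allow_all: bool) -> str:
    """Return robots.txt text that allows or blocks every crawler."""
    # An empty Disallow value permits everything; "/" blocks the whole site.
    disallow = "" if allow_all else "/"
    return f"User-agent: *\nDisallow: {disallow}\n"

def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

print(is_allowed(default_rule(True), "googlebot", "https://yoursite.com/page.html"))
print(is_allowed(default_rule(False), "googlebot", "https://yoursite.com/page.html"))
```

The first call prints `True` and the second `False`. This only models well-behaved crawlers; a bot that ignores robots.txt is unaffected by either setting.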
Crawl delay is the time in seconds that a search robot waits between successive requests; it is used to keep aggressive bots from slowing down your server.
Most shared hosting companies need a crawl delay of at least 10 seconds to protect the server from aggressive bots. If you have a managed, VPS or dedicated server, choose the value “Default – No Delay”. Remember, choosing the value “20 Seconds” limits each crawler that follows this directive to at most 4,320 pages per day (86,400 seconds in a day divided by 20). This should not be a problem for smaller sites, while owners of bigger sites can leave this field at its default.
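The pages-per-day figure follows from simple arithmetic: one request per delay interval over a 24-hour day. A minimal sketch, assuming the crawler makes exactly one request per interval (the function name `max_pages_per_day` is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def max_pages_per_day(crawl_delay_seconds: int) -> int:
    """Upper bound on daily requests for one crawler that honors the delay."""
    if crawl_delay_seconds <= 0:
        raise ValueError("crawl delay must be a positive number of seconds")
    return SECONDS_PER_DAY // crawl_delay_seconds

print(max_pages_per_day(20))  # 4320
print(max_pages_per_day(10))  # 8640
```

The limit applies per crawler, so several bots honoring the same delay can together request more pages than this.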
Like robots.txt, the Sitemap.xml file should by default be available in the root directory of your site. Search engines look for the XML Sitemap in the root directory and crawl the content accordingly. If your Sitemap is located in the site’s root, leave this field blank.
If your Sitemap is located in a directory other than the root, enter the complete XML Sitemap URL to tell search engine crawlers where the file is located.
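For illustration, a Sitemap entry in robots.txt is a single `Sitemap:` line carrying the full URL. The sketch below assumes a hypothetical Sitemap location under `/sitemaps/` and a hypothetical helper `sitemap_line`; it reads the entry back with `RobotFileParser.site_maps()`, which requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

def sitemap_line(sitemap_url: str) -> str:
    """Return a Sitemap directive, or an empty string when the field is blank."""
    url = sitemap_url.strip()
    return f"Sitemap: {url}" if url else ""

# Hypothetical location, used only for this example.
robots_txt = (
    "User-agent: *\n"
    "Disallow:\n"
    + sitemap_line("https://yoursite.com/sitemaps/sitemap.xml")
    + "\n"
)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())  # ['https://yoursite.com/sitemaps/sitemap.xml']
```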
Regardless of the location of the XML Sitemap, make sure to submit it in your webmaster tools accounts for Google, Bing, Yandex and Baidu. See the search engine Sitemap submission guide for reference.
You can keep this value the same as the first field by selecting “Same as Default”, or select an allow or disallow value for individual search engine robots. This directive is appended to the default control and is followed only by that search engine. For example, you can select allow as the default value and disallow only for the Baidu spider. This allows all search bots except Baidu.
The tool provides options for the most popular search engines: Google, Bing / MSN, Yahoo!, Baidu, Yandex, Ask/Teoma and Alexa/Wayback. You can refer to the complete user agent list and choose additional bots to block.
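The Baidu example can be checked with a short sketch. The robots.txt text below is an assumption about what “allow by default, disallow Baidu” might look like, not the tool's verified output, and `can_crawl` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed entries: everyone allowed by default, Baidu's spider blocked.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: baiduspider
Disallow: /
"""

def can_crawl(agent: str, url: str = "https://yoursite.com/") -> bool:
    """Check whether the given user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

print(can_crawl("baiduspider"))  # False
print(can_crawl("googlebot"))    # True
```

A crawler uses the group that names it specifically and falls back to the `*` group only when no such group exists, which is why the Baidu block does not affect other bots.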
If you want to restrict specific directories, enter each directory name with the trailing slash. For example, if you want to disallow the “yoursite.com/admin/” directory, enter “/admin/” in this field. The tool lets you add up to six directories, but you can add more directly in the robots.txt file before uploading it to the server.
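A minimal sketch of how directory names could be turned into `Disallow` lines and then verified. The helper `disallow_lines` and the second directory `cgi-bin` are assumptions made for illustration; only “/admin/” comes from the example above.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser

def disallow_lines(directories: Iterable[str]) -> str:
    """Build one Disallow line per directory, normalized to the form /name/."""
    lines = []
    for directory in directories:
        name = directory.strip().strip("/")
        if name:  # skip fields that were left blank
            lines.append(f"Disallow: /{name}/")
    return "\n".join(lines)

robots_txt = "User-agent: *\n" + disallow_lines(["/admin/", "cgi-bin"]) + "\n"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("googlebot", "https://yoursite.com/admin/login.php"))  # False
print(parser.can_fetch("googlebot", "https://yoursite.com/blog/"))            # True
```

Because rules match by path prefix, the trailing slash keeps “/admin/” from also matching an unrelated path such as “/administrator-guide.html”.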
Once you have selected the required values, click the “Create Robots.txt” button to generate the Robots.txt file entries in the text box.
If you made a mistake or want to reset the tool to its initial values, click the “Clear” button. This removes all generated entries from the text box.
The generated entries for your robots.txt file can be copied from this text box.
Below are some examples of robots.txt entries created using this tool:
| Search Robots: Baidu - Disallow | User-agent: baiduspider |
| Default: Disallow | User-agent: * |
| Crawl Delay: 10 seconds; Search Robots: Google Image - Disallow; Restrict Directory: /admin/ | User-agent: googlebot-image |
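To make the mapping from selected values to entries more concrete, here is a hedged sketch of a small generator. The function `build_robots_txt`, its parameters and the ordering of its output are all assumptions for illustration; the tool's real output may be laid out differently.

```python
from typing import Dict, Iterable, Optional

def build_robots_txt(
    default_allow: bool = True,
    crawl_delay: Optional[int] = None,
    bot_rules: Optional[Dict[str, bool]] = None,  # user agent -> allowed?
    restricted_dirs: Iterable[str] = (),
    sitemap_url: str = "",
) -> str:
    """Assemble robots.txt text from values like those offered by the tool."""
    lines = ["User-agent: *"]
    dirs = [d for d in restricted_dirs if d]
    if not default_allow:
        lines.append("Disallow: /")  # block the whole site
    elif dirs:
        lines.extend(f"Disallow: {d}" for d in dirs)
    else:
        lines.append("Disallow:")  # empty value blocks nothing
    if crawl_delay:
        lines.append(f"Crawl-delay: {crawl_delay}")
    # Each individually configured robot gets its own group.
    for agent, allowed in (bot_rules or {}).items():
        lines += ["", f"User-agent: {agent}", "Disallow:" if allowed else "Disallow: /"]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"

# Settings of the third example: 10-second delay, Google Image blocked, /admin/ restricted.
example = build_robots_txt(
    crawl_delay=10,
    bot_rules={"googlebot-image": False},
    restricted_dirs=["/admin/"],
)
print(example)
```

Under these assumptions the call prints a `User-agent: *` group containing `Disallow: /admin/` and `Crawl-delay: 10`, followed by a `User-agent: googlebot-image` group containing `Disallow: /`.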
First, copy all the entries generated by the robots.txt generator tool. Open a text editor such as Notepad or TextEdit and paste the content. Save the file in “UTF-8” format with the name “robots.txt”.
Upload the “robots.txt” file to your site’s root directory using FTP or the File Manager option available in the control panel of your hosting account.
Once you have uploaded the robots.txt file, it should be accessible through a web browser like a normal webpage. Open your favorite browser, enter a URL like “yoursite.com/robots.txt”, and you should see the file displayed as shown below.
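The same check can be scripted. The sketch below is one possible approach, with hypothetical helper names `fetch_robots_txt` and `matches_generated`: it downloads the live file and compares it with the generated text, ignoring line-ending differences and a UTF-8 byte order mark that some editors add when saving.

```python
from urllib.request import urlopen

def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download robots.txt from the site's root and return it as text."""
    with urlopen(f"{site.rstrip('/')}/robots.txt", timeout=timeout) as response:
        # "utf-8-sig" drops a leading byte order mark if one is present.
        return response.read().decode("utf-8-sig")

def matches_generated(live_text: str, generated_text: str) -> bool:
    """Compare line by line, ignoring line endings and trailing spaces."""
    def normalize(text: str) -> list:
        return [line.rstrip() for line in text.strip().splitlines()]
    return normalize(live_text) == normalize(generated_text)

# Example usage (needs network access, so it is left commented out):
# live = fetch_robots_txt("https://yoursite.com")
# print(matches_generated(live, "User-agent: *\nDisallow: /admin/\n"))
```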