A robots.txt file is the standard way of instructing search engine crawlers which content on your site they may crawl and which they may not. On most platforms you can either edit the robots.txt file directly or control the search engine visibility of your site from an administrator section. Weebly is a hosted platform and does not allow users to access files on the server, so the only available option is to manage the settings from your site editor. If you are wondering how to edit the robots.txt file from the Weebly site editor, here are the instructions for you.
What Can You Do with Robots.txt?
Search engines like Google and Bing use web crawlers, or robots, to find and index content from your site. These robots first check a file called robots.txt in the root directory of your site’s server for instructions on which pages or sections to exclude from crawling. Weebly also includes the XML Sitemap location in the robots.txt file to help search engines easily find your Sitemap. You can exclude the following from your Weebly site in the robots.txt file:
- Weebly by default excludes some bots and directories
- You can exclude single pages, your blog, or the entire site
Blocking pages through the robots.txt file will remove them from search engine results. If you later unblock the same pages, search engines need a longer time to reindex them, and you may lose ranking and traffic as a result. Therefore, make sure you block the correct pages and understand the risk before doing this task.
Default Weebly Robots.txt File
Whenever you click the “Publish” button, Weebly automatically generates a robots.txt file for your site. This is a dynamic file, so you will not see it in the Weebly code editor where the source templates and other assets live. However, you can view the file in your browser by appending /robots.txt to your site address. Below are examples of Weebly robots.txt file URLs:
- Weebly free site: https://yoursite.weebly.com/robots.txt
- Custom domain: https://yoursite.com/robots.txt
By default, Weebly robots.txt file contains the following entries which are inserted for all Weebly sites.
- XML Sitemap URL
- Disallowing NerdyBot from accessing all content on the site
- Blocking all robots from accessing /ajax/ and /apps/ directories on your site
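Based on the entries listed above, the generated file looks roughly like this (the Sitemap URL will match your own domain, and Weebly’s exact formatting may differ):

```
Sitemap: https://yoursite.com/sitemap.xml

User-agent: NerdyBot
Disallow: /

User-agent: *
Disallow: /ajax/
Disallow: /apps/
```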
Other than the NerdyBot crawler and the ajax/apps folders, all other content on your site is allowed for crawling and indexing.
Note that a user-agent is a string identifying a specific crawler. For example, Googlebot and Bingbot are the user-agents for the Google and Bing search engines respectively, while * indicates that the rules apply to all user-agents. Each rule disallows or allows all content (indicated by /), a specific folder, or a specific page. You should add a separate block for each user-agent you want to instruct and combine all rules for that user-agent in that single block. However, Weebly only adds disallow entries under the * user-agent, so it is not possible to block pages or your site for one specific crawler. You can check our separate article to learn more about the robots.txt file.
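As a quick sketch of how crawlers interpret these rules, Python’s standard urllib.robotparser module can parse a file like the one Weebly generates and answer whether a given user-agent may fetch a URL (the rules and yoursite.com URLs below are illustrative examples, not your site’s actual file):

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the default Weebly robots.txt described above
rules = """
User-agent: NerdyBot
Disallow: /

User-agent: *
Disallow: /ajax/
Disallow: /apps/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# NerdyBot is blocked from everything
print(parser.can_fetch("NerdyBot", "https://yoursite.com/about"))    # False
# Other crawlers may fetch regular pages...
print(parser.can_fetch("Googlebot", "https://yoursite.com/about"))   # True
# ...but not the /ajax/ or /apps/ directories
print(parser.can_fetch("Googlebot", "https://yoursite.com/ajax/x"))  # False
```

This is the same matching logic a well-behaved crawler applies: it looks for the block whose user-agent matches its own name, falling back to the * block, and then checks each disallow rule as a path prefix.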
Now that you know where the robots.txt file is and how it works, let us explain how to add exclusion entries for specific pages and your blog in the Weebly robots.txt file.
Excluding Entire Site in Robots.txt File
Follow the instructions below if you want the entire site to be excluded from search engines.
- Go to “Settings” section in Weebly site editor.
- Navigate to “SEO” settings and scroll down to the bottom.
- Find and enable “Hide site from search engines” option.
- Click “Save” button to save your changes.
- Make sure to click “Publish” button so that the changes will be applied on your live site.
Now, open your robots.txt file and check what has happened. Weebly will have deleted all of the default entries and disallowed the entire site content for all user-agents.
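With this option enabled, the published file typically reduces to a blanket disallow along these lines (a sketch; check your own file for the exact output):

```
User-agent: *
Disallow: /
```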
Note that if you want to take your site offline with no public access, go to the “General” section, scroll down to the bottom, and un-publish your site.
Disallowing Pages in Robots.txt File
Follow the instructions below when you do not want search engines to index a specific page on your site.
- Go to “Pages” section in Weebly site editor.
- Select the page that you want to hide by adding it in your robots.txt file.
- Click on “SEO Settings” button.
- Scroll down to the bottom of SEO settings panel and check “Hide this page from search engines” option.
- Publish your site for the changes to take effect.
Now, check your robots.txt file. You will see the page disallowed with an exclusion rule added to the existing set of rules for all user-agents.
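For example, if a page published at /about-us is hidden (the path here is a hypothetical example), the resulting file would look something like this:

```
Sitemap: https://yoursite.com/sitemap.xml

User-agent: *
Disallow: /ajax/
Disallow: /apps/
Disallow: /about-us
```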
Before excluding individual pages or the blog section, make sure you have disabled the “Hide site from search engines” option explained above. It does not make sense to hide a page when your entire site is already hidden, so first make your site visible to search engines and then hide a single page or post.
Note that you can also hide a specific page by setting an access password or allowing members-only access. You can do this with the “Visibility” setting after selecting a page under the “Pages” section.
Disallowing Blog Posts in Robots.txt File
Unfortunately, Weebly does not allow blocking a single blog post. What you can do is disallow the entire blog page, similar to hiding a standard page as explained above. However, this blocks the entire “blog” directory and hence all blog posts on your site, which has a big impact when you have a large number of published posts.
- When you are in the Weebly site editor, go to the “Pages” section and click on the “Blog” page. Make sure to select your blog page, as its name could be anything you chose when creating your Weebly blog.
- Go to “SEO Settings” section and select “Hide this page from search engines” option.
- Publish your site and check the robots.txt file in a separate browser window.
Unlike disallowing a single page, you will see that all blog-related pages are blocked in the file.
Here are the details of the blocked blog items in the Weebly robots.txt file:
- The blog page showing all posts. In Weebly, this page redirects to the yoursite.com/blog page.
- Your blog feed URL.
- The /blog/ directory, which disallows all blog posts since their URLs appear after the /blog/ part (for example, https://yoursite.com/blog/first-post and https://yoursite.com/blog/last-post). If you open the yoursite.com/blog/ page, it redirects to yoursite.com/blog.
- The actual blog page you will see in the browser when opening your blog.
When you have multiple blogs on your site, you will see corresponding entries for all blog pages in the robots.txt file. The feed URL will change to /2/ for the second blog, and so on.
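Putting this together, a site with its blog hidden might publish a file along these lines (the exact paths are illustrative, including the numbered feed entry, which follows the feed numbering mentioned above; always check your own file):

```
User-agent: *
Disallow: /ajax/
Disallow: /apps/
Disallow: /blog/
Disallow: /1/feed
```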
Limitations of Editing Weebly Robots.txt File
Though it is easy to block the entire site and single pages in Weebly, there are a lot of restrictions.
- By default, exclusions apply to all user-agents, and there is no option to block only a specific bot.
- You can’t block individual blog posts, and blocking the entire blog is a bad idea for this purpose.
- Weebly store pages, like your product and category pages, can’t be blocked with the robots.txt file. However, Weebly blocks public access to all uploaded store files and content.
Disallowing a page in the robots.txt file will also remove that page from your XML Sitemap. However, search engines can still find the blocked page if it is linked from other pages. The best example is the Weebly search box, whose results will still include your blocked pages.