Guide to XML Sitemaps: Structure and Formats for Search Engines

What is a Sitemap?

A Sitemap is a navigational guide to your website that informs users and search engines about the structure of the complete site. As a site grows to a large number of pages, keeping track of changes becomes difficult. A Sitemap helps users and search engines understand the site's complex link structure and the importance of any single page relative to the entire site.

In this article, we explain the following topics on XML Sitemaps:

  1. Types of Sitemap
  2. How to Create XML Sitemap?
  3. Structure of XML Sitemap
  4. Submitting Sitemap to search engines
  5. Different Sitemap formats for search engines

1. Types of Sitemap

Sitemaps are generally classified into the following two categories:

  • Sitemap for Users
  • Sitemap for Search Engines

1.1. Sitemap for Users

This is a simple HTML page on your site that contains a list of links to all the pages of your site. The purpose of an HTML Sitemap is to show human users the complete structure of your site, organized in sections. This helps users understand the overall content of the site and locate specific content easily.

Following are the important points to remember when using an HTML Sitemap on your site:

  • It is highly recommended to classify your content and display the links in relevant sections.
  • Avoid showing the Sitemap as a plain list of URLs, which does not help users navigate easily.
  • Create your own HTML Sitemap rather than using free online HTML Sitemap generators, which produce plain link lists.
  • Update your Sitemap whenever pages are added or deleted.
  • A visible HTML Sitemap can also help search engines, since it links to all the important pages on your site.

An example of a simple HTML Sitemap is shown in the picture below:

HTML Sitemap for User Navigation

1.2. Sitemap for Search Engines

One of a site owner's primary tasks is to prepare a Sitemap and submit it to search engines for indexing. Sitemaps for search engines generally use the XML format. An XML Sitemap serves the same purpose as an HTML Sitemap, except that it helps search engines understand the site structure, whereas an HTML Sitemap is meant for human users.

With the help of an XML Sitemap, search engine bots can more easily crawl and index new and modified pages on your site. You can also indicate the priority of a page relative to other pages on your site. An XML Sitemap is not shown in the site navigation, but anyone can view it in a web browser by opening its .xml URL. Below is an example of an XML Sitemap:

XML Sitemap for Search Engine Navigation

2. How to Create XML Sitemap?

Many hosting providers offer automatic Sitemap generation, which is a good option because the Sitemap is updated automatically whenever page content changes. If your hosting provider does not offer automatic Sitemap generation, you can create your own Sitemap with an online XML Sitemap generator tool.


You can also use plugins to generate a dynamic Sitemap that is updated automatically when new URLs are added to your site. For example, if you use WordPress as your content management system, popular SEO plugins like Yoast are freely available.

2.1. Where to Upload Sitemap File?

When you create a Sitemap manually, upload the file to the root directory of your site. If you want to place the Sitemap file in a different directory on your server, add a Sitemap directive to robots.txt to tell search engines exactly where your Sitemap is located.
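
A robots.txt file that points crawlers to a Sitemap outside the root directory only needs a `Sitemap:` line containing the Sitemap's full URL. The Python sketch below assembles such a file; the helper name `robots_txt_with_sitemaps` and the example URL are hypothetical, and the permissive `User-agent: *` / empty `Disallow:` rules are placeholders for whatever crawl rules your site already uses.

```python
from urllib.parse import urlsplit


def robots_txt_with_sitemaps(sitemap_urls: list[str]) -> str:
    """Build a minimal robots.txt that points crawlers to one or more Sitemaps."""
    # Placeholder crawl rules (allow everything); keep your site's real rules instead.
    lines = ["User-agent: *", "Disallow:", ""]
    for url in sitemap_urls:
        parts = urlsplit(url)
        # The Sitemap directive takes a full URL, not a relative path.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Sitemap URL must be absolute: {url!r}")
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical Sitemap stored in a subdirectory instead of the root.
    print(robots_txt_with_sitemaps(["https://www.example.com/sitemaps/sitemap.xml"]), end="")
```

Save the result as robots.txt in the root directory of the site: crawlers only look for robots.txt there, even when the Sitemap itself lives elsewhere.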

The drawback of uploading the file manually is that you have to update it whenever new URLs are added to your site. So make sure to replace the old Sitemap with the latest version whenever you publish new articles.

3. XML Sitemap Structure

XML Sitemaps, the most commonly used type of Sitemap for search engine submission, follow the Sitemap protocol, which is defined by an XML schema. When creating an XML Sitemap, follow the guidelines below:

  • The XML Sitemap must be encoded in UTF-8.
  • All URLs in the Sitemap must be from a single host; for example, use either webnots.com or www.webnots.com throughout, not a mix of both.
  • Avoid raw characters from languages other than English in Sitemap URLs; percent-encode them instead (for example, ü becomes %C3%BC).
  • All data values must be entity-escaped, meaning the special characters below must be replaced by their escape codes:

Special Character | Escape Code
--- | ---
Ampersand (&) | &amp;
Single Quote (') | &apos;
Double Quote (") | &quot;
Greater Than (>) | &gt;
Less Than (<) | &lt;
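
The escaping rule in the table, combined with percent-encoding for characters outside the English alphabet, can be sketched in Python using only the standard library. `sitemap_loc` is a hypothetical helper name, and the sketch assumes the host name is already plain ASCII (internationalized domain names would need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape

# Characters left as-is by quote(): "/" and "%" (existing escapes) plus the
# RFC 3986 reserved characters commonly found in paths and query strings.
_SAFE = "/%:@!$&'()*+,;=?"


def sitemap_loc(url: str) -> str:
    """Prepare a URL for use inside <loc>...</loc> (hypothetical helper).

    Assumes the host name is already ASCII; only the path and query are encoded.
    """
    parts = urlsplit(url)
    # Step 1: percent-encode non-ASCII characters, e.g. "ü" -> "%C3%BC".
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    encoded = urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
    # Step 2: entity-escape &, < and > (done by escape()) plus ' and ".
    return escape(encoded, {"'": "&apos;", '"': "&quot;"})


if __name__ == "__main__":
    print(sitemap_loc("http://www.example.com/ümlat.php?q=name&sort=asc"))
```

Percent-encoding runs before entity escaping on purpose: the `&` characters introduced by `&amp;` must not be touched again, while `%` sequences from step 1 contain nothing that needs XML escaping.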

3.1. XML Sitemap Example

Below is an example of an XML Sitemap with a single URL. Repeat the content between the <url>…</url> tags for each additional link, placing that page's URL inside the <loc>…</loc> tags.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.yoursitename.com/</loc>
<lastmod>2014-04-20</lastmod>
<changefreq>daily</changefreq>
<priority>0.6</priority>
</url>
</urlset>
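
Rather than typing this XML by hand, you can generate it. The sketch below uses Python's standard `xml.etree.ElementTree` module, which entity-escapes text such as `&` automatically. The function name `build_sitemap` and the dictionary keys it expects are illustrative assumptions, not part of any plugin or hosting tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries: list[dict]) -> str:
    """Return an XML Sitemap for entries like {"loc": ..., "lastmod": ...}."""
    # Declare the Sitemap namespace as a plain xmlns attribute on <urlset>.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # required tag
        for tag in OPTIONAL_TAGS:  # optional tags, emitted only when present
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    ET.indent(urlset)  # Python 3.9+: indent nested tags for readability
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.yoursitename.com/", "lastmod": "2014-04-20",
         "changefreq": "daily", "priority": 0.6},
    ]), end="")
```

Write the returned string to sitemap.xml with `encoding="utf-8"` so the file matches its UTF-8 declaration.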

You can also submit an RSS or Atom feed of your blog to search engines instead of an XML Sitemap.

3.2. XML Tags Explained

An XML Sitemap uses <urlset> and <url> as container tags, and each URL is specified with a <loc> tag. All mandatory and optional tags used in an XML Sitemap are explained in detail below:

Tag | Description | Required / Optional
--- | --- | ---
<?xml version="1.0" encoding="UTF-8"?> | XML declaration; the file is encoded in UTF-8. | Optional
<urlset> | Opening tag that encloses the entire collection of URLs. | Required
<url> | Opening tag for each URL entry. | Required
<loc>http://www.example.com/index.html</loc> | URL of the page. | Required
<lastmod>2014-04-05T11:29:40-07:00</lastmod> | Date (and optionally time) the page was last modified. | Optional
<changefreq>daily</changefreq> | How frequently the page is likely to change. | Optional
<priority>0.8</priority> | Importance of the URL compared to other URLs on the site, from 0.0 to 1.0. | Optional
</url> | Closing tag for each URL entry. | Required
</urlset> | Closing tag for the entire collection of URLs. | Required

3.2.1. Location:

The location is the URL of the page. It must begin with the protocol (http:// or https://) and must be less than 2,048 characters long. Some search engines, such as Baidu, accept a maximum length of only 256 bytes.
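
A quick pre-submission check for these limits might look like the sketch below. `check_loc` is a hypothetical helper; the 2,048-character limit comes from the Sitemap protocol (which requires fewer than 2,048 characters), while `max_bytes` is an optional stricter limit you can pass for a search engine that enforces one, such as 256 for the Baidu limit mentioned above.

```python
from typing import List, Optional
from urllib.parse import urlsplit

MAX_LOC_CHARS = 2048  # protocol limit: fewer than 2,048 characters


def check_loc(url: str, max_bytes: Optional[int] = None) -> List[str]:
    """Return the problems found in a <loc> value (empty list means OK)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("URL must be absolute and start with http:// or https://")
    if len(url) >= MAX_LOC_CHARS:
        problems.append("URL must be shorter than 2,048 characters")
    # Optional stricter limit in bytes for search engines that enforce one.
    if max_bytes is not None and len(url.encode("utf-8")) > max_bytes:
        problems.append(f"URL exceeds the {max_bytes}-byte limit")
    return problems


if __name__ == "__main__":
    print(check_loc("https://www.example.com/page.html", max_bytes=256))
```

Returning a list of messages instead of raising on the first problem lets you report every issue for a URL in a single pass over the Sitemap.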

3.2.2. Last Modification:

This indicates the date on which the page was last modified (not the date of the Sitemap file). The value should use the W3C Datetime format, in any of the following forms:

Format | Description | Example
--- | --- | ---
YYYY-MM-DD | Year, month and day | 2014-04-10
YYYY-MM | Year and month | 2014-04
YYYY | Year | 2014
YYYY-MM-DDThh:mmTZD | Complete date plus hours and minutes | 2014-04-10T23:10+03:30
YYYY-MM-DDThh:mm:ssTZD | Complete date plus hours, minutes and seconds | 2014-04-10T09:10:40+05:00
YYYY-MM-DDThh:mm:ss.sTZD | Complete date plus hours, minutes, seconds and a decimal fraction of a second | 2014-04-10T09:10:40.34+04:00

In the above table:

  • YYYY is a year
  • MM is a month in two digits (01=January)
  • DD is a day in two digits (from 01 to 31)
  • hh is hours (00 to 23 and no need to mention am/pm)
  • mm is minutes (00 to 59)
  • ss is seconds (00 to 59)
  • s is one or more digits representing a decimal fraction of a second
  • TZD is the time zone designator (Z, +hh:mm or -hh:mm). Z stands for UTC (Coordinated Universal Time); other time zones are written as the hours and minutes ahead of (+) or behind (-) UTC. For example, 2014-04-20T22:10:40-05:00 corresponds to 10:10:40 p.m. on April 20, 2014, five hours behind UTC (the offset of US Eastern Standard Time).

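
To check or produce these values programmatically, the Python sketch below validates a string against the six formats in the table with a regular expression, and formats a `date` or a timezone-aware `datetime` using the standard library. The names `is_w3c_datetime` and `w3c_lastmod` are hypothetical, and the regex checks each field's range but does not reject impossible dates such as 2014-02-30.

```python
import re
from datetime import date, datetime, timedelta, timezone

# One optional group per level of precision: YYYY[-MM[-DD[Thh:mm[:ss[.s]]TZD]]]
W3C_DATETIME = re.compile(
    r"\d{4}"                                   # YYYY
    r"(?:-(?:0[1-9]|1[0-2])"                   # -MM
    r"(?:-(?:0[1-9]|[12]\d|3[01])"             # -DD
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"           # Thh:mm
    r"(?::[0-5]\d(?:\.\d+)?)?"                 # optional :ss and .s
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"     # TZD, required with a time
    r")?)?)?"
)


def is_w3c_datetime(value: str) -> bool:
    """True if value matches one of the six W3C Datetime formats."""
    return W3C_DATETIME.fullmatch(value) is not None


def w3c_lastmod(value: date) -> str:
    """Format a date, or a timezone-aware datetime, for <lastmod>."""
    if isinstance(value, datetime):  # check first: datetime subclasses date
        offset = value.utcoffset()
        if offset is None:
            raise ValueError("datetime needs a time zone (TZD) for <lastmod>")
        text = value.isoformat(timespec="seconds")  # ends in +hh:mm or -hh:mm
        # Use the short "Z" designator for UTC instead of "+00:00".
        return text[:-6] + "Z" if offset == timedelta(0) else text
    return value.isoformat()  # YYYY-MM-DD


if __name__ == "__main__":
    eastern_standard = timezone(timedelta(hours=-5))
    print(w3c_lastmod(datetime(2014, 4, 20, 22, 10, 40, tzinfo=eastern_standard)))
    print(is_w3c_datetime("2014-04-10T09:10:40.34+04:00"))
```

Naive datetimes are rejected rather than guessed at, because every format that includes a time also requires a time zone designator.
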
3.2.3. Change Frequency:

This indicates how frequently the page is likely to change. Search engines use it as a hint for how often to recrawl the page, and the acceptable values are always, hourly, daily, weekly, monthly, yearly and never, written in lowercase.

Use “always” for pages that change every time they are accessed and “never” for archived URLs. The value is only a hint; crawlers may visit a page more or less often than it indicates.

3.2.4. Priority:

This indicates the priority of a URL relative to the other pages on your site. Values range from 0.0 to 1.0, and the default value is 0.5. Use this tag to point search engines to your relatively important pages; it only compares pages within your own site and does not affect how your pages rank against other sites.
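
The allowed values for these two optional tags can be checked before the Sitemap is published. In the Python sketch below (hypothetical helper names), change frequencies are normalized to the lowercase spellings defined by the Sitemap protocol, and priorities are range-checked against 0.0 to 1.0 with 0.5 as the default.

```python
# Lowercase values defined by the Sitemap protocol for <changefreq>.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
DEFAULT_PRIORITY = 0.5  # value assumed when <priority> is omitted


def normalize_changefreq(value: str) -> str:
    """Return the protocol spelling of a change frequency, e.g. "Daily" -> "daily"."""
    freq = value.strip().lower()
    if freq not in CHANGEFREQ_VALUES:
        raise ValueError(f"invalid changefreq: {value!r}")
    return freq


def normalize_priority(value: float = DEFAULT_PRIORITY) -> str:
    """Check that a priority lies in 0.0-1.0 and return it as text for <priority>."""
    priority = float(value)
    # The chained comparison is False for NaN, so NaN is rejected too.
    if not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {value!r}")
    return str(priority)


if __name__ == "__main__":
    print(normalize_changefreq("Daily"), normalize_priority(0.8))
```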

4. Submitting Sitemap to Search Engines

Creating a Sitemap for your site and submitting it to search engines like Google, Bing, Baidu and Yandex is an important task for webmasters. Depending on the search engine, Sitemaps can be submitted anonymously or through a Webmaster Tools account. A Sitemap helps search engine crawlers discover all the pages on a website and index them accordingly. You can also use it to give search engines additional information about your pages, such as the date of last update, change frequency and relative priority.

Following are the requirements for submitting Sitemap to search engines:

  • You need a Webmaster Tools account to submit your Sitemap to most search engines. Although some search engines, like Bing, have offered anonymous Sitemap submission, we recommend using Webmaster Tools so you can also track your site's search performance.
  • Your site must be verified in your Webmaster Tools account before you can submit a Sitemap.
  • You can also submit your blog's RSS feed as a Sitemap. If you own a blog, it is recommended to submit both the RSS feed and the XML Sitemap.
  • Search engines crawl your site periodically, using the information in the Sitemap to look for new and modified content.
  • Errors and warnings found in the Sitemap are shown in your Webmaster Tools account for further action.


Submitting a Sitemap to search engines guarantees neither the indexing of your pages nor high rankings in search results. It is merely a guide that helps search engine bots understand your site structure.

After you submit a Sitemap for the first time, it may take a day or more for search engines to crawl its content.

Ensure the URLs in your Sitemap contain only English letters, numerals and standard URL characters, and percent-encode anything else. Search engines may not accept Sitemaps with unencoded characters from other languages.


5. Formats of Sitemap Acceptable by Search Engines

Search engines accept the following Sitemap formats:

  • Text file containing a single URL per line
  • XML Sitemap
  • Sitemap index – a file that lists multiple Sitemaps (XML Sitemaps or text files)

The XML Sitemap is the most commonly used format and is generally accessible at the URL “yoursite.com/sitemap.xml”.

5.1. Text File Format

A plain text file with the .txt extension is the simplest way to create a Sitemap, especially for sites with few pages. You can create a text Sitemap yourself in a simple text editor such as Notepad.

Below are the general guidelines for creating your text Sitemap:

  • Enter one URL per line.
  • URLs cannot contain line breaks or any other information.
  • You must write the full URL, including the protocol (http:// or https://).

Sitemap Text File

In addition to these general guidelines, each search engine may enforce limits such as the following:

  • A single file should contain no more than 50,000 URLs. If your site has more than 50,000 URLs, split the list into multiple text files.
  • The file size should be less than 10MB (10,485,760 bytes). The current sitemaps.org protocol allows up to 50MB (52,428,800 bytes) uncompressed, so 10MB is the more conservative limit.
  • The text file must use UTF-8 encoding, so make sure to save it in UTF-8 format.
Saving file in UTF-8 format
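
The text-file rules can be applied automatically when you generate the file. The Python sketch below (hypothetical function name) splits a URL list into text-Sitemap contents, starting a new file whenever the next URL would exceed 50,000 URLs or the byte limit. The byte limit defaults to 10,485,760 bytes (10MB); the current sitemaps.org protocol allows up to 52,428,800 bytes (50MB), so pass a larger `max_bytes` if the search engines you target accept it.

```python
from typing import Iterable

MAX_URLS_PER_FILE = 50_000
MAX_BYTES_PER_FILE = 10_485_760  # 10MB; sitemaps.org currently allows 52,428,800


def split_text_sitemaps(urls: Iterable[str],
                        max_urls: int = MAX_URLS_PER_FILE,
                        max_bytes: int = MAX_BYTES_PER_FILE) -> list[str]:
    """Split URLs into text-Sitemap file contents, one URL per line."""
    files: list[str] = []
    current: list[str] = []
    size = 0  # UTF-8 bytes in the current file, including newlines
    for url in urls:
        if "\n" in url or "\r" in url:
            raise ValueError(f"URL contains a line break: {url!r}")
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL must include http:// or https://: {url!r}")
        line_size = len(url.encode("utf-8")) + 1  # +1 for the trailing "\n"
        # Start a new file if adding this URL would break either limit.
        if current and (len(current) >= max_urls or size + line_size > max_bytes):
            files.append("\n".join(current) + "\n")
            current, size = [], 0
        current.append(url)
        size += line_size
    if current:
        files.append("\n".join(current) + "\n")
    return files


if __name__ == "__main__":
    pages = [f"https://www.example.com/page{i}.html" for i in range(3)]
    for number, content in enumerate(split_text_sitemaps(pages), start=1):
        # Hypothetical file names; UTF-8 encoding as the guidelines require.
        with open(f"sitemap{number}.txt", "w", encoding="utf-8", newline="\n") as f:
            f.write(content)
```

Sizes are measured in UTF-8 bytes rather than characters, because the limit applies to the saved file.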

5.2. XML Format

XML Sitemaps are the most widely accepted Sitemap format because they are easy to use and let you give search engine crawlers additional information about each URL. The file format uses the simple tags explained in section 3.2 (XML Tags Explained).


5.3. Sitemap Index Format

If a single Sitemap would be too large, split your URLs across several Sitemaps, list those Sitemaps in a Sitemap index file, and submit the index file to search engines. The format of a Sitemap index file is as follows:

Tag | Description | Required / Optional
--- | --- | ---
<?xml version="1.0" encoding="UTF-8"?> | XML declaration; the file is encoded in UTF-8. | Optional
<sitemapindex> | Opening tag for the Sitemap index. | Required
<sitemap> | Opening tag for each Sitemap entry. | Required
<loc>http://www.example.com/sitemap.xml</loc> | Location of a Sitemap file, Atom feed, RSS feed or text file. | Required
<lastmod>2014-04-05T11:29:40-07:00</lastmod> | Time the referenced Sitemap file was last modified. | Optional
</sitemap> | Closing tag for each Sitemap entry. | Required
</sitemapindex> | Closing tag for the Sitemap index. | Required

If you have many Sitemaps, add a separate <sitemap> and </sitemap> block for each of them.
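
For a large site, the Sitemaps and their index can be produced together. The Python sketch below (hypothetical function name, file names and URLs) splits a URL list into XML Sitemaps of at most 50,000 URLs each, the per-file limit in the Sitemap protocol, and builds the matching index with one <sitemap> entry per file. Each entry gets a <loc> and a <lastmod>, the two child tags the protocol defines for <sitemap>; using a single `lastmod` value for every file is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def _to_xml(root: ET.Element) -> str:
    """Serialize an element tree with an XML declaration (Python 3.9+)."""
    ET.indent(root)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def build_sitemaps_with_index(urls: list[str], base_url: str, lastmod: str,
                              max_urls: int = MAX_URLS) -> dict[str, str]:
    """Return {file name: XML text} for each Sitemap plus sitemap_index.xml.

    base_url is where the files will be uploaded, e.g. "https://www.example.com/".
    """
    files: dict[str, str] = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for start in range(0, len(urls), max_urls):
        name = f"sitemap{start // max_urls + 1}.xml"
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        files[name] = _to_xml(urlset)
        # One <sitemap> entry in the index per generated Sitemap file.
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = base_url.rstrip("/") + "/" + name
        ET.SubElement(entry, "lastmod").text = lastmod
    files["sitemap_index.xml"] = _to_xml(index)
    return files


if __name__ == "__main__":
    pages = [f"https://www.example.com/page{i}.html" for i in range(120_000)]
    result = build_sitemaps_with_index(pages, "https://www.example.com/", "2014-04-20")
    print(sorted(result))
```

Only the index file then needs to be submitted to search engines; they discover the individual Sitemaps through its <loc> entries.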

An RSS, mRSS or Atom 1.0 feed of a blog can also be submitted instead of a Sitemap. However, feeds usually contain only the latest URLs, so changes to older URLs that are no longer in the feed may not be crawled promptly by search engines.
