A sitemap is a list of pages on a domain.

This stems from the days when having one page that linked to all other pages was the easiest way to make sure all of your site's public-facing pages got indexed by crawlers.

It still has some value in this regard, though only for making sure pages get indexed at all: ever since PageRank became a thing, it has had no effect on how pages rank in search results.

The specification formalizes the idea of "yeah, just make an HTML page with lots of links". It also lets you give crawlers more specific information on how to crawl, which can have some minor positive side effects, e.g. on your server's resource use.


Sitemaps allow specification of

  • what parts are available for harvesting
  • when a page was last updated
  • how often each item will change
  • the relative priority of each page within the site (see note below)
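
As a rough illustration of how those fields map onto the XML format of the sitemaps.org protocol (`<urlset>`, `<url>`, `<loc>`, `<lastmod>`, `<changefreq>`, `<priority>`), here is a minimal sketch using only Python's standard library. The `build_sitemap` helper and the example.com URLs are hypothetical; it assumes you already have a list of your pages along with their metadata.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# The values the protocol allows for <changefreq>.
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Return sitemap XML for (loc, lastmod, changefreq, priority) tuples.

    Only loc is required; pass None to omit any of the other fields.
    build_sitemap is a hypothetical helper, not part of any library.
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < for us
        if lastmod is not None:
            # W3C Datetime format; a plain YYYY-MM-DD date is allowed.
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
        if changefreq is not None:
            if changefreq not in CHANGEFREQS:
                raise ValueError(f"invalid changefreq: {changefreq!r}")
            ET.SubElement(url, "changefreq").text = changefreq
        if priority is not None:
            # Priority is relative to your own other pages, from 0.0 to 1.0.
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {priority!r}")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages: a frequently updated news page and a near-static one.
    print(build_sitemap([
        ("https://example.com/news/", date(2024, 5, 1), "daily", 0.8),
        ("https://example.com/about", date(2020, 1, 15), "yearly", 0.3),
    ]))
```

Only `<loc>` is required per entry; the other three are optional hints, which is why the sketch simply leaves them out when given `None`.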

Sitemaps are useful when

  • things are not well linked yet, from the site itself and/or from elsewhere
  • you want to hint to search engines that, say, your news page and some dynamic content update quite often, while other content is almost static
  • you are using JavaScript drop-down menus or AJAX-loaded content in a way that means crawlers won't find your links/content

They have little added value when all your content is well-harvested already.

Sitemaps could be said to complement robots.txt, in that robots.txt can only ask crawlers not to harvest something, while a sitemap points them at what to harvest.
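
To make the two complementary roles concrete, here is a small sketch using Python's standard `urllib.robotparser` module, which reads both from one file: the `Disallow` rules, and the `Sitemap:` line that the sitemaps.org protocol allows in robots.txt. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: the Disallow rule says what to skip, and the
# Sitemap line (independent of any User-agent group) says where the list
# of pages worth crawling lives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robots.txt side: exclusion only.
print(parser.can_fetch("ExampleBot", "https://example.com/private/report"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/news/"))           # True

# Sitemap side: site_maps() returns the listed sitemap URLs (Python 3.8+).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```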

