What is a sitemap and how to create one

Last modified: 06.08.2021
Estimated reading time: 6 min

A sitemap is an XML file that provides information about all the important pages, files, images, or videos on your website. It provides search engines with an overview of all available content that should be discovered, crawled, and indexed. 

It helps crawlers understand what’s on your website. It also helps them find pages that are not linked internally on the site.

It’s good practice to add your sitemap location to the robots.txt file. For example:

Sitemap: https://www.marketingminer.com/sitemap.xml
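
To check that a robots.txt file really advertises a sitemap, you can parse it with Python’s standard library. This is a minimal sketch, assuming Python 3.8 or later (where RobotFileParser.site_maps() is available). The robots.txt content below is a made-up example supplied inline, not the real file of any site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline so nothing is fetched.
robots_txt = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.marketingminer.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if no Sitemap line exists.
sitemaps = parser.site_maps() or []
print(sitemaps)
```

An empty list here would mean that the file has no Sitemap line at all.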

What does an XML sitemap look like?

Here’s what a typical XML sitemap looks like: 

[Image: a typical XML sitemap]

Many CMSs create and manage sitemaps automatically, and the results can look a bit different. However, their purpose is always the same.

YoastSEO XML sitemap

In the example above, you can see an XML sitemap generated automatically by YoastSEO, a WordPress plugin. Remember, it doesn’t matter what the sitemap looks like; what matters is that it works.

Sitemap index

All sitemaps are limited to a maximum of 50,000 URLs. If you exceed the limit, you will have to split your list into multiple sitemaps. When you do, you can optionally create a sitemap index: an XML file (like a sitemap) that contains links to a number of sitemap files.

Let’s take a look at a sitemap index example, and then analyze the parts of a sitemap in more detail:

<?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
      <loc>https://www.marketingminer.com/sitemap1.xml</loc>
    </sitemap>
    <sitemap>
      <loc>https://www.marketingminer.com/sitemap2.xml.gz</loc>
    </sitemap>
  </sitemapindex>

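The splitting described above can also be scripted. The following is a minimal sketch using only Python’s standard library, assuming the list of page URLs is already available in memory. The function names, the page URLs, and the sitemapN.xml naming scheme are illustrative assumptions, not part of any tool mentioned in this article.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # limit stated in the text above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing the given sitemap files."""
    # Register the sitemap namespace as the default so tags are written without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(root, encoding="unicode")


# Hypothetical input: 120,000 page URLs, which need three sitemap files.
page_urls = [f"https://www.marketingminer.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(page_urls)
index_xml = build_sitemap_index(
    f"https://www.marketingminer.com/sitemap{i}.xml" for i in range(1, len(chunks) + 1)
)
print(len(chunks))
```

Each chunk would then be written to its own sitemap file, and the index is the only file you need to submit.
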
URL set

Every sitemap needs a <urlset> root tag that wraps all <url> entries. Its xmlns attribute declares which version of the XML sitemap protocol is used. You will often see version 0.9, which is supported by most search engines.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

URL

Now we get to the most important part: the <url> tag. Every URL definition can contain the following tags:

  • <loc> – contains an absolute URL. It should reference the canonical URL of the page you want indexed. This is the only required tag inside <url>.
  • <lastmod> – the time at which the content at that URL was last updated, in W3C Datetime format: either a date (YYYY-MM-DD) or a full timestamp with a time zone, as in the example below.
  • <priority> – the priority of the URL relative to the other URLs in the sitemap, on a scale from 0.0 to 1.0. A higher number means more important.
  • <changefreq> – how frequently the content on the page is likely to change. This is a hint that tells crawlers how often they might recrawl the page, not a command. Valid values: always, hourly, daily, weekly, monthly, yearly, never.

Example:

<url>
     <loc>https://www.marketingminer.com/en</loc>
     <lastmod>2020-10-08T13:32:20+00:00</lastmod>
     <priority>1.00</priority>
     <changefreq>monthly</changefreq>
</url>

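If you generate sitemaps yourself, entries like the one above can be produced programmatically. Below is a minimal sketch using Python’s standard library. The build_urlset helper and the tuple layout of its input are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Build a <urlset> document from (loc, last_modified, priority, changefreq) tuples."""
    # Register the sitemap namespace as the default so tags are written without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, last_modified, priority, changefreq in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # isoformat() on a timezone-aware datetime yields a full timestamp with offset.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = last_modified.isoformat()
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.2f}"
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


pages = [
    ("https://www.marketingminer.com/en",
     datetime(2020, 10, 8, 13, 32, 20, tzinfo=timezone.utc), 1.0, "monthly"),
]
sitemap_xml = build_urlset(pages)
print(sitemap_xml)
```

Building the document with an XML library rather than string concatenation means special characters in URLs, such as &, are escaped for you.
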
Other sitemaps

Sitemaps are not limited to page URLs. You can also create dedicated sitemaps for your media content as well as news sitemaps.

  • Video sitemap – contains video information.
  • Image sitemap – provides information about images on your website.
  • Google News sitemap – especially useful for news sites, where it’s important for Google to discover news articles as quickly as possible. To achieve this, your website needs to be accepted into Google News first.

Video sitemap

A video sitemap is a great way to tell crawlers about videos hosted on your own server and to help them understand what the content is about. We recommend only adding new video content while it is still fresh.

Here’s what a video sitemap with all required parameters looks like:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://www.marketingminer.com/video/sitemap.html</loc>
    <video:video>
      <video:thumbnail_loc>https://www.marketingminer.com/thumbs/sitemap.jpg</video:thumbnail_loc>
      <video:title>XML sitemap file example</video:title>
      <video:description>What sitemap.xml is and how to create it step by step</video:description>
      <video:content_loc>https://youtube.com/sitemap_video.mp4</video:content_loc>
      <video:player_loc>https://www.example.com/videoplayer.php?sitemap_video=123</video:player_loc>
    </video:video>
  </url>
</urlset>

TIP: Find out more information about video sitemaps (with optional tags included) in Google’s documentation: https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps.

Image sitemap

Image sitemaps help search engine crawlers find your images. They help Google discover images that it might not otherwise find, such as images that your site loads with JavaScript code.

Here’s an example of an image sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.marketingminer.com/image_sitemap.html</loc>
    <image:image>
      <image:loc>https://www.marketingminer.com/sitemap.jpg</image:loc>
    </image:image>
  </url>
</urlset>

Together with the alt attribute, image sitemaps provide crawlers with additional information about the images on the website.

TIP: Find out more about image sitemaps and best practices here: https://developers.google.com/search/docs/advanced/sitemaps/image-sitemaps
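
Because image entries live in their own XML namespace, reading them back requires a namespace-aware lookup. Here is a minimal sketch with Python’s standard library that lists the images declared for each page. The images_by_page helper and the sm prefix are names chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# Prefixes used in the lookups below; "sm" is an arbitrary local alias.
NAMESPACES = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

image_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.marketingminer.com/image_sitemap.html</loc>
    <image:image>
      <image:loc>https://www.marketingminer.com/sitemap.jpg</image:loc>
    </image:image>
  </url>
</urlset>
"""


def images_by_page(xml_text):
    """Map each page URL to the list of image URLs declared for it."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    result = {}
    for url in root.findall("sm:url", NAMESPACES):
        page = url.findtext("sm:loc", namespaces=NAMESPACES)
        result[page] = [
            el.text for el in url.findall("image:image/image:loc", NAMESPACES)
        ]
    return result


print(images_by_page(image_sitemap))
```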

Google News sitemap

If you want to increase the chances of your content showing in Google News, consider creating a news sitemap, which exists for exactly this purpose.

Here’s an example of a Google News sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.marketingminer.com/en/blog/wordpress-plugin-v-1-0-launched.html</loc>
    <news:news>
      <news:publication>
        <!-- Name of the publication, not the article title -->
        <news:name>Marketing Miner</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2021-04-20</news:publication_date>
      <news:title>Marketing Miner for WordPress has launched</news:title>
    </news:news>
  </url>
</urlset>

Google News sitemaps are a bit different: they should only contain articles published within the last two days. Older URLs need to be dropped from the news sitemap, whether by your CMS or by hand, so that it only lists fresh content.

TIP: Learn more about Google News sitemaps here: https://developers.google.com/search/docs/advanced/sitemaps/news-sitemap.
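
The two-day window can be enforced at the moment the news sitemap is generated. This is a minimal sketch, assuming articles are held as (URL, publication time) pairs with timezone-aware datetimes. The fresh_articles helper and the second article URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)  # freshness window described in the text above


def fresh_articles(articles, now):
    """Keep only (url, published_at) pairs published within MAX_AGE before `now`."""
    return [
        (url, published)
        for url, published in articles
        if timedelta(0) <= now - published <= MAX_AGE
    ]


# A fixed "now" keeps the example reproducible; in practice use datetime.now(timezone.utc).
now = datetime(2021, 4, 21, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://www.marketingminer.com/en/blog/wordpress-plugin-v-1-0-launched.html",
     datetime(2021, 4, 20, 9, 0, tzinfo=timezone.utc)),
    ("https://www.marketingminer.com/en/blog/older-article.html",  # hypothetical URL
     datetime(2021, 4, 10, 9, 0, tzinfo=timezone.utc)),
]
recent = fresh_articles(articles, now)
print([url for url, _ in recent])
```

Only the entries in recent would then be written to the news sitemap; older articles can stay in the regular sitemap.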

Do you need a sitemap.xml file? 

Google finds new pages by crawling links it discovers on pages. But if your website is new and has little history, few backlinks, and a weak internal linking structure, it’s hard for search engine crawlers to discover your content.

In this case, you need to implement a sitemap on your website to list all the important pages that you want search engine crawlers to find and index so users can see them in the search results. 

Here’s when you should consider having a sitemap.xml file on your website: 

  • New websites – As we mentioned above, if you have a new website, make sure to create an XML sitemap and submit it to Google Search Console. This makes it easier for Google to discover and index your content.
  • Large websites – If you have a website with lots of landing pages and poor internal linking structure, we recommend adding new landing pages to your sitemap dynamically to make sure the search engine crawler won’t miss them. This is mainly the case for large eCommerce sites.
  • News websites – Websites that regularly produce lots of timely and topical articles (typically news sites) should have sitemaps too. They improve how quickly Google crawlers find your newsworthy content to index and show it in Google News. For news sites, use the Google News sitemap described above.
  • Rich media content – As we discussed before, if your media content is difficult to access (for example, images that your site loads with JavaScript code), we recommend using the dedicated sitemap types (video, image, and so on).

Learn about sitemaps

  • Submitting a sitemap doesn’t guarantee that the URLs listed in it will be indexed and shown in the search results.
  • Each sitemap can include a maximum of 50,000 URLs and must be no larger than 50 MB uncompressed. For large websites, we recommend creating a sitemap index file that contains links to all your sitemaps (the image below shows what this looks like for WordPress).
  • You should only include indexable, canonical URLs in your sitemap. Make sure to use full absolute URLs, not relative URLs.
  • Google recommends putting a sitemap file in the root directory of your website and naming it sitemap.xml.
  • Your sitemap and sitemap index files must be UTF-8 encoded.
  • It’s good practice to add your sitemap.xml to the robots.txt file. 
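
Several of the rules above can be checked automatically before you upload a file. The following is a minimal sketch of such a check using Python’s standard library. The check_sitemap function is an illustrative helper, and the 50 MB limit is interpreted here as 50 × 1024 × 1024 bytes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, taken here as 52,428,800 bytes


def check_sitemap(xml_bytes):
    """Return a list of human-readable problems found in a sitemap document."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB")
    try:
        xml_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Without valid UTF-8 there is no point in parsing further.
        problems.append("file is not valid UTF-8")
        return problems
    root = ET.fromstring(xml_bytes)
    locs = [el.text.strip() for el in root.iter(LOC_TAG) if el.text]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceed the limit of {MAX_URLS}")
    for loc in locs:
        parts = urlparse(loc)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {loc}")
    return problems


sample = (
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.marketingminer.com/en</loc></url>"
    b"<url><loc>/relative/page</loc></url>"
    b"</urlset>"
)
print(check_sitemap(sample))
```

The check does not tell you whether a URL is canonical or indexable; that still has to be verified against the site itself.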

How to create sitemaps 

If your CMS cannot generate sitemaps dynamically, you will have to build one yourself. Let’s take a look at both ways in more detail below.

Manually creating a sitemap 

If you don’t use a CMS with an automatically generated sitemap, you can create it manually (we recommend checking yourdomainname.com/sitemap.xml first to make sure your sitemap doesn’t already exist). 

In this case, the Screaming Frog tool is a good way to generate a sitemap for free, as long as your website has no more than 500 URLs. For larger sites, you will have to consider paid options.

We also recommend using the XML Sitemap Generator tool that crawls all your important web pages and automatically creates sitemaps: https://www.xml-sitemaps.com/

TIP: Here’s a list of recommended web sitemap generators by Google: https://code.google.com/archive/p/sitemap-generators/wikis/SitemapGenerators.wiki.  

When your sitemap.xml file is generated, make sure to download it and upload it to the root of the website. 

TIP: A crawler will not necessarily find every important web page. For this reason, we recommend reviewing your generated sitemap.xml file first to ensure that all the important pages you want indexed are included.
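
This review can be partly automated by comparing the generated file against a list of pages you consider essential. Below is a minimal sketch with Python’s standard library. The missing_pages helper, the sample sitemap, and the pricing URL are hypothetical.

```python
import xml.etree.ElementTree as ET

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def missing_pages(sitemap_xml, important_urls):
    """Return the important URLs that do not appear in the sitemap, sorted."""
    root = ET.fromstring(sitemap_xml)
    listed = {el.text.strip() for el in root.iter(LOC_TAG) if el.text}
    return sorted(set(important_urls) - listed)


# Hypothetical generated sitemap containing a single page.
generated = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.marketingminer.com/en</loc></url>"
    "</urlset>"
)
important = [
    "https://www.marketingminer.com/en",
    "https://www.marketingminer.com/en/pricing",  # hypothetical page
]
missing = missing_pages(generated, important)
print(missing)
```

Any URL reported here should be added to the sitemap by hand or made reachable through internal links so the generator can find it.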

WordPress

Most CMSs, such as WordPress, Prestashop, Joomla, Wix, or Shopify, already have plugins that generate sitemap.xml files dynamically.

In this guide, we look at the most widely used CMS platform, WordPress, and how you can create sitemap.xml files using the Yoast SEO plugin.

First, go to this page to download the Yoast SEO plugin: https://wordpress.org/plugins/wordpress-seo/

After you download and install the plugin, go back to WordPress and in the left menu navigate to SEO > General > Features and in XML sitemaps, select ON:

WordPress YoastSEO sitemap XML ON

Now, the Yoast SEO plugin can automatically generate a sitemap.xml file that will be available at yourdomainname.com/sitemap.xml or yourdomainname.com/sitemap_index.xml for the sitemap index. 

How to add a sitemap to Google Search Console

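When your sitemap.xml file is ready and uploaded to your website, you can submit it to Google so that it gets crawled as soon as possible. There are several ways to let Google know about your sitemaps.

The quickest way to notify Google is to ping it with the full location of the sitemap: http://www.google.com/ping?sitemap=https://yourdomainname.com/sitemap.xml. Note that Google announced in 2023 that it is retiring this ping endpoint, so if the request fails, submit the sitemap through Google Search Console or list it in robots.txt instead.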

After you submit your sitemap, you should see this message as confirmation that it has been received. 

Google sitemap ping

As you can see above, Google recommends adding your sitemap to Google Search Console so you can monitor its submission and crawl status and spot other issues.

Sign in to your Google Search Console account and navigate to Sitemaps. In this section, you can add your sitemap’s URL (usually sitemap.xml or sitemap_index.xml).

Add a new sitemap to Google Search Console

And that’s it! After you submit your sitemap files, you should see the Status column. This is where you can find out if the sitemap was loaded and processed successfully or with errors.  

Sitemap status processing in Google Search Console

After clicking on the See Index Coverage button or navigating to the Coverage section in the left menu, you can view detailed information about specific URLs that were crawled. 

It looks something like this: 

Index coverage in GSC

If you have any trouble with crawling and indexing your site, we recommend looking at the Sitemap errors section on this page to find out more details: https://support.google.com/webmasters/answer/7451001#errors&zippy=%2Ccomplete-error-list


Conclusion

If you use a CMS, like WordPress, it is relatively quick and easy to generate your sitemap by using their plugins. 

If not, you can create your sitemap manually or use third-party tools to create it. These tools will crawl all your URLs and create a new sitemap, which you should review before publishing to make sure it doesn’t include pages you don’t want to show up in the search results.

Don’t forget to submit your sitemap to Google to ensure that Googlebot will find and crawl it as soon as possible. 
