{"id":5142,"date":"2021-05-24T11:09:04","date_gmt":"2021-05-24T09:09:04","guid":{"rendered":"https:\/\/help.marketingminer.com\/?post_type=kb&#038;p=5142"},"modified":"2021-08-06T15:18:01","modified_gmt":"2021-08-06T13:18:01","slug":"robots-txt-what-is-it-and-how-does-it-work","status":"publish","type":"kb","link":"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/","title":{"rendered":"Robots.txt: what is it and how does it work"},"content":{"rendered":"\n<p><strong>Robots.txt<\/strong> is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. It\u2019s good practice to also include your sitemap location in the robots.txt file.&nbsp;<\/p>\n\n\n\n<p>For example, here\u2019s the Marketing Miner robots.txt file:\u00a0<a href=\"https:\/\/www.marketingminer.com\/robots.txt\">https:\/\/www.marketingminer.com\/robots.txt<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"513\" height=\"528\" src=\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/Ukazka-robots.txt-webu-marketing-miner-2.png\" alt=\"Marketing Miner robots.txt file example\" class=\"wp-image-5143\" srcset=\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/Ukazka-robots.txt-webu-marketing-miner-2.png 513w, https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/Ukazka-robots.txt-webu-marketing-miner-2-291x300.png 291w, https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/Ukazka-robots.txt-webu-marketing-miner-2-150x154.png 150w\" sizes=\"(max-width: 513px) 100vw, 513px\" \/><\/figure><\/div>\n\n\n\n<h2>What is robots.txt used for?&nbsp;<\/h2>\n\n\n\n<p>A robots.txt file lets you prevent crawlers from accessing specific pages or files. This keeps your resources under control and stops bots from overloading your server. 
It also helps you optimize your <a href=\"https:\/\/help.marketingminer.com\/en\/article\/what-is-a-crawl-budget\/\">crawl budget<\/a> by keeping crawlers focused on your important pages.<\/p>\n\n\n<div class=\"mkb-shortcode-container\">        <div class=\"mkb-tip\">\n            <div class=\"mkb-tip__icon\">\n                <i class=\"fa fa-lg fa-lightbulb-o\"><\/i>\n            <\/div>\n            <div class=\"mkb-tip__content\">\n                <strong>TIP<\/strong>: Crawl budget optimization matters most on big sites, for example those with 100k+ landing pages.\u00a0            <\/div>\n        <\/div>\n        <\/div>\n\n\n<p>Many people use robots.txt to prevent search engine bots from indexing pages (e.g. admin pages or pages with sensitive information). However, robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it. For pages you don\u2019t want indexed, Google recommends using a <strong>noindex<\/strong> directive instead.&nbsp;<\/p>\n\n\n\n<p>Keep in mind that noindex doesn\u2019t help you save crawl budget, because the crawler still has to visit the page to discover the noindex directive. For the same reason, don\u2019t block a noindexed page in robots.txt, or the crawler will never see the directive.&nbsp;<\/p>\n\n\n\n<h2>How does robots.txt work?&nbsp;<\/h2>\n\n\n\n<p>The primary goal of search engine crawlers is to crawl web pages and discover content, so that search engines can index it and serve it to users.&nbsp;&nbsp;<\/p>\n\n\n\n<p>When a crawler visits your site, it first looks for your robots.txt file located in the <strong>root of the website <\/strong>(for example: <a href=\"https:\/\/www.marketingminer.com\/robots.txt\">https:\/\/www.marketingminer.com\/robots.txt<\/a>). 
If there is no robots.txt file at the root of your website, crawlers assume there are no restrictions and <strong>may crawl all your web pages<\/strong>.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Keep in mind that robots.txt directives are only instructions, and not every crawler follows them. 
Some bots (e.g. email address scrapers) ignore them completely.<\/p>\n\n\n\n<h2>Example robots.txt:<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: user-agent-name\nDisallow: pages you don't want to be crawled\n<\/code><\/pre>\n\n\n\n<p>In the example below, <strong>Googlebot<\/strong> (the user-agent name) is blocked from crawling all URLs whose path starts with <strong>\/blog\/<\/strong>.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: Googlebot\nDisallow: \/blog\/\n<\/code><\/pre>\n\n\n\n<p>Below is the most common format of a robots.txt file: all user agents (represented by an <strong>asterisk<\/strong> *) can crawl all pages, because nothing is blocked in the Disallow line. Most CMS platforms already have a robots.txt file in place.&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: *\nDisallow:\n<\/code><\/pre>\n\n\n\n<p>Be careful with the rule below: it blocks all bots from crawling your entire site (even your homepage!).&nbsp;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>User-agent: *\nDisallow: \/\n<\/code><\/pre>\n\n\n\n<p>Web developers often forget to remove this slash from the Disallow line after putting the site live, and website owners are then left wondering why their site isn\u2019t being indexed.&nbsp;<\/p>\n\n\n\n<h2>Robots.txt syntax<\/h2>\n\n\n\n<p>In its simplest form, a robots.txt file consists of the following directives:<\/p>\n\n\n\n<ul><li><strong>User-agent:<\/strong> The first line of every block of rules is the user-agent, which names the web crawler the rules apply to. See the <a href=\"http:\/\/www.robotstxt.org\/db.html\">robots database<\/a> for the most common user agents.&nbsp;<\/li><li><strong>Disallow: <\/strong>The second line specifies which parts of the website the designated user-agent can\u2019t access. 
Only use <strong>relative paths<\/strong> (starting with \/), not absolute URLs.&nbsp;<\/li><li><strong>Allow: <\/strong>This directive is supported by <strong>Googlebot<\/strong> and other major crawlers such as Bingbot, but not by every crawler. It lets crawlers access specific pages even when their parent path is blocked by a Disallow directive. When Allow and Disallow rules conflict, Google follows the most specific (longest) matching rule.&nbsp;<\/li><li><strong>Crawl-delay:<\/strong> This directive asks crawlers to slow down so they don\u2019t overload your website\u2019s server. A crawl delay is specified in seconds. Some crawlers, such as Bingbot, honor it, but Googlebot does not: Google <a href=\"https:\/\/developers.google.com\/search\/blog\/2019\/07\/a-note-on-unsupported-rules-in-robotstxt\">confirmed in 2019<\/a> that crawl-delay is an unsupported rule.<\/li><li><strong>Sitemap: <\/strong>As mentioned above, it\u2019s good practice to include your sitemap location in the robots.txt file so crawlers can discover it as soon as possible. Unlike Disallow paths, the sitemap location must be a full, absolute URL.&nbsp;<\/li><\/ul>\n\n\n<div class=\"mkb-shortcode-container\">        <div class=\"mkb-tip\">\n            <div class=\"mkb-tip__icon\">\n                <i class=\"fa fa-lg fa-lightbulb-o\"><\/i>\n            <\/div>\n            <div class=\"mkb-tip__content\">\n                <strong>TIP:<\/strong> Crawlers can interpret robots.txt syntax differently, so verify that each search engine crawler you care about understands all your directives.\u00a0            <\/div>\n        <\/div>\n        <\/div>\n\n\n<h2>How to validate your robots.txt file?<\/h2>\n\n\n\n<p>Once your robots.txt file is ready, validate it to make sure search engines crawl the right pages. 
We recommend using the testing tool created by Google.&nbsp;<\/p>\n\n\n\n<h3>The robots.txt Tester tool in Google Search Console<\/h3>\n\n\n\n<p>To test your robots.txt file with the robots.txt Tester, you first need to add your website and verify the property in <a href=\"https:\/\/search.google.com\/search-console\/welcome\">Google Search 
Console<\/a>. You can find the robots.txt Tester tool here: <a href=\"https:\/\/www.google.com\/webmasters\/tools\/robots-testing-tool\">https:\/\/www.google.com\/webmasters\/tools\/robots-testing-tool<\/a>.<\/p>\n\n\n\n<p>After you select the <strong>domain<\/strong> you want to check, you can view its robots.txt file and review any errors or warnings. The tool shows the latest version of your robots.txt file that Googlebot has read. Below the file, you can also test whether specific URLs are blocked. It looks something like this:&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"407\" src=\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-testing-tool-in-google-search-console-1024x407.png\" alt=\"Robots.txt testing tool in Google Search Console\" class=\"wp-image-5146\" srcset=\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-testing-tool-in-google-search-console-1024x407.png 1024w, https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-testing-tool-in-google-search-console-300x119.png 300w, https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-testing-tool-in-google-search-console-768x305.png 768w, https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-testing-tool-in-google-search-console-150x60.png 150w, https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-testing-tool-in-google-search-console.png 1525w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3>URL Indexability miner&nbsp;<\/h3>\n\n\n\n<p>In Marketing Miner, you can also check <strong>in bulk<\/strong> which pages are blocked by robots.txt. 
We created a <a href=\"https:\/\/help.marketingminer.com\/en\/article\/url-indexability\/\">URL indexability miner<\/a> to validate whether your pages have been indexed by search engines or not.&nbsp;<\/p>\n\n\n\n<h2>Conclusion<\/h2>\n\n\n\n<p>Find out more information about robots.txt files here: <a href=\"https:\/\/developers.google.com\/search\/docs\/advanced\/robots\/intro\">https:\/\/developers.google.com\/search\/docs\/advanced\/robots\/intro<\/a>.&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Robots.txt is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. It&rsquo;s good practice to also include your sitemap location in the robots.txt file.&nbsp; Here&rsquo;s the example of the MM robots.txt file:&nbsp;https:\/\/www.marketingminer.com\/robots.txt. What is robots.txt used for?&nbsp; By using robots.txt file, &#8230; <a title=\"Robots.txt: what is it and how does it work\" class=\"read-more\" href=\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/\" aria-label=\"More on Robots.txt: what is it and how does it work\">Read more<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","template":"","meta":[],"kbtopic":[36],"kbtag":[87,84],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Robots.txt: what is it and how does it work - Marketing Miner Knowledge Base<\/title>\n<meta name=\"description\" content=\"Robots.txt is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. 
It\u2019s good\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Robots.txt: what is it and how does it work - Marketing Miner Knowledge Base\" \/>\n<meta property=\"og:description\" content=\"Robots.txt is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. It\u2019s good\" \/>\n<meta property=\"og:url\" content=\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/\" \/>\n<meta property=\"og:site_name\" content=\"Marketing Miner Knowledge Base\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/marketingminer\/\" \/>\n<meta property=\"article:modified_time\" content=\"2021-08-06T13:18:01+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-txt-what-is-it-and-how-does-it-work.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1354\" \/>\n\t<meta property=\"og:image:height\" content=\"761\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-txt-what-is-it-and-how-does-it-work.png\" \/>\n<meta name=\"twitter:site\" content=\"@marketingminer\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/\",\"url\":\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/\",\"name\":\"Robots.txt: what is it and how does it work - Marketing Miner Knowledge Base\",\"isPartOf\":{\"@id\":\"https:\/\/help.marketingminer.com\/en\/#website\"},\"datePublished\":\"2021-05-24T09:09:04+00:00\",\"dateModified\":\"2021-08-06T13:18:01+00:00\",\"description\":\"Robots.txt is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. It\u2019s good\",\"breadcrumb\":{\"@id\":\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/help.marketingminer.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Robots.txt: what is it and how does it work\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/help.marketingminer.com\/en\/#website\",\"url\":\"https:\/\/help.marketingminer.com\/en\/\",\"name\":\"Marketing Miner Knowledge 
Base\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/help.marketingminer.com\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/help.marketingminer.com\/en\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/help.marketingminer.com\/en\/#organization\",\"name\":\"Marketing Miner\",\"url\":\"https:\/\/help.marketingminer.com\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/help.marketingminer.com\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2020\/06\/cropped-Logo-01@3x.png\",\"contentUrl\":\"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2020\/06\/cropped-Logo-01@3x.png\",\"width\":4098,\"height\":819,\"caption\":\"Marketing Miner\"},\"image\":{\"@id\":\"https:\/\/help.marketingminer.com\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/marketingminer\/\",\"https:\/\/twitter.com\/marketingminer\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Robots.txt: what is it and how does it work - Marketing Miner Knowledge Base","description":"Robots.txt is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. 
It\u2019s good","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/","og_locale":"en_US","og_type":"article","og_title":"Robots.txt: what is it and how does it work - Marketing Miner Knowledge Base","og_description":"Robots.txt is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. It\u2019s good","og_url":"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/","og_site_name":"Marketing Miner Knowledge Base","article_publisher":"https:\/\/www.facebook.com\/marketingminer\/","article_modified_time":"2021-08-06T13:18:01+00:00","og_image":[{"width":1354,"height":761,"url":"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-txt-what-is-it-and-how-does-it-work.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_image":"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2021\/05\/robots-txt-what-is-it-and-how-does-it-work.png","twitter_site":"@marketingminer","twitter_misc":{"Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/","url":"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/","name":"Robots.txt: what is it and how does it work - Marketing Miner Knowledge Base","isPartOf":{"@id":"https:\/\/help.marketingminer.com\/en\/#website"},"datePublished":"2021-05-24T09:09:04+00:00","dateModified":"2021-08-06T13:18:01+00:00","description":"Robots.txt is a plain text file that tells web robots (most often search engine crawlers) which web pages or files can or cannot be crawled. 
It\u2019s good","breadcrumb":{"@id":"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/help.marketingminer.com\/en\/article\/robots-txt-what-is-it-and-how-does-it-work\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/help.marketingminer.com\/en\/"},{"@type":"ListItem","position":2,"name":"Robots.txt: what is it and how does it work"}]},{"@type":"WebSite","@id":"https:\/\/help.marketingminer.com\/en\/#website","url":"https:\/\/help.marketingminer.com\/en\/","name":"Marketing Miner Knowledge Base","description":"","publisher":{"@id":"https:\/\/help.marketingminer.com\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/help.marketingminer.com\/en\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/help.marketingminer.com\/en\/#organization","name":"Marketing Miner","url":"https:\/\/help.marketingminer.com\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/help.marketingminer.com\/en\/#\/schema\/logo\/image\/","url":"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2020\/06\/cropped-Logo-01@3x.png","contentUrl":"https:\/\/help.marketingminer.com\/wp-content\/uploads\/2020\/06\/cropped-Logo-01@3x.png","width":4098,"height":819,"caption":"Marketing 
Miner"},"image":{"@id":"https:\/\/help.marketingminer.com\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/marketingminer\/","https:\/\/twitter.com\/marketingminer"]}]}},"_links":{"self":[{"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/kb\/5142"}],"collection":[{"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/kb"}],"about":[{"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/types\/kb"}],"author":[{"embeddable":true,"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/comments?post=5142"}],"version-history":[{"count":2,"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/kb\/5142\/revisions"}],"predecessor-version":[{"id":5609,"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/kb\/5142\/revisions\/5609"}],"wp:attachment":[{"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/media?parent=5142"}],"wp:term":[{"taxonomy":"kbtopic","embeddable":true,"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/kbtopic?post=5142"},{"taxonomy":"kbtag","embeddable":true,"href":"https:\/\/help.marketingminer.com\/en\/wp-json\/wp\/v2\/kbtag?post=5142"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}