What Is a Robots.txt File?
Robots.txt is a plain text file containing instructions for search engine robots, telling them which pages they should and shouldn't crawl.
Why Is Robots.txt Important?
A robots.txt file helps manage web crawler activity so crawlers don't overwork your website. Here are a few reasons to use a robots.txt file:
1. Optimize Crawl Budget. Crawl budget refers to the number of pages Google will crawl on your site within a given time frame. The number can vary based on your site's size, health, and number of backlinks. Blocking unimportant pages lets Google spend that budget on the pages that matter.
2. Block Duplicate and Non-Public Pages. Not every page needs to appear in search; you can keep crawlers away from duplicate content and non-public pages such as staging or login areas.
3. Hide Resources. Sometimes you want to exclude resources such as PDFs, videos, and images from search results.
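For example, a block that keeps crawlers out of a staging area and excludes PDF files might look like this (the paths are illustrative, and the `*` and `$` wildcards are supported by major crawlers such as Googlebot and Bingbot):

```
User-agent: *
Disallow: /staging/
Disallow: /*.pdf$
```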
Robots.txt Syntax
1. The User-Agent Directive. The first line of every block of directives is the user-agent, which identifies the crawler the rules apply to.
User-agent: Googlebot
Disallow: /wp-admin/
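A quick way to check how a block like this is interpreted is Python's built-in urllib.robotparser module, which applies robots.txt rules the same way a well-behaved crawler would (the URLs below are placeholders):

```python
# Sketch: checking robots.txt rules with Python's standard library.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: Googlebot",
    "Disallow: /wp-admin/",
])

# Anything under /wp-admin/ is off-limits to Googlebot...
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))  # False
# ...while the rest of the site remains crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))  # True
```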
2. The Disallow Directive
You can have multiple disallow directives that specify which parts of your site the crawler can’t access.
For example, if you wanted to allow all search engines to crawl your entire site, your block would look like this:
User-agent: *
Allow: /
If you wanted to block all search engines from crawling your site, your block would look like this:
User-agent: *
Disallow: /
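A block with multiple disallow directives (the paths here are hypothetical) might look like this:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /search/
```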
3. The Allow Directive
The “Allow” directive allows search engines to crawl a subdirectory or specific page, even in an otherwise disallowed directory.
For example, if you want to prevent Googlebot from accessing every post on your blog except for one, your directive might look like this:
User-agent: Googlebot
Disallow: /blog
Allow: /blog/example-post
4. The Sitemap Directive
The Sitemap directive tells search engines—specifically Bing, Yandex, and Google—where to find your XML sitemap.
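The directive takes the full URL of the sitemap and can appear anywhere in the file. For example (the URL is a placeholder):

```
Sitemap: https://www.example.com/sitemap.xml
```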
5. The Crawl-Delay Directive
The crawl-delay directive instructs crawlers to wait a set number of seconds between requests, to avoid overtaxing a server (and slowing down your website). Google ignores this directive, but crawlers such as Bingbot respect it.
User-agent: *
Crawl-delay: 10
6. The Noindex Directive
The robots.txt file tells a bot what it can or can't crawl, but it can't reliably tell a search engine which URLs not to index and show in search results. A blocked page can still show up in search results, but because the bot can't read it, the listing appears without a description. Google never officially supported a noindex directive in robots.txt, and as of September 1, 2019, Google confirmed that it is not supported.
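The supported way to keep a page out of the index is a robots meta tag in the page's HTML (or an X-Robots-Tag HTTP header), and the page must remain crawlable so the bot can see the tag:

```
<meta name="robots" content="noindex">
```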