What Is a Robots.txt File?

A robots.txt file is a plain text file containing instructions for search engine robots, telling them which pages on your site they should and shouldn't crawl.

Why Is Robots.txt Important?

A robots.txt file helps manage web crawler activity so crawlers don't overwork your website.

Here are a few reasons to use a robots.txt file:

1. Optimize Crawl Budget

Crawl budget refers to the number of pages Google will crawl on your site within a given time frame. The number can vary based on your site's size, health, and number of backlinks. Blocking unimportant pages with robots.txt lets Googlebot spend more of that budget on the pages that matter.

2. Block Duplicate and Non-Public Pages

Not every page on your site needs to rank. Robots.txt can keep crawlers away from pages that aren't meant for search, such as duplicate content, staging areas, or internal login pages.

3. Hide Resources

Sometimes you want to exclude resources such as PDFs, videos, and images from search results. 

Robots.txt Syntax

1. The User-Agent Directive

The first line of every directive block is the user-agent, which identifies the crawler the rules apply to.

User-agent: Googlebot
Disallow: /wp-admin/

2. The Disallow Directive

You can have multiple disallow directives that specify which parts of your site the crawler can’t access. 

For example, if you wanted to allow all search engines to crawl your entire site, your block would look like this:

User-agent: *
Allow: /

If you wanted to block all search engines from crawling your site, your block would look like this:

User-agent: *
Disallow: /

3. The Allow Directive

The “Allow” directive allows search engines to crawl a subdirectory or specific page, even in an otherwise disallowed directory.

For example, if you want to prevent Googlebot from accessing every post on your blog except for one, your directive might look like this:

User-agent: Googlebot
Disallow: /blog
Allow: /blog/example-post
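Google resolves conflicts like this by using the most specific matching rule, so the Allow line overrides the broader Disallow. If you want to sanity-check rules like these, one option is Python's built-in urllib.robotparser. A small caveat for this sketch: Python's parser applies rules in file order (first match wins) rather than by specificity, so the Allow line is listed first here; example.com is a placeholder domain.

```python
from urllib import robotparser

# Rules mirroring the blog example above. The more specific Allow
# rule is listed first because Python's parser uses first-match,
# not Google's longest-match behavior.
rules = """\
User-agent: Googlebot
Allow: /blog/example-post
Disallow: /blog
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# The blog as a whole is blocked for Googlebot...
print(parser.can_fetch("Googlebot", "https://www.example.com/blog"))  # False

# ...but the excepted post stays crawlable.
print(parser.can_fetch("Googlebot", "https://www.example.com/blog/example-post"))  # True
```

Paths not covered by any rule default to allowed, so the rest of the site is unaffected by this block.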

4. The Sitemap Directive

The Sitemap directive tells search engines—specifically Bing, Yandex, and Google—where to find your XML sitemap.
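A minimal example, assuming your sitemap lives at the site root (example.com is a placeholder domain):

```
Sitemap: https://www.example.com/sitemap.xml
```

This line is independent of any user-agent block, so it can appear anywhere in the file, and you can list more than one sitemap.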

5. The Crawl-Delay Directive

The crawl-delay directive instructs crawlers to slow their crawl rate to avoid overtaxing a server (i.e., slowing down your website). Note that Google does not support this directive.

User-agent: *
Crawl-delay: 10

6. The Noindex Directive

The robots.txt file tells a bot what it can or can't crawl, but it can't reliably tell a search engine which URLs not to index and show in search results. A blocked page can still show up in search results, but without a description, because the bot can't read its content.

Google never officially supported this directive, and as of September 1, 2019, it no longer honors it at all.
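If you need to keep a page out of search results, the supported approach is a robots meta tag on the page itself (or an X-Robots-Tag HTTP header). The page must remain crawlable so the bot can actually see the tag:

```html
<!-- In the <head> of the page you want excluded from search results -->
<meta name="robots" content="noindex">
```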

