The ultimate guide to robots.txt

Posted on 2023-11-2 09:27:30
The robots.txt file is one of the main ways of telling a search engine where it can and can't go on your website. All major search engines support its basic functionality, but some respond to additional rules, which can be helpful too. This guide covers all the ways to use robots.txt on your website.
Warning!

Any mistakes you make in your robots.txt can seriously harm your site, so read and understand this article before diving in.
Table of contents

What is a robots.txt file?
What does the robots.txt file do?
Where should I put my robots.txt file?
Pros and cons of using robots.txt
Robots.txt syntax
Don't block CSS and JS files in robots.txt
Test and fix in Google Search Console
Validate your robots.txt
See the code


What is a robots.txt file?

Crawl directives: The robots.txt file is one of a number of crawl directives. We have guides on all of them, and you'll find them here.

A robots.txt file is a plain text document located in a website's root directory, serving as a set of instructions to search engine bots. Also called the Robots Exclusion Protocol, the robots.txt file results from a consensus among early search engine developers. It's not an official standard set by any standards organization, although all major search engines adhere to it.
Robots.txt specifies which pages or sections should be crawled and indexed and which should be ignored. This file helps website owners control the behavior of search engine crawlers, allowing them to manage access, limit crawling to specific areas, and regulate the crawl rate. Although the file is public and compliance with its directives is voluntary, it is a powerful tool for guiding search engine bots and influencing the indexing process.
A basic robots.txt file might look something like this:
User-Agent: *
Disallow:

Sitemap: https://www.example.com/sitemap_index.xml
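This file allows every crawler to access everything: an empty Disallow: line blocks nothing. For contrast, here is a sketch of a more restrictive file; the directory paths are made up for illustration:

```
# Block all crawlers from two hypothetical directories
User-agent: *
Disallow: /admin/
Disallow: /internal-search/

Sitemap: https://www.example.com/sitemap_index.xml
```

Each `User-agent` line starts a group of rules, and the `Disallow` lines beneath it apply to any crawler matching that user agent.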
What does the robots.txt file do?

Caching: Search engines typically cache the contents of the robots.txt file so that they don't need to keep downloading it, but they will usually refresh it several times a day. That means changes to your instructions are typically reflected fairly quickly.

Search engines discover and index the web by crawling pages. As they crawl, they discover and follow links. This takes them from site A to site B to site C, and so on. But before a search engine visits any page on a domain it hasn't encountered before, it opens that domain's robots.txt file. That file tells it which URLs on the site it's allowed to visit (and which ones it's not).
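This permission check can be sketched with Python's standard-library robots.txt parser; the rules and URLs below are hypothetical, chosen only to illustrate how a crawler decides what it may fetch:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# parse() accepts the file's contents as a list of lines; a real crawler
# would first download https://www.example.com/robots.txt.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
    "",
    "User-agent: Googlebot",
    "Disallow:",          # empty Disallow = this agent may crawl everything
])

# Generic crawlers are blocked from /private/, Googlebot is not.
print(rp.can_fetch("*", "https://www.example.com/private/page.html"))         # False
print(rp.can_fetch("Googlebot", "https://www.example.com/private/page.html")) # True
print(rp.can_fetch("*", "https://www.example.com/public/"))                   # True
```

A crawler matching a specific `User-agent` group (here, Googlebot) follows that group's rules instead of the wildcard `*` group, which is why the same URL yields different answers for different bots.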