{{Short description|Internet protocol}}
{{Lowercase title}}
{{Selfref|For Knowledge's robots.txt file, see https://en.wikipedia.org/robots.txt.}}
{{Infobox technology standard
| title = robots.txt
| caption = Example of a simple robots.txt file, indicating that a user-agent called "Mallorybot" is not allowed to crawl any of the website's pages, and that other user-agents cannot crawl more than one page every 20 seconds, and are not allowed to crawl the "secret" folder.
| status = Proposed Standard
| first_published = 1994 published, formally standardized in 2022
| authors = Martijn Koster (original author); Gary Illyes, Henner Zeller, Lizzi Sassman (IETF contributors)
| website = {{URL|https://robotstxt.org}}, {{URL|https://datatracker.ietf.org/doc/html/rfc9309|RFC 9309}}
}}

'''robots.txt''' is the [[filename]] used for implementing the '''Robots Exclusion Protocol''', a standard used by [[website]]s to indicate to visiting [[web crawler]]s and other [[web robot]]s which portions of the website they are allowed to visit.

The standard, developed in 1994, relies on [[voluntary compliance]]. Malicious bots can use the file as a directory of which pages to visit, though standards bodies discourage countering this with [[security through obscurity]]. Some archival sites ignore robots.txt. The standard was used in the 1990s to mitigate [[Server (computing)|server]] overload. In the 2020s many websites began denying bots that collect information for [[generative artificial intelligence]].

The "robots.txt" file can be used in conjunction with [[sitemaps]], another robot inclusion standard for websites.
==History==
The standard was proposed by [[Martijn Koster]],<ref>{{cite web |url=http://www.greenhills.co.uk/historical.html |title=Historical |website=Greenhills.co.uk |access-date=2017-03-03 |archive-url=https://web.archive.org/web/20170403152037/http://www.greenhills.co.uk/historical.html |archive-date=2017-04-03 |url-status=live }}</ref><ref>{{cite web |title=Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web |first=Roy |last=Fielding |work=First International Conference on the World Wide Web |year=1994 |place=Geneva |url=http://www94.web.cern.ch/WWW94/PapersWWW94/fielding.ps |access-date=September 25, 2013 |format=PostScript |archive-url=https://web.archive.org/web/20130927093658/http://www94.web.cern.ch/WWW94/PapersWWW94/fielding.ps |archive-date=2013-09-27 |url-status=live }}</ref> when working for [[Nexor]]<ref name=":0">{{cite web |url=http://www.robotstxt.org/orig.html#status |title=The Web Robots Pages |publisher=Robotstxt.org |date=1994-06-30 |access-date=2013-12-29 |archive-url=https://web.archive.org/web/20140112090633/http://www.robotstxt.org/orig.html#status |archive-date=2014-01-12 |url-status=live }}</ref> in February 1994<ref>{{cite web |title=Important: Spiders, Robots and Web Wanderers |first=Martijn |last=Koster |work=www-talk mailing list |date=25 February 1994 |url=http://inkdroid.org/tmp/www-talk/4113.html |format=[[Hypermail]] archived message |url-status=dead |archive-url=https://web.archive.org/web/20131029200350/http://inkdroid.org/tmp/www-talk/4113.html |archive-date=October 29, 2013 }}</ref> on the ''www-talk'' mailing list, the main communication channel for WWW-related activities at the time. [[Charles Stross]] claims to have provoked Koster to suggest robots.txt, after he wrote a badly behaved web crawler that inadvertently caused a [[denial-of-service attack]] on Koster's server.<ref>{{cite web |url=http://www.antipope.org/charlie/blog-static/2009/06/how_i_got_here_in_the_end_part_3.html |title=How I got here in the end, part five: "things can only get better!" |work=Charlie's Diary |date=19 June 2006 |access-date=19 April 2014 |archive-url=https://web.archive.org/web/20131125220913/http://www.antipope.org/charlie/blog-static/2009/06/how_i_got_here_in_the_end_part_3.html |archive-date=2013-11-25 |url-status=live }}</ref>

The standard, initially RobotsNotWanted.txt, allowed [[web developer]]s to specify which bots should not access their website or which pages bots should not access. The internet was small enough in 1994 to maintain a complete list of all bots; [[Server (computing)|server]] overload was a primary concern. By June 1994 it had become a [[de facto standard]];<ref name="Verge">{{cite web|url=https://www.theverge.com/24067997/robots-txt-ai-text-file-web-crawlers-spiders|title=The text file that runs the internet|work=[[The Verge]]|last=Pierce|first=David|date=14 February 2024|access-date=16 March 2024}}</ref> most complied, including those operated by search engines such as [[WebCrawler]], [[Lycos]], and [[AltaVista]].<ref name="sear_Robo">{{cite web |title=Robots.txt Celebrates 20 Years Of Blocking Search Engines |author=Barry Schwartz |work=Search Engine Land |date=30 June 2014 |access-date=2015-11-19 |url=http://searchengineland.com/robots-txt-celebrates-20-years-blocking-search-engines-195479 |archive-url=https://web.archive.org/web/20150907000430/http://searchengineland.com/robots-txt-celebrates-20-years-blocking-search-engines-195479 |archive-date=2015-09-07 |url-status=live }}</ref>

On July 1, 2019, Google announced the proposal of the Robots Exclusion Protocol as an official standard under the [[Internet Engineering Task Force]]. A proposed standard was published in September 2022 as RFC 9309.
==Standard==
When a site owner wishes to give instructions to web robots they place a text file called <code>robots.txt</code> in the root of the web site hierarchy (e.g. <code>https://www.example.com/robots.txt</code>). This text file contains the instructions in a specific format (see examples below). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the [[website]]. If this file does not exist, web robots assume that the website owner does not wish to place any limitations on crawling the entire site.

A robots.txt file contains instructions for bots indicating which web pages they can and cannot access. Robots.txt files are particularly important for web crawlers from search engines such as Google.

A robots.txt file on a website will function as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operates on certain data. Links to pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.

A robots.txt file covers one [[Same-origin policy|origin]]. For websites with multiple subdomains, each subdomain must have its own robots.txt file. If example.com had a robots.txt file but a.example.com did not, the rules that would apply for example.com would not apply to a.example.com. In addition, each protocol and port needs its own robots.txt file; http://example.com/robots.txt does not apply to pages under http://example.com:8080/ or https://example.com/.
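The fetch-then-check sequence described above can be sketched with Python's standard-library <code>urllib.robotparser</code> module. This is a minimal illustration, not part of the standard itself; the bot name and URLs are hypothetical:

<syntaxhighlight lang="python">
from urllib import robotparser

# Fetch and parse the site's robots.txt before requesting any other page.
parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # an absent file is treated as placing no limitations

# Ask whether a given user agent may fetch a given URL.
print(parser.can_fetch("ExampleBot", "https://www.example.com/secret/page.html"))
</syntaxhighlight>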
==Compliance==
A robots.txt has no enforcement mechanism in law or in technical protocol, despite widespread compliance by bot operators.

===Search engines===
Some major search engines following this standard include Ask, AOL, Baidu, Bing, DuckDuckGo, Google, Yahoo!, and Yandex.

===Artificial intelligence===
Starting in the 2020s, web operators began using robots.txt to deny access to bots collecting training data for [[generative AI]]. In 2023, Originality.AI found that 306 of the thousand most-visited websites blocked [[OpenAI]]'s GPTBot in their robots.txt file and 85 blocked [[Google]]'s Google-Extended. Many robots.txt files named GPTBot as the only bot explicitly disallowed on all pages. Denying access to GPTBot was common among news websites such as the [[BBC]] and ''[[The New York Times]]''. In 2023, blog host [[Medium (website)|Medium]] announced it would deny access to all artificial intelligence web crawlers as "AI companies have leached value from writers in order to spam Internet readers".

GPTBot complies with the robots.txt standard and gives advice to web operators about how to disallow it, but David Pierce of ''[[The Verge]]'' said this only began after "training the underlying models that made it so powerful". Also, some bots are used both for search engines and artificial intelligence, and it may be impossible to block only one of these options.<ref name="Verge" />

===Archival sites===
Some web archiving projects ignore robots.txt. [[Archive Team]] uses the file to discover more links, such as [[sitemaps]]. Co-founder [[Jason Scott]] said that "unchecked, and left alone, the robots.txt file ensures no mirroring or reference for items that may have general use and meaning beyond the website's context." In 2017, the [[Internet Archive]] announced that it would stop complying with robots.txt directives. According to ''[[Digital Trends]]'', this followed widespread use of robots.txt to remove historical sites from search engine results, and contrasted with the nonprofit's aim to archive "snapshots" of the internet as it previously existed.
==Security==
Despite the use of the terms "allow" and "disallow", the protocol is purely advisory and relies on the compliance of the [[web robot]]; it cannot enforce any of what is stated in the file. Malicious web robots are unlikely to honor robots.txt; some may even use the robots.txt as a guide to find disallowed links and go straight to them.<ref>{{cite book |author=Sverre H. Huseby |year=2004 |title=Innocent Code: A Security Wake-Up Call for Web Programmers |publisher=John Wiley & Sons |pages=91–92 |isbn=9780470857472 }}</ref> While this is sometimes claimed to be a security risk, this sort of [[security through obscurity]] is discouraged by standards bodies. The [[National Institute of Standards and Technology]] (NIST) in the United States specifically recommends against this practice: "System security should not depend on the secrecy of the implementation or its components."<ref>Scarfone, K. A.; Jansen, W.; Tracy, M. (July 2008). ''Guide to General Server Security'' (PDF). National Institute of Standards and Technology. {{doi|10.6028/NIST.SP.800-123}}.</ref> In the context of robots.txt files, security through obscurity is not recommended as a security technique.

==Alternatives==
Many robots also pass a special [[user agent]] to the web server when fetching content. A web administrator could also configure the server to automatically return failure (or pass alternative content) when it detects a connection using one of the robots.
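A rough sketch of this server-side filtering, using Python's built-in <code>http.server</code> module; the blocked robot name and the 403 response are illustrative choices, not part of any standard:

<syntaxhighlight lang="python">
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKED_AGENTS = ("BadBot",)  # hypothetical robot name

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        # Automatically return failure for requests from a blocked robot.
        if any(bot in agent for bot in BLOCKED_AGENTS):
            self.send_error(403, "Robots not welcome here")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello, human reader!")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()
</syntaxhighlight>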
Some sites, such as [[Google]], host a <code>humans.txt</code> file that displays information meant for humans to read. Some sites such as [[GitHub]] redirect humans.txt to an About page.

Previously, Google had a joke file hosted at <code>/killer-robots.txt</code> instructing [[Terminator (character)|the Terminator]] not to kill the company founders [[Larry Page]] and [[Sergey Brin]].

==Examples==
This example tells all robots that they can visit all files because the wildcard <code>*</code> stands for all robots and the <code>Disallow</code> directive has no value, meaning no pages are disallowed:

<pre>
User-agent: *
Disallow:
</pre>

or, equivalently:

<pre>
User-agent: *
Allow: /
</pre>

The same result can be accomplished with an empty or missing robots.txt file.

This example tells all robots to stay out of a website:

<pre>
User-agent: *
Disallow: /
</pre>

This example tells one specific robot to stay out of a website:

<pre>
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /
</pre>

This example tells all robots not to enter three directories:

<pre>
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
</pre>

This example tells all robots to stay away from one specific file:

<pre>
User-agent: *
Disallow: /directory/file.html
</pre>

All other files in the specified directory will be processed.

This example tells two specific robots not to enter one specific directory:

<pre>
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
User-agent: Googlebot
Disallow: /private/
</pre>

Example demonstrating how comments can be used:

<pre>
# Comments appear after the "#" symbol at the start of a line, or after a directive
User-agent: * # match all bots
Disallow: / # keep them out
</pre>

It is also possible to list multiple robots with their own rules. The actual robot string is defined by the crawler. A few robot operators, such as [[Google]], support several user-agent strings that allow the operator to deny access to a subset of their services by using specific user-agent strings.

Example demonstrating multiple user-agents:

<pre>
User-agent: googlebot        # all Google services
Disallow: /private/          # disallow this directory

User-agent: googlebot-news   # only the news service
Disallow: /                  # disallow everything

User-agent: *                # any robot
Disallow: /something/        # disallow this directory
</pre>
==Nonstandard extensions==
===Crawl-delay directive===
The crawl-delay value is supported by some crawlers to throttle their visits to the host. Since this value is not part of the standard, its interpretation is dependent on the crawler reading it. It is used when repeated bursts of visits from bots are slowing down the host. Yandex interprets the value as the number of seconds to wait between subsequent visits. Bing defines crawl-delay as the size of a time window (from 1 to 30 seconds) during which BingBot will access a web site only once. Google provides an interface in its [[Google Search Console|search console]] for webmasters, to control the [[Googlebot]]'s subsequent visits.

<pre>
User-agent: bingbot
Allow: /
Crawl-delay: 10
</pre>
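Although the directive is nonstandard, Python's <code>urllib.robotparser</code> parses it; a small sketch, assuming the example file above is served at the illustrative URL:

<syntaxhighlight lang="python">
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# Returns the Crawl-delay value for the given user agent, or None if unset.
print(parser.crawl_delay("bingbot"))  # 10 for the example file above
</syntaxhighlight>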
===Sitemap===
Some crawlers support a <code>Sitemap</code> directive, allowing multiple [[Sitemaps]] in the same robots.txt in the form <code>Sitemap: ''full-url''</code>:

<pre>
Sitemap: http://www.example.com/sitemap.xml
</pre>

===Universal "*" match===
The ''Robot Exclusion Standard'' does not mention the "*" character in the <code>Disallow:</code> statement.
==Meta tags and headers==
In addition to root-level robots.txt files, robots exclusion directives can be applied at a more granular level through the use of robots meta tags and X-Robots-Tag HTTP headers. The robots meta tag cannot be used for non-HTML files such as images, text files, or PDF documents. On the other hand, the X-Robots-Tag can be added to non-HTML files by using [[.htaccess]] and [[httpd.conf]] files.

===A "noindex" meta tag===
<syntaxhighlight lang="html">
<meta name="robots" content="noindex" />
</syntaxhighlight>

===A "noindex" HTTP response header===
<pre>
X-Robots-Tag: noindex
</pre>

The X-Robots-Tag is only effective after the page has been requested and the server responds, and the robots meta tag is only effective after the page has loaded, whereas robots.txt is effective before the page is requested. Thus if a page is excluded by a robots.txt file, any robots meta tags or X-Robots-Tag headers are effectively ignored because the robot will not see them in the first place.
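As an illustration of serving the header above, a minimal Python server might attach X-Robots-Tag to every response, including non-HTML files where a meta tag cannot be used. This is a sketch, not drawn from the article's sources:

<syntaxhighlight lang="python">
from http.server import SimpleHTTPRequestHandler, HTTPServer

class NoIndexHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Attach the robots exclusion header to every response.
        self.send_header("X-Robots-Tag", "noindex")
        super().end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), NoIndexHandler).serve_forever()
</syntaxhighlight>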
===Maximum size of a robots.txt file===
The Robots Exclusion Protocol requires crawlers to parse at least 500 kibibytes (512000 bytes) of robots.txt files,{{Ref RFC|9309|section=2.5: Limits}} which Google maintains as a 500 kibibyte file size restriction for robots.txt files.<ref>{{Cite web |title=How Google Interprets the robots.txt Specification {{!}} Documentation |url=https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt |access-date=2022-10-17 |website=Google Developers |language=en |archive-date=2022-10-17 |archive-url=https://web.archive.org/web/20221017101925/https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt |url-status=live }}</ref>
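A crawler that applies this limit might simply cap its read at the required minimum, since rules beyond that point may be ignored under the RFC; a sketch with an illustrative URL:

<syntaxhighlight lang="python">
import urllib.request

MAX_BYTES = 512000  # 500 KiB, the minimum crawlers must parse per RFC 9309

with urllib.request.urlopen("https://www.example.com/robots.txt") as response:
    body = response.read(MAX_BYTES)  # read at most the required limit

print(len(body))
</syntaxhighlight>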
==See also==
{{Portal|Internet}}
{{Div col|colwidth=30em}}
* <code>[[ads.txt]]</code>, a standard for listing authorized ad sellers
* <code>[[security.txt]]</code>, a file to describe the process for security researchers to follow in order to report security vulnerabilities
* [[Automated Content Access Protocol]] – A failed proposal to extend robots.txt
* [[BotSeer]] – Now inactive search engine for robots.txt files
* [[Distributed web crawling]]
* [[Focused crawler]]
* [[Internet Archive]]
* [[Meta elements]] for search engines
* [[National Digital Information Infrastructure and Preservation Program]] (NDIIPP)
* [[National Digital Library Program]] (NDLP)
* [[nofollow]]
* [[noindex]]
* [[Perma.cc]]
* [[Sitemaps]]
* [[Spider trap]]
* [[Web archiving]]
* [[Web crawler]]
{{Div col end}}
3458:"Artificial Intelligence Web Crawlers Are Running Amok"
3244:"To crawl or not to crawl, that is BingBot's question"
2696:
1579:
The "robots.txt" file can be used in conjunction with
68:
1614:
The standard, initially RobotsNotWanted.txt, allowed
2972:
Scarfone, K. A.; Jansen, W.; Tracy, M. (July 2008).
2729:
2281:
First International Conference on the World Wide Web
2113:
2101:
1448:
3270:"Change Googlebot crawl rate - Search Console Help"
2971:
1776:'s GPTBot in their robots.txt file and 85 blocked
2157:– Now inactive search engine for robots.txt files
2051:
1583:, another robot inclusion standard for websites.
3504:
3236:
3056:"List of User-Agents (Spiders, Robots, Browser)"
3016:
2877:
2760:
2758:
2636:
2576:
2014:
1939:Example demonstrating how comments can be used:
1364:* {{Official website|https://www.robotstxt.org}}
1357:* {{Official website|https://www.robotstxt.org}}