Google open-sources its robots.txt parser in a push to make the protocol an internet standard

Google has open-sourced its robots.txt parser and the associated libraries. The company hopes that website builders and companies will help turn the Robots Exclusion Protocol (REP) into an official web standard.

The code has been published on GitHub. It is the library that handles the Robots Exclusion Protocol in Googlebot, the crawler that indexes websites for the search engine. Google plans to ask developers for their input on the proposal in the near future.

By making the libraries open source, Google hopes that the REP can be declared an official internet standard. Until now it was not one, so developers could each interpret the protocol slightly differently. As a result, problems can arise when a robots.txt file is handled by different software or on different operating systems, each of which behaves slightly differently. To guard against this, Google's code accepts, for example, several ways of writing the disallow directive in robots.txt.
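How a parser might tolerate several spellings of the disallow directive can be sketched in a few lines of Python. This is a minimal illustration, not Google's code: the function name `parse_line` and the list of accepted misspellings are assumptions made for the example.

```python
# Hypothetical set of spellings treated as "disallow" (an assumption for
# this sketch, not a copy of the list used by any real parser).
DISALLOW_SPELLINGS = {"disallow", "dissallow", "dissalow", "disalow"}


def parse_line(line):
    """Split one robots.txt line into a (directive, value) pair.

    Returns None for blank lines, comments, and lines without a colon.
    """
    # Drop everything after a '#', which starts a comment.
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        return None
    key, value = line.split(":", 1)
    key = key.strip().lower()  # directive names are matched case-insensitively
    value = value.strip()
    # Map every tolerated misspelling onto the canonical directive name.
    if key in DISALLOW_SPELLINGS:
        key = "disallow"
    return key, value
```

With this sketch, `parse_line("Dissallow: /private")` and `parse_line("disallow: /private")` yield the same rule, so a typo by the site owner does not silently open a path to crawlers.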

To help the REP become an official standard, Google has also submitted the documentation to the Internet Engineering Task Force (IETF), the organization that handles the standardization of internet protocols. The documentation contains, for example, requirements for maximum cache time, minimum requirements for servers and the maximum size of a REP file. The proposed standard is not limited to robots.txt served over HTTP; it also covers other transfer protocols such as FTP and CoAP.
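What limits on cache time and file size could look like in a crawler is sketched below. The class name `CachedRobots` is hypothetical, and the two constants (24 hours and 500 KiB) are assumptions chosen for the example, since the article itself does not state the values.

```python
import time

# Assumed limits for this sketch; the article does not give the numbers.
MAX_CACHE_SECONDS = 24 * 60 * 60   # how long a cached robots.txt is reused
MAX_PARSE_BYTES = 500 * 1024       # how much of the file is parsed


class CachedRobots:
    """Holds one fetched robots.txt body together with its fetch time."""

    def __init__(self, body, fetched_at):
        # Content beyond the size limit is ignored rather than parsed.
        self.body = body[:MAX_PARSE_BYTES]
        self.fetched_at = fetched_at

    def is_fresh(self, now=None):
        """Return True while the cached copy is younger than the cache limit."""
        if now is None:
            now = time.time()
        return now - self.fetched_at < MAX_CACHE_SECONDS
```

A crawler built this way would refetch the file once `is_fresh()` returns False, and would never spend parsing time on an oversized file.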