June 13th, 2008


More Information on the new Robots Exclusion Protocol

Writers of search indexing robots and web publishers should definitely look at the new extensions to the REP, which add useful robots.txt directives and Robots META tag values. The big three search engines (Google, Yahoo, MSN Live) already supported most of these features, but it's nice to have them formalized so that other search robots can take advantage of the new functionality.

The new X-Robots-Tag (added to the HTTP header for non-HTML files) is a good way to send the meta information, but it requires configuring or extending the server so that the header is added automatically. For example, if content is available in both HTML and PDF formats, it's easy to send a NOINDEX value for all PDF files, directing search engines away from the printable format and towards the browser-readable format.
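
As a rough illustration of the PDF case, here is a minimal Python sketch that adds the header through a WSGI middleware. The names x_robots_headers, XRobotsTagMiddleware and NOINDEX_EXTENSIONS are made up for this example, and a real site might just as well do this in the web server configuration instead.

```python
import posixpath

# Hypothetical policy for this sketch: file extensions to keep out of the index.
NOINDEX_EXTENSIONS = {".pdf"}


def x_robots_headers(path):
    """Return the extra HTTP headers to send for the given URL path."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in NOINDEX_EXTENSIONS:
        return [("X-Robots-Tag", "noindex")]
    return []


class XRobotsTagMiddleware:
    """WSGI middleware that appends X-Robots-Tag headers to matching responses."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        extra = x_robots_headers(environ.get("PATH_INFO", ""))

        def patched_start_response(status, headers, exc_info=None):
            # Pass through the original headers plus any robots header.
            return start_response(status, list(headers) + extra, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping an existing WSGI application as XRobotsTagMiddleware(app) would leave HTML responses untouched and mark the PDF responses as NOINDEX.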

It turns out that NOODP comes in handy when a page is linked from the ODP (Open Directory Project), and the title or text in that entry is not accurate, which happens sometimes. Using the NOODP robots meta tag value tells the search engines not to use the ODP entry, but rather the title and text from the page. NOYDIR does the same for the Yahoo Directory, but is only officially supported by Yahoo and its Slurp robot.
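
For publishers who generate their pages from templates, a small helper can build the tag. This is only a sketch: robots_meta_tag is a hypothetical name, and the lowercase, comma-separated output is one common way to write the values, not something the helper validates against the spec.

```python
from html import escape


def robots_meta_tag(*values):
    """Build a robots META tag from directive names such as NOODP or NOYDIR."""
    content = ", ".join(v.strip().lower() for v in values)
    return '<meta name="robots" content="%s">' % escape(content, quote=True)


# Ask engines to ignore both the ODP and the Yahoo Directory entries.
print(robots_meta_tag("NOODP", "NOYDIR"))
```

The resulting tag goes in the HEAD section of the page.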

For pages with frequent changes, NOARCHIVE makes some sense: the old content may be in the searchable index, but at least the search engines will not display the old version of the page itself.

However, I have yet to figure out when someone would use NOSNIPPET (which also disables archive display). Limiting a listing in the search results page to the title and URL seems like such a bad idea. Why would anyone do this?
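
For robot writers, the display side of NOARCHIVE and NOSNIPPET might be interpreted along these lines. This is a sketch of the behavior described above, not any engine's actual code; display_policy and its result keys are invented for the example.

```python
def display_policy(content):
    """Decide what a results page may show, given a robots META content value."""
    flags = {token.strip().lower() for token in content.split(",") if token.strip()}
    show_snippet = "nosnippet" not in flags
    # As described above, NOSNIPPET also turns off the archived copy.
    show_cached_link = show_snippet and "noarchive" not in flags
    return {"show_snippet": show_snippet, "show_cached_link": show_cached_link}
```

With NOSNIPPET set, both values come back False, which leaves only the title and URL in the listing.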
