Tuesday, November 30, 2010

code.google.com: Controlling Crawling and Indexing

Controlling Crawling and Indexing



Automated website crawlers are powerful tools that help crawl and index content on the web. As a webmaster, you may wish to guide them towards your helpful content and away from irrelevant content. Described in these documents are the de-facto web-wide standards for controlling the crawling and indexing of web-based content. They consist of the robots.txt file to control crawling, as well as the robots meta tag and the X-Robots-Tag HTTP header element to control indexing. The robots.txt standard predates Google and is the accepted method of controlling the crawling of a website.
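As a brief illustration (the directory name below is hypothetical), a robots.txt file placed at the root of a site tells crawlers which paths they may fetch:

    # applies to all crawlers
    User-agent: *
    Disallow: /private/
    Allow: /

Indexing of an individual page, by contrast, is controlled per response, either with a robots meta tag in the page's HTML head or with the equivalent X-Robots-Tag HTTP response header:

    <meta name="robots" content="noindex, nofollow">

    X-Robots-Tag: noindex

The Disallow rule keeps compliant crawlers from fetching matching URLs, while noindex lets a page be fetched but keeps it out of the search index.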

This document represents the current usage of the robots.txt web-crawler control directives, as well as the indexing directives, as they are used at Google. These directives are generally supported by all major web-crawlers and search engines.
