From Wikipedia:
The robots exclusion standard or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website which is, otherwise, publicly viewable.
This part is not a complete introduction to the uses and benefits of the file robots.txt, which is a good tool for controlling bots and spiders (among other purposes). It is a short introduction that lists some useful directives.
You can set different directives in that file, which must be placed in the root of your website. Edit it with a plain-text (ASCII) editor and upload it to your webspace in ASCII mode.
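Since the file always sits in the root of the site, its address can be worked out from any page URL. Here is a minimal Python sketch of that, using only the standard library; the function name robots_url and the example.com address are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical page URL, used only as an example.
print(robots_url("http://www.example.com/forum/viewtopic.php?t=42"))
```

A robots.txt placed in a subfolder (for example /forum/robots.txt) is not where crawlers look, so it should have no effect.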
It would make no sense to block your website for all bots, indexing robots and search engines, but it does make sense to block some of them explicitly.
To stop Microsoft Search (Windows Live) from crawling your site completely, you can add this:
Quote:
User-Agent: MSNBot
Disallow: /
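If you want to check such a rule before uploading it, Python's standard urllib.robotparser module can read the lines directly. This is only a sketch: the bot name SomeOtherBot and the path /index.html are placeholders, and a real crawler may interpret the file slightly differently than Python's parser does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the quote above, as it would appear in robots.txt.
rules = """\
User-Agent: MSNBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# MSNBot is shut out, every other (hypothetical) bot is still allowed.
msnbot_allowed = parser.can_fetch("MSNBot", "/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "/index.html")
print("MSNBot allowed:", msnbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Bots that are not named in any User-Agent line (and for which no * group exists) are not restricted at all.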
To stop Microsoft Search (Windows Live) from crawling your website like an idiot running amok, you can add this to slow it down:
Quote:
User-Agent: MSNBot
Crawl-Delay: 36000
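The Crawl-Delay value is given in seconds, so 36000 asks for a pause of ten hours. A short Python sketch to read the value back with the standard urllib.robotparser module (SomeOtherBot is a placeholder name; Crawl-Delay is a non-standard extension, so not every crawler honours it or accepts a value this large):

```python
from urllib.robotparser import RobotFileParser

# The rule from the quote above.
rules = """\
User-Agent: MSNBot
Crawl-Delay: 36000
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns the delay in seconds, or None if none is set.
delay = parser.crawl_delay("MSNBot")
print("MSNBot delay:", delay, "seconds, i.e.", delay / 3600, "hours")
print("SomeOtherBot delay:", parser.crawl_delay("SomeOtherBot"))
```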
Another exotic directive, especially for Microsoft: to stop Microsoft Search from showing your website as a website preview, add this to your robots.txt:
Quote:
User-agent: searchpreview
Disallow: /
To stop Googlebot from indexing your site completely:
Quote:
User-agent: Googlebot
Disallow: /
To block all bots from indexing the images folder and the thumbnails folder, set these:
Quote:
User-agent: *
Disallow: /images
Disallow: /thumbnails
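Disallow values are matched as path prefixes, so /images also covers anything that merely starts with those characters. The Python sketch below shows this with the standard urllib.robotparser module; the bot name AnyBot and the four paths are invented for the example, and /images-old is there only to show the prefix effect.

```python
from urllib.robotparser import RobotFileParser

# The rules from the quote above.
rules = """\
User-agent: *
Disallow: /images
Disallow: /thumbnails
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical paths; True means the bot may fetch the path.
paths = ["/images/logo.png", "/thumbnails/t1.jpg", "/images-old/a.png", "/index.html"]
results = {path: parser.can_fetch("AnyBot", path) for path in paths}
for path, allowed in results.items():
    print(path, "allowed:", allowed)
```

If only the folder itself should be blocked, writing the rule with a trailing slash (Disallow: /images/) keeps paths like /images-old out of the match.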