
Tuesday 12 June 2012

Part 2: The robots.txt file

Where can I find user agent names?


You can find user agent names in your server log files by looking at the requests for robots.txt: the log entry for each request usually records the user agent of the spider that made it. In most cases, all search engine spiders should be given the same rights. In that case, use "User-agent: *" as mentioned above.
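As a minimal sketch, assuming your server writes the "combined" log format that Apache and nginx commonly use, the following Python function counts the user agents that requested /robots.txt. The function name and the regular expression are our own; adjust the pattern if your log format differs.

```python
import re
from collections import Counter

# Assumes the "combined" log format:
# host ident user [time] "METHOD /path HTTP/x" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(lines):
    """Count how often each user agent requested /robots.txt."""
    agents = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines in another format and requests for other files.
        if match and match.group("path") == "/robots.txt":
            agents[match.group("agent")] += 1
    return agents
```

To use it, pass an open log file, for example `robots_txt_agents(open("access.log"))` (the file name is hypothetical), and print the result of `.most_common()` to see the most frequent spiders first.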


Things you should avoid


If you don't format your robots.txt file properly, some or all files of your Web site might not get indexed by search engines. To avoid this, do the following:
Don't use comments in the robots.txt file
Although comments are allowed in a robots.txt file, they might confuse some search engine spiders.


"Disallow: /support # Don't index the support directory" might be misinterpreted as "Disallow: /support#Don't index the support directory".

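If you like to keep comments in your own working copy of the file, one option is to strip them automatically before uploading. This is a minimal Python sketch, assuming (as the robots.txt format does) that everything from "#" to the end of a line is a comment; the function name strip_comments is our own.

```python
def strip_comments(robots_txt: str) -> str:
    """Return robots.txt content with all '#' comments removed."""
    cleaned = []
    for line in robots_txt.splitlines():
        if "#" in line:
            line = line.split("#", 1)[0].rstrip()
            if not line.strip():
                continue  # the whole line was a comment: drop it
        # Blank lines without a comment are kept: they separate records.
        cleaned.append(line.rstrip())
    return "\n".join(cleaned) + "\n"


original = (
    "# Rules for all spiders\n"
    "User-agent: *\n"
    "Disallow: /support # Don't index the support directory\n"
)
print(strip_comments(original), end="")
```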

Don't use white space at the beginning of a line. For example, don't write


  User-agent: *
  Disallow: /support


Write this instead:


User-agent: *
Disallow: /support
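A check for this mistake is easy to script. This Python sketch (the function names are our own) reports which lines start with a space or tab and produces a cleaned copy of the file.

```python
def indented_lines(robots_txt: str) -> list[int]:
    """Return the 1-based numbers of lines that start with a space or tab."""
    return [
        number
        for number, line in enumerate(robots_txt.splitlines(), start=1)
        if line[:1] in (" ", "\t")
    ]


def remove_indentation(robots_txt: str) -> str:
    """Strip leading white space from every line."""
    return "\n".join(line.lstrip() for line in robots_txt.splitlines()) + "\n"


bad = "  User-agent: *\n  Disallow: /support\n"
print(indented_lines(bad))  # [1, 2]
print(remove_indentation(bad), end="")
```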


Don't change the order of the commands. For your robots.txt file to work, the User-agent line must come before its Disallow lines. Don't write


Disallow: /support
User-agent: *


Write this instead:


User-agent: *
Disallow: /support
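Python's standard urllib.robotparser module is one real parser that shows the effect: it ignores a Disallow line that appears before any User-agent line. This sketch only demonstrates how that one parser behaves; other spiders may handle the mistake differently. The helper name is_blocked is our own.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, path, agent="*"):
    """True if robots_txt blocks `path` for `agent` in urllib.robotparser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


wrong_order = "Disallow: /support\nUser-agent: *\n"
right_order = "User-agent: *\nDisallow: /support\n"

print(is_blocked(wrong_order, "/support/index.html"))  # False: rule ignored
print(is_blocked(right_order, "/support/index.html"))  # True
```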


Don't use more than one directory in a Disallow line. Do not use the following:


User-agent: *
Disallow: /support /cgi-bin/ /images/


Search engine spiders cannot understand that format. The correct syntax for this is:


User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
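Python's standard urllib.robotparser module gives one concrete illustration: it reads the whole value after "Disallow:" as a single path, so the one-line form blocks none of the three directories. The helper names disallow_record and is_blocked are our own, and the sketch only shows how this one parser behaves.

```python
from urllib.robotparser import RobotFileParser


def disallow_record(paths, agent="*"):
    """Build a record with one Disallow line per path."""
    lines = [f"User-agent: {agent}"] + [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, path, agent="*"):
    """True if robots_txt blocks `path` for `agent` in urllib.robotparser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


one_line = "User-agent: *\nDisallow: /support /cgi-bin/ /images/\n"
separate = disallow_record(["/support", "/cgi-bin/", "/images/"])

print(is_blocked(one_line, "/cgi-bin/test.cgi"))  # False: the rule does not match
print(is_blocked(separate, "/cgi-bin/test.cgi"))  # True
```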


Be sure to use the right case. The file names on your server are case-sensitive: if the name of your directory is "Support", don't write "support" in the robots.txt file.
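Rule paths are compared character by character, so case matters. Python's standard urllib.robotparser module shows this: with the rule below, a spider using that parser would still fetch files under "/Support".

```python
from urllib.robotparser import RobotFileParser

# The directory on the server is "/Support", but the rule says "/support".
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /support"])

print(parser.can_fetch("*", "/Support/index.html"))  # True: wrong case, not blocked
print(parser.can_fetch("*", "/support/index.html"))  # False: exact case matches
```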


Don't list all files. If you want a search engine spider to ignore all files in a particular directory, you don't have to list every file. For example:


User-agent: *
Disallow: /support/orders.html
Disallow: /support/technical.html
Disallow: /support/helpdesk.html
Disallow: /support/index.html


You can replace this with


User-agent: *
Disallow: /support
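The shorter version works because a Disallow value is matched as a path prefix: every URL path that starts with "/support" is blocked. A quick check with Python's standard urllib.robotparser module (one parser among many) confirms that the single line covers all four files.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /support"])

support_files = [
    "/support/orders.html",
    "/support/technical.html",
    "/support/helpdesk.html",
    "/support/index.html",
]
# One prefix rule covers every file in the directory.
print(all(not parser.can_fetch("*", path) for path in support_files))  # True
# Files outside /support stay crawlable.
print(parser.can_fetch("*", "/index.html"))  # True
```

Because the match is a simple prefix test, the same rule would also block a file such as /supporters.html. Writing "Disallow: /support/" with a trailing slash limits the rule to the directory.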


There is no "Allow" command
Don't use an "Allow" command in your robots.txt file; the original robots.txt standard does not define one. Only mention files and directories that you don't want to be indexed. All other files may be indexed if search engines can find them, for example through links on your site.
