robots.txt
Tells search robots which directories should not be indexed. If the file is empty or does not exist, then everything may be indexed.
Search engines always look for a file called "robots.txt" in the root directory of your domain (http://www.mydomain.com/robots.txt).
This file tells the robots (spider-indexers) which files they may index and which they may not. A robots.txt record consists of two fields, plus optional comments:
- User-agent - the name of the robot;
- Disallow - prohibits indexing of a file or directory;
- comments - start on a new line with #.
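For instance, a minimal record (the /private/ path here is purely illustrative):

User-agent: *
Disallow: /private/
# keep all robots out of everything under /private/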
Rules
Editors
robots.txt should be created as a plain-text file. As an editor you can use Notepad, an FTP client, or some HTML editors.
Name
The file must be named robots.txt (not robot.txt or Robots.txt), otherwise it will not work.
Location
The robots.txt file must be located in the root directory of the site.
Spaces
Spaces do not matter.
Comments
Comments start on a new line with #. A space after # is optional.
Order
The 1st line is User-agent, which names the robot; the Disallow lines that follow specify the files or folders that must not be indexed. If a prohibition applies to several robots, the robots are listed one per line, followed by the prohibition or list of prohibitions, for example:
User-agent: StackRambler
User-agent: Aport
Disallow: /eng
Disallow: /news
# Rambler and Aport are disallowed from indexing links
# that begin with /news and /eng
The same goes for Disallow: every prohibition starts on a new line. If different robots need different prohibitions, the records are separated by an empty line, for example:
User-agent: *
Disallow: /news
# all robots are disallowed from indexing links
# that begin with /news

User-agent: StackRambler
User-agent: Aport
Disallow: /eng
Disallow: /news
# Rambler and Aport are disallowed from indexing links
# that begin with /news and /eng

User-agent: Yandex
Disallow:
# Yandex is allowed everything
To prevent all robots from indexing files with the .doc and .pdf extensions:
User-agent: *
Disallow: /*.doc$
Disallow: /*.pdf$
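The * and $ wildcards are extensions to the original robots.txt specification, supported by major engines such as Google and Yandex. As a minimal sketch of how such a pattern can be interpreted, the hypothetical helper rule_to_regex below translates a Disallow pattern into a Python regular expression:

import re

def rule_to_regex(rule):
    # '*' matches any sequence of characters;
    # a trailing '$' anchors the match at the end of the path.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

# /*.doc$ blocks any path ending in .doc, but not .docx:
assert rule_to_regex("/*.doc$").match("/files/report.doc")
assert not rule_to_regex("/*.doc$").match("/files/report.docx")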
Examples
Allows all robots to index everything:
User-agent: *
Disallow:
Disallows all robots from indexing everything:
User-agent: *
Disallow: /
Disallows all robots from indexing the email.htm file, all files in the "cgi-bin" folder, and the 2nd-level folder "images":
User-agent: *
Disallow: /email.htm
Disallow: /cgi-bin/
Disallow: /images/
Disallows Roverdog from indexing all files on the server:
User-agent: Roverdog
Disallow: /
One more example:
User-agent: *
Disallow: /cgi-bin/moshkow
Disallow: /cgi-bin/html-KOI/AQUARIUM/songs
Disallow: /cgi-bin/html-KOI/AQUARIUM/history
Disallow: /cgi-bin/html-windows/AQUARIUM/songs
Disallow: /cgi-bin/html-windows/AQUARIUM/history
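Rules like these can also be checked locally with Python's standard urllib.robotparser module, which implements the prefix matching of the original specification; a minimal sketch:

from urllib import robotparser

rules = """\
User-agent: *
Disallow: /email.htm
Disallow: /cgi-bin/
Disallow: /images/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())
# Paths under a disallowed prefix are blocked; everything else is allowed.
print(rp.can_fetch("*", "/cgi-bin/moshkow"))  # False
print(rp.can_fetch("*", "/index.html"))       # True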
ROBOTS META tag
The ROBOTS META tag lets a page allow or forbid indexing by the robots that visit the site; it can also invite robots to walk through all the links on the page and index those pages as well. This tag is becoming increasingly important.
<HTML><HEAD>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="DESCRIPTION" CONTENT="This page ….">
<TITLE>...</TITLE>
</HEAD>
<BODY>
NOINDEX - forbids indexing of the document;
NOFOLLOW - forbids following the links in the document;
INDEX - allows indexing of the document;
FOLLOW - allows following the links;
ALL - index everything; equal to INDEX, FOLLOW;
NONE - do not index anything; equal to NOINDEX, NOFOLLOW.
ROBOTS meta tag examples:
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Robots.txt Checker - a free check of your robots.txt file's functionality.