robots.txt
The robots.txt file tells search robots which directories should not be taken for indexing. If the file is empty or does not exist, everything may be indexed.
Search engines always look for a file named "robots.txt" in the root directory of your domain (http://www.mydomain.com/robots.txt). This file tells the robots (spider indexers) which files they may index and which they may not. A robots.txt record consists of two fields: User-agent and Disallow.
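For illustration, a minimal record containing both fields might look like the sketch below; the path /private/ is a hypothetical example, not taken from this article:

User-agent: *
Disallow: /private/
# no robot may index anything whose path begins with /private/

The rules and the article's own examples follow.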
Rules

Editors. robots.txt must be a plain-text file. You can create it with Notepad, an FTP client's built-in editor, or an HTML editor.
Name. The file must be named robots.txt, not robot.txt or Robots.txt, otherwise it will not work.
Location. The robots.txt file must be located in the root directory of the site.
Spaces. Spaces are not significant.
Comments. Comments start on a new line with #. A space after # is optional.
Order. The first line is User-agent, which names the robot; the Disallow lines that follow specify the files or folders that must not be indexed.

If the same prohibitions apply to several robots, list the robots one per line and then the prohibition or list of prohibitions, for example:

User-agent: StackRambler
User-agent: Aport
Disallow: /eng
Disallow: /news
# Rambler and Aport are disallowed from indexing links
# that begin with /news and /eng

The same applies to Disallow: every prohibition goes on a new line. If different robots get different prohibitions, the records are separated by an empty line, for example:

User-agent: *
Disallow: /news
# all robots are disallowed from indexing links
# that begin with /news

User-agent: StackRambler
User-agent: Aport
Disallow: /eng
Disallow: /news
# Rambler and Aport are disallowed from indexing links
# that begin with /news and /eng

User-agent: Yandex
Disallow:
# Yandex is allowed everything

Prevent all robots from indexing files with the .doc and .pdf extensions:

User-agent: *
Disallow: /*.doc$
Disallow: /*.pdf$

Examples

Disallow the robot Roverdog from indexing the file email.htm:

User-agent: Roverdog
Disallow: /email.htm

Allow all robots to index everything:

User-agent: *
Disallow:

Disallow all robots from indexing anything:

User-agent: *
Disallow: /

Disallow all robots from indexing the file email.htm, all files in the folder "cgi-bin" and the second-level folder "images":

User-agent: *
Disallow: /email.htm
Disallow: /cgi-bin/
Disallow: /images/

Disallow Roverdog from indexing any file on the server:

User-agent: Roverdog
Disallow: /

One more example:

User-agent: *
Disallow: /cgi-bin/moshkow
Disallow: /cgi-bin/html-KOI/AQUARIUM/songs
Disallow: /cgi-bin/html-KOI/AQUARIUM/history
Disallow: /cgi-bin/html-windows/AQUARIUM/songs
Disallow: /cgi-bin/html-windows/AQUARIUM/history

META tag ROBOTS

The META ROBOTS tag tells a robot that comes to the site whether it may index a given page and follow its links. It can also be used to invite robots to walk through all the pages of the site and index them. This tag is becoming increasingly important.

<HTML>
<HEAD>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="DESCRIPTION" CONTENT="This page ….">
<TITLE>...</TITLE>
</HEAD>
<BODY>

NOINDEX - forbids indexing the document;
NOFOLLOW - forbids following the links in the document;
INDEX - allows indexing the document;
FOLLOW - allows following the links;
ALL - index everything, equivalent to INDEX, FOLLOW;
NONE - index nothing, equivalent to NOINDEX, NOFOLLOW.

ROBOTS meta tag examples:

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Robots.txt Checker - a free online check of your robots.txt file.
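A robot's view of these rules can also be reproduced programmatically. The sketch below uses Python's standard urllib.robotparser module to load a robots.txt file and ask whether a given user agent may fetch a given URL. The domain is the placeholder http://www.mydomain.com from the example above, and the test paths are hypothetical; this is only an illustration, not part of the original article.

# Minimal sketch: checking robots.txt rules with Python's standard library.
# Assumes a robots.txt like the examples above is served at the placeholder domain.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("http://www.mydomain.com/robots.txt")  # placeholder domain
parser.read()  # download and parse the file

# Ask whether specific robots may fetch specific URLs.
# If "Disallow: /news" applies to all robots, the first call returns False.
print(parser.can_fetch("StackRambler", "http://www.mydomain.com/news/today.html"))
print(parser.can_fetch("*", "http://www.mydomain.com/index.html"))

Note that an empty Disallow line (allow everything) and "Disallow: /" (forbid everything) are interpreted by can_fetch() exactly as described in the rules above.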