Robots txt and Search engine robot files

An article added by Chris Morgan on 09/17/2008


Your Robots.txt File

A robots.txt file is the first file that a search engine robot visits on your website. Like a snooty nightclub bouncer with a velvet rope, the robots.txt file decides which robots are welcome and which need to move on to that less-exclusive joint down the street. Robots.txt can admit or reject robots on a sitewide, directory-by-directory, or page-by-page basis.

SEO folks often feel a special affection for the robots.txt file because it provides a rare opportunity to communicate with a search engine robot. However, its capabilities are limited. Robots.txt files exist only to keep robots out. Just as a bouncer can keep people out but can’t force anyone to come in, the robots.txt file can’t do anything to entice a robot to spend more time or visit more pages on your site. Also, compliance with your robots.txt file is voluntary, not mandatory. The major search engines will generally try to follow your instructions, but other, less-reputable types might not. This is why you should not rely on your robots.txt file to prevent spidering of sensitive, private, or inappropriate materials.

Do You Need a Robots.txt File?

You may not need a robots.txt file. Without one, all robots will have free access to non-password-protected pages on your site. To decide if you need a robots.txt file for your website, ask yourself these questions:

• Are there any pages or directories on my site that I do not want listed on the search engines, such as an intranet or internal phone list?

• Are there any specific search engines that I do not want to display my site?

• Do I know of any dynamic pages or programming features that might cause problems for spiders, like getting caught in a loop (infinitely bouncing between two pages)?

• Does my website contain pages with duplicate content?

• Are there directories on the site that contain programming scripts only, not viewable pages?

If the answers to these questions are no, then you do not need a robots.txt file. You’ve got the rest of the day off! If you have any yes answers, you’ll prepare your robots.txt file today.

Create Your Robots.txt File

Robots.txt files are very simple text files. To find a sample, go to yourseoplan.com/robots.txt and view ours, or go to just about any other site and look for the robots.txt file in the root directory.

The robots.txt file usually looks something like this:

   User-agent: googlebot
   Disallow: /private-files/
   Disallow: /more-private-files/

   User-agent: *
   Disallow: /cgi-scripts/

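In this example, Google’s spider (called Googlebot) is excluded from the two directories called private-files and more-private-files, and all other robots (signified by a wild-card asterisk, *) are excluded from the directory called cgi-scripts. Note that a robot obeys only the group that matches it most specifically, so Googlebot follows its own two rules and ignores the * group. To keep Googlebot out of cgi-scripts as well, repeat that Disallow line under User-agent: googlebot.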
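If you would like to see how a rule set like this is read before you publish it, one option is Python's standard urllib.robotparser module. The sketch below is only an illustration: it parses the sample rules from a string rather than from a live site, and the robot name SomeOtherBot and the file paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The sample rules from this article, held in a string for testing.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /private-files/
Disallow: /more-private-files/

User-agent: *
Disallow: /cgi-scripts/
"""


def build_parser(text):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is bound by its own group only.
print(parser.can_fetch("Googlebot", "/private-files/report.html"))     # False
print(parser.can_fetch("Googlebot", "/cgi-scripts/form.cgi"))          # True

# SomeOtherBot (hypothetical) has no group of its own, so the * group applies.
print(parser.can_fetch("SomeOtherBot", "/cgi-scripts/form.cgi"))       # False
print(parser.can_fetch("SomeOtherBot", "/private-files/report.html"))  # True
```

The second result is worth a look: because Googlebot has a group of its own, this parser does not apply the * group to it, so cgi-scripts stays open to Googlebot unless that rule is repeated in the googlebot group.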

There are numerous websites that will walk you through building and saving your robots.txt file. A helpful robots.txt builder can be found here: clickability.co.uk/robotstxt.html. Answers to just about any question you could think of about robots are here: www.robotstxt.org.

If you are feeling any doubt about whether your robots.txt file is written properly, don’t post it. The last thing you want to do is inadvertently shut out the search engines.
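One quick sanity check, sketched here with Python's standard urllib.robotparser module, is to confirm that a draft file still lets robots reach your home page. The function name blocked_from_home_page and the robot name AnyOtherBot are hypothetical, and passing this check does not prove the file is correct. It only catches the worst case of locking everyone out.

```python
from urllib.robotparser import RobotFileParser


def blocked_from_home_page(robots_text, agents=("Googlebot", "AnyOtherBot")):
    """Return the robots from `agents` that may not fetch the home page "/"."""
    checker = RobotFileParser()
    checker.parse(robots_text.splitlines())
    return [agent for agent in agents if not checker.can_fetch(agent, "/")]


# A draft that accidentally shuts out every robot from the whole site.
RISKY_DRAFT = """\
User-agent: *
Disallow: /
"""

# A draft that only excludes a scripts directory.
SAFER_DRAFT = """\
User-agent: *
Disallow: /cgi-scripts/
"""

print(blocked_from_home_page(RISKY_DRAFT))  # ['Googlebot', 'AnyOtherBot']
print(blocked_from_home_page(SAFER_DRAFT))  # []
```

An empty list means none of the robots you tested is locked out of the home page. Anything else is a sign to fix the draft before posting it.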

Here’s a bonus: Robots.txt can also be used to tell search engines where to find your XML Sitemap, by adding a Sitemap: line that gives the Sitemap’s full URL.
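As a rough illustration, a Sitemap line sits in robots.txt alongside the exclusion rules. The sketch below embeds such a file in a Python string and reads the Sitemap address back with the standard urllib.robotparser module (its site_maps() method needs Python 3.8 or later). The address www.example.com/sitemap.xml is a placeholder, not a real Sitemap.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt with one exclusion rule and a Sitemap line.
# The URL is a placeholder used only for this example.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow: /cgi-scripts/

Sitemap: https://www.example.com/sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(ROBOTS_WITH_SITEMAP.splitlines())

# site_maps() returns the listed Sitemap URLs, or None if there are none.
print(sitemap_parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```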

Robots Meta Tags

A robots meta tag serves a purpose similar to that of the robots.txt file, but it is placed within individual pages on your site rather than in your root directory. A robots meta tag affects only the page it resides on. Chances are you don’t need to use this type of tag, but here’s a quick overview in case you do.

You might choose to use a robots meta tag rather than a robots.txt file because it’s easier for you to set up the exclusion using your web page template rather than the robots.txt file, or maybe you only want to do a brief, temporary exclusion. Another possible reason is that you do not have access to the root directory on your site.

To exclude the robots from a page using the robots meta tag, simply include the following code in the HTML head of the page:

   <meta name="robots" content="noindex, nofollow">

This asks search engine robots not to list the page on which the tag resides (noindex) and not to follow the links on it (nofollow).
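If the tag is added through a page template, it can be handy to confirm which pages actually carry it. Here is a minimal sketch using Python's standard html.parser module. The class name RobotsMetaFinder, the function robots_directives, and the sample page are all invented for this example, and the check covers only the HTML text you hand it.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attributes without a value arrive as None, so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML text."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# A made-up page that carries the exclusion tag in its head.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>Internal phone list</body></html>"
)

print(sorted(robots_directives(PAGE)))  # ['nofollow', 'noindex']
```

A page without the tag returns an empty set, which is what you want to see on every page that should stay in the search results.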

Robot Exclusion for Google

If you plan to use robots exclusion to control the sharing of Google PageRank among pages on your website (for example, by excluding low-quality pages that you do not want hogging authority), you should know that Google handles the robots.txt and robots meta tag exclusions slightly differently:

• Pages excluded with either type of exclusion are allowed to accumulate PageRank authority.

• A page that is excluded with the robots.txt file may be listed in search results with a URL only, and no description. A page that is excluded with a robots meta tag will not be displayed in search results at all.

• A page that is excluded with the robots.txt file will not be crawled by Googlebot, and it will not pass PageRank to other pages to which it links.

• A page that is excluded with the robots meta tag may be crawled by Googlebot, and, unless the tag also says nofollow, Google will follow links on the page. The PageRank that is accumulated by this page will be shared with pages to which it links.
