What is a robots.txt file?
And do I need one?
A robots.txt file is a simple text file that tells search engine spiders and other bots which parts of your website they may crawl and which they should leave alone.
You can use a robots.txt file to:
- ask search engines to stay out of your site altogether, full stop
- keep crawlers away from areas you’d rather not see in search results
- give different search engines specific instructions about what to crawl
If you don’t want search engines to see new pages under construction, or personal stuff that’s only of interest to you, your robots.txt file tells them where they can and can’t go.
And if you don’t want to appear in a particular search engine’s results, you can deny its crawler access.
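To see how per-crawler rules play out, here is a minimal Python sketch using the standard library’s urllib.robotparser module. The crawler name ExampleBot, the /drafts/ directory and the example.com addresses are all made up for illustration; swap in the real user-agent name of whichever crawler you want to turn away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ExampleBot is shut out entirely,
# every other bot is only kept out of two directories.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask the parser whether a given bot may fetch a given address.
checks = [
    ("ExampleBot", "https://www.example.com/index.html"),
    ("OtherBot", "https://www.example.com/index.html"),
    ("OtherBot", "https://www.example.com/drafts/new-page.html"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent} -> {url}: {verdict}")
```

Python’s parser is only one reading of the rules. Real crawlers may interpret edge cases differently, so treat this as a sanity check rather than a guarantee.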
One of the first things all the big search engines will do is look for your robots.txt file. They’ll still crawl a site that doesn’t have one, but even if you don’t want to hide bits of your site, the file makes it explicit that they’re welcome. Which is good stuff.
Many people just hide their cgi-bin directory. If, like me, you don’t need to hide any other areas of your site, your robots.txt can be as simple as this, pasted into a text file:
User-agent: * # match all bots
Disallow: /cgi-bin/ # keep them out of the cgi-bin directory
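Save it as robots.txt (all lower case), upload it to the same place on the server as your index page (the root directory) and off you go. Crawlers only look for the file there, so it won’t be found if it sits in a subfolder.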
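If you’re not sure where crawlers will look, this small Python sketch works out the robots.txt address for any page on a site. The function name robots_txt_url and the example.com addresses are just for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/2020/post.html?ref=home"))
```

Open the resulting address in a browser after uploading. If you see your file, it’s in the right place.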
If you want to get creative with your robots.txt file it can get fairly complicated. There are plenty of good tutorials and free robots.txt file generators online. Otherwise grab your nearest website designer or SEO expert.
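Bear in mind that a robots.txt file has nothing to do with security. While it can stop bits of your site appearing in search engines, those pages are still available to anyone who has the address, and the robots.txt file itself is public, so it advertises every path you list in it. Anything truly confidential needs a password, not a Disallow line.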
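To make that point concrete, here is a rough Python sketch of how little effort it takes to read the "hidden" paths out of a robots.txt file. The helper name disallowed_paths and the /private-drafts/ path are hypothetical, and the parsing is deliberately simplistic.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path from robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after a '#' comment marker, then trim whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """\
User-agent: * # match all bots
Disallow: /cgi-bin/
Disallow: /private-drafts/
"""
print(disallowed_paths(sample))
```

Anyone can do the same with your file, so list only what you’d be comfortable having people find.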