I have a development server that I use to let users play around with a site, or load it up with content, before it goes live. On this server I have a virtual host for each site. None of the information on these development sites is really secret, but I don't want the sites to end up in Google search results. I also don't want the extra complexity of protecting the sites with a username and password or restricting access to a specific IP range.
I figure a good solution is to have a default robots.txt file for every site on the server. Putting a blocking robots.txt in each site's own files is risky: I've seen sites completely wiped off the face of the Internet because someone didn't update the robots.txt file for production. It also means maintaining separate files for the development and production environments. The solution I came up with is a simple Apache alias in the global httpd.conf:
Alias /robots.txt /var/www/robots.txt
The contents of my /var/www/robots.txt are simply:
User-agent: *
Disallow: /
If you don't want to do this for the entire server, you can specify the alias in individual virtual hosts instead.
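As a rough sketch of the per-site variant, the same Alias line can go inside a single virtual host block. The host name and document root here are made up for illustration, so substitute your own. This also assumes that /var/www/robots.txt is somewhere Apache is already allowed to serve files from.

```apache
<VirtualHost *:80>
    # Hypothetical host name and document root, for illustration only
    ServerName dev.example.com
    DocumentRoot /var/www/dev.example.com

    # Serve the shared robots.txt for this site only
    Alias /robots.txt /var/www/robots.txt
</VirtualHost>
```

With the alias scoped this way, only this virtual host serves the shared file. Other sites on the server keep whatever robots.txt they have in their own document roots.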
This seems to override any robots.txt file in a virtual host's filesystem and serve the global robots.txt instead. Now I don't have to worry about search engines crawling my development server.