One Robots.txt Shared By Multiple Domains - Needs To Be Different For Each
Started by Mike521, Aug 13 2007 04:00 PM
6 replies to this topic
#1
Posted 13 August 2007 - 04:00 PM
Hi all, trying to see if anyone at cre8asite has any thoughts on this problem I have:
My company has several sites that are all very similar, sharing the same set of local files on our machine -- including robots.txt. Up until recently, the robots.txt files could all be the same, but now they have to be different for each domain. We're running on an IIS server that we have full control of.
Does anyone know of a way to do this?
Or, to put it more simply: we have one main domain that we want the engines to have full access to, but all other domains need to be blocked. Each domain shares the same set of files on our server, so how do we accomplish that?
*note* I thought of one way that our developers rejected -- we can put .txt files through the ASP processor. Then we can just make it a dynamic file and script it however we need to. They felt it was too dangerous because it would apply to all .txt files across the domain, and general paranoia set in.
Thanks all
#2
Posted 13 August 2007 - 10:02 PM
A robots.txt file has to be located in the root domain - i.e. domain.com/robots.txt
There are no exceptions to this requirement and never have been. The reason that robots meta tags were created was because not all persons had access to the root domain, such as those with shared hosting under subdomains or sub-directories.
There is no way to put just one robots text file on a server and have it work for lots of different domains unless those domains all point to the precise same file location as the root directory for the domains.
So, the answer is, place a robots.txt file for each domain in the directory on the server that is the root for that site - i.e. where the default homepage goes. For this, they cannot share the same server directory. Period.
Otherwise, you need to use ASP or any other scripting language to dynamically insert the robots meta tag into pages when they are served from any domain other than the primary. That will prevent indexing just as thoroughly, though it won't so successfully prevent the wasted bandwidth of pages being called.
It's far simpler to just mirror the main site directory to a second location and point all the non-primary domains at that duplicate directory. That will allow you to have one robots.txt for the main site that allows all spidering activity, and a second, blocking robots.txt file for all the sites you want blocked.
A dynamically generated robots.txt file is certainly possible, and in the simplest instance needs only the basics of URL rewriting. Have a URL-rewrite rule that serves one robots.txt file (which can now be in any location at all) when robots.txt is requested on one specific domain (the main domain) and another file when robots.txt is requested on any other domain. And yes, this way you could have as many robots.txt files as you wanted, still static files, just with different rewrite rules making the server give the appropriate one to each call.
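As a rough illustration of the dynamic approach described above, here is a minimal Python sketch that picks the robots.txt body from the request's Host header. It is not ASP and not an IIS rewrite rule; the host names in `MAIN_HOSTS` and the names `robots_body` and `app` are hypothetical, chosen only for the example.

```python
# Hypothetical host names for the one domain that should stay crawlable.
MAIN_HOSTS = {"www.example.com", "example.com"}

# An empty Disallow allows everything; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_body(host_header):
    """Return the robots.txt body for the given Host header value."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    host = host_header.split(":", 1)[0].strip().lower()
    return ALLOW_ALL if host in MAIN_HOSTS else BLOCK_ALL


def app(environ, start_response):
    """Minimal WSGI app that answers only /robots.txt."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    body = robots_body(environ.get("HTTP_HOST", "")).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Any host not listed in `MAIN_HOSTS` gets the blocking file, so a newly added domain is blocked by default rather than accidentally opened up.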
#3
Posted 14 August 2007 - 09:12 AM
thanks Black knight! quick question before I make your suggestion to our developers, I'm sure this is what they'll ask me:
if we mirror the directory, how do we mirror everything except the robots.txt file?
I'm assuming there's some sort of automated way to mirror it, but we'll have to set up a system that ignores that one file
#4
Posted 14 August 2007 - 10:48 AM
Um, that was the one suggestion I was going to make: "we can put .txt files through the ASP processor. Then we can just make it a dynamic file and script it however we need to."
You could maybe generate the complete robots.txt file dynamically, although I don't know how to do that on a Windows server....
Edited by Wit, 14 August 2007 - 10:51 AM.
#5
Posted 14 August 2007 - 10:51 AM
You could use an automated method, like a script that detects any changes in the directory for the main site and then copies the changed file(s) across. If that is the route chosen, simply build a check into the script so that it ignores robots.txt.
If instant reflection of changes were not so important, then a simple script to copy all but the robots.txt file from the main directory to the copy on a regular schedule (like once per day) might be simpler.
To be honest, I'd simply tell the developers what needs to be done, and let them figure out their own preferred method of achieving it, as they'll know the limitations of their system better than I can.
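For what it's worth, the scheduled "copy all but robots.txt" idea could look something like the Python sketch below. It assumes Python 3.8 or later is available on the server, and the function name `mirror_without_robots` and the directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def mirror_without_robots(src, dst):
    """Copy the tree at src into dst, skipping only the top-level robots.txt."""
    src = Path(src)
    dst = Path(dst)

    def skip_root_robots(directory, names):
        # copytree calls this once per directory it visits;
        # only filter in the site root, not in subdirectories.
        if Path(directory) == src and "robots.txt" in names:
            return ["robots.txt"]
        return []

    # dirs_exist_ok=True (Python 3.8+) lets the copy refresh an existing
    # mirror, leaving the mirror's own robots.txt untouched.
    shutil.copytree(src, dst, ignore=skip_root_robots, dirs_exist_ok=True)
```

Run once a day from a scheduled task, this overwrites changed files in the mirror but does not delete files that were removed from the main site, so an occasional clean-up would still be needed.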
#7
Posted 29 August 2007 - 10:49 PM
I came across this same scenario, and the cleanest method I found was to use isapi_rewrite (http://www.isapirewrite.com/) and write some domain-specific rules, so that a file robots1.txt will appear as www.domain1.com/robots.txt, robots-b.txt will appear as www.domain2.com/robots.txt, etc.
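The mapping those domain-specific rules express can be sketched in Python as below. This is not isapi_rewrite syntax, only an illustration of the lookup the rules perform; the two file names come from the post, while `DEFAULT_FILE` and the function name `rewrite_target` are hypothetical additions.

```python
# Host-to-file table taken from the example in the post.
ROBOTS_FILES = {
    "www.domain1.com": "robots1.txt",
    "www.domain2.com": "robots-b.txt",
}

# Hypothetical fallback for hosts without their own entry.
DEFAULT_FILE = "robots-block.txt"


def rewrite_target(host, path):
    """Return the path to serve: a per-host file for /robots.txt, else unchanged."""
    if path != "/robots.txt":
        return path
    return "/" + ROBOTS_FILES.get(host.lower(), DEFAULT_FILE)
```

Each robots file stays a plain static file on disk; only the requested name is swapped per domain.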