
Cre8asiteforums Internet Marketing and Conversion Web Design



One Robots.txt Shared By Multiple Domains - Needs To Be Different For Each



#1 Mike521


    Mach 1 Member

  • Members
  • 341 posts

Posted 13 August 2007 - 04:00 PM

Hi all, trying to see if anyone at cre8asite has any thoughts on this problem I have:

My company has several sites that are all very similar, sharing the same set of local files on our machine -- including robots.txt. Up until recently, the robots.txt files could all be the same, but now they have to be different for each domain. We're running on an IIS server that we have full control of.

Does anyone know of a way to do this?

Or, to put it more simply: we have one main domain that we want the engines to have full access to, but all the other domains need to be blocked. Each domain shares the same set of files on our server, so how do we accomplish that?


*Note:* I thought of one way, but our developers rejected it -- we can put .txt files through the ASP processor, make robots.txt a dynamic file, and script it however we need to. They felt it was too dangerous because it would apply to all .txt files across the domain, and general paranoia set in.
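For what it's worth, a script of the kind meant here might look roughly like this (a sketch only -- the host names are placeholders, and it assumes robots.txt requests are mapped to the ASP handler):

    <%@ Language=VBScript %>
    <%
    ' Sketch: serve a permissive robots.txt on the primary host and a
    ' blocking one on every other domain sharing these files.
    ' "www.maindomain.com" is a placeholder for the real primary domain.
    Response.ContentType = "text/plain"
    Dim host
    host = LCase(Request.ServerVariables("HTTP_HOST"))
    If host = "www.maindomain.com" Or host = "maindomain.com" Then
        Response.Write "User-agent: *" & vbCrLf
        Response.Write "Disallow:" & vbCrLf
    Else
        Response.Write "User-agent: *" & vbCrLf
        Response.Write "Disallow: /" & vbCrLf
    End If
    %>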


Thanks all

#2 Black_Knight


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9339 posts

Posted 13 August 2007 - 10:02 PM

A robots.txt file has to be located in the root of the domain - i.e. domain.com/robots.txt

There are no exceptions to this requirement and there never have been. The robots meta tag was created precisely because not everyone has access to the domain root, such as those on shared hosting under subdomains or sub-directories.

There is no way to put just one robots.txt file on a server and have it work for lots of different domains unless those domains all use the exact same directory as their document root.

So, the answer is, place a robots.txt file for each domain in the directory on the server that is the root for that site - i.e. where the default homepage goes. For this, they cannot share the same server directory. Period.

Otherwise, you need to use ASP or any other scripting language to dynamically insert a robots meta tag into pages when they are served from any domain but the primary one. That will prevent indexing just as thoroughly, though it won't prevent the wasted bandwidth of the pages still being requested.
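For example, a hedged sketch in classic ASP (the host name is a placeholder) that a shared page template could use to emit the blocking tag only on the secondary domains:

    <%
    ' Emit a blocking robots meta tag unless the request arrived
    ' on the primary host (placeholder name shown).
    If LCase(Request.ServerVariables("HTTP_HOST")) <> "www.maindomain.com" Then
        Response.Write "<meta name=""robots"" content=""noindex,nofollow"">"
    End If
    %>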

It's far simpler to just mirror the main site directory to a second location and point all the non-primary domains at that duplicate directory. That lets you have one robots.txt for the main site that allows all spidering activity, and a blocking robots.txt in the duplicate directory for all the sites you want kept out.
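The blocking robots.txt for the duplicate directory only needs two lines:

    User-agent: *
    Disallow: /

while the copy in the main site's root can allow everything simply by leaving the Disallow line empty (Disallow: with no path).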

A dynamically generated robots.txt file is certainly possible, and in the simplest instance it just uses the basics of URL rewriting. Have a URL-rewrite rule that serves one robots.txt file (which can now live anywhere at all) when the request comes in on the main domain, and a different file when robots.txt is requested on any other domain. And yes, this way you could have as many robots.txt files as you wanted -- still static files, with different rewrite rules on the server handing the appropriate one back for each call.

#3 Mike521


    Mach 1 Member

  • Members
  • 341 posts

Posted 14 August 2007 - 09:12 AM

Thanks Black_Knight! Quick question before I take your suggestion to our developers -- I'm sure this is what they'll ask me:

If we mirror the directory, how do we mirror everything except the robots.txt file?

I'm assuming there's some sort of automated way to mirror it, but we'll have to set up a system that ignores that one file.

#4 Wit


    Sonic Boom Member

  • 1000 Post Club
  • 1599 posts

Posted 14 August 2007 - 10:48 AM

Quote: "we can put .txt files through the ASP processor. Then we can just make it a dynamic file and script it however we need to."

Um that was the one suggestion I was going to make.

You could maybe generate the complete robots.txt file dynamically, although I don't know how to do that on a Windows server....



#5 Black_Knight


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9339 posts

Posted 14 August 2007 - 10:51 AM

You could use an automated method, like a script that detects any changes in the main site's directory and copies the changed files across. If that were the route chosen, simply build a check into the script so that it ignores the file if it is robots.txt.

If instant reflection of changes isn't important, then a simple script that copies everything except robots.txt from the main directory to the copy on a regular schedule (say, once per day) might be simpler.
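For instance, on Windows one sketch of such a scheduled copy (assuming robocopy is available, and with placeholder paths) would be:

    robocopy D:\sites\main D:\sites\mirror /E /XF robots.txt

Here /E copies all subdirectories and /XF robots.txt skips that one file, so the mirror keeps its own blocking copy; note this form only adds and updates files and doesn't delete anything removed from the main site.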

To be honest, I'd simply tell the developers what needs to be done, and let them figure out their own preferred method of achieving it, as they'll know the limitations of their system better than I can.

#6 Mike521


    Mach 1 Member

  • Members
  • 341 posts

Posted 14 August 2007 - 12:47 PM

Makes perfect sense -- I thought maybe IIS had a built-in mirroring method, but I'm sure the devs can write a script that copies once per day and ignores robots.txt.

thanks for the simple solution!

#7 Pittbug


    Ready To Fly Member

  • Members
  • 46 posts

Posted 29 August 2007 - 10:49 PM

I came across this same scenario, and the cleanest method I found was to use ISAPI_Rewrite (http://www.isapirewrite.com/) and write some domain-specific rules, so that a file robots1.txt appears as www.domain1.com/robots.txt, robots-b.txt appears as www.domain2.com/robots.txt, etc.
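Roughly, the httpd.ini rules might look something like this (a sketch in ISAPI_Rewrite 2-style syntax -- the domain names follow the example above, and the exact flags should be checked against the product's documentation):

    [ISAPI_Rewrite]
    # Hand each host its own robots file
    RewriteCond Host: www\.domain1\.com
    RewriteRule /robots\.txt /robots1.txt [I,L]
    RewriteCond Host: www\.domain2\.com
    RewriteRule /robots\.txt /robots-b.txt [I,L]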


