User-agent: *
Disallwow: /deeplinkratio/?
Notice the typo: "Disallwow" instead of "Disallow". The tool takes the domain name as a query string, so this rule was meant to block all such requests.
Weeks passed. MSNBot and Googlebot stopped hitting the pages, but Slurp kept at it. I checked the robots.txt file, thought everything was fine, and assumed Slurp was having a problem figuring out the rule.
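To see what a strict parser makes of the misspelled rule, here is a small sketch using Python's standard urllib.robotparser module. It is only an illustration of one parser's behavior, not a claim about how any search engine's crawler works. The URL /deeplinkratio/?example.com is a made-up example of a tool request, and is_allowed is a helper name I invented for the sketch.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; directives it does not
    # recognize are silently skipped.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The file as originally published (with the typo) and as intended.
TYPO_RULES = "User-agent: *\nDisallwow: /deeplinkratio/?\n"
FIXED_RULES = "User-agent: *\nDisallow: /deeplinkratio/?\n"

# Hypothetical tool request: the domain is passed as the query string.
TOOL_URL = "/deeplinkratio/?example.com"

print("typo file allows the request: ", is_allowed(TYPO_RULES, TOOL_URL))
print("fixed file allows the request:", is_allowed(FIXED_RULES, TOOL_URL))
```

With this parser, the misspelled line is ignored, so the typo file should allow the request while the corrected file should block it. A bot that behaves the same way would have no rule to obey at all.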
Just now, I was mere seconds away from submitting a bug report to the Y! search team. I went to copy/paste the robots.txt file text and discovered the typo (the fact I didn't spot this earlier is another story).
Now this is interesting: Googlebot and MSNBot clearly did not request the tool's pages, while Slurp kept requesting them. Looking through the ~300,000 blocked URLs in Google's Webmaster Central, I found not a single entry for the tool.
So, the hypotheses are as follows:
1. Google and MSN spellcheck the robots.txt file and obey it. Slurp doesn't.
2. None of the bots spellcheck, but Slurp found a lot of links to the tool and proceeded to index them. Google and MSN don't know of any such links.
3. Something else is going on.
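For what hypothesis 1 might look like in practice, here is a minimal sketch of a "spellchecking" step that maps a misspelled directive name to the nearest known one. This is purely an assumption about how such leniency could be implemented; I have no evidence that any bot does it this way. The function name normalize_directive, the list of known directives, and the 0.8 similarity cutoff are all choices made up for this example.

```python
import difflib

# Directive names this sketch knows about (an illustrative subset).
KNOWN_DIRECTIVES = ("user-agent", "disallow", "allow")


def normalize_directive(name: str, cutoff: float = 0.8):
    """Map a possibly misspelled directive name to a known one, or None."""
    key = name.strip().lower()
    if key in KNOWN_DIRECTIVES:
        return key
    # Fuzzy match: keep the closest known name if it is similar enough.
    matches = difflib.get_close_matches(key, KNOWN_DIRECTIVES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in ("Disallow", "Disallwow", "Sitemap"):
    print(raw, "->", normalize_directive(raw))
```

A parser with a step like this would treat "Disallwow" as "disallow" and block the tool's pages, while a strict parser would drop the line. Comparing the two behaviors against the crawl logs is one way to tell hypothesis 1 apart from hypothesis 2.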
Anyone else know more about such a situation? Could be an interesting hidden "feature".
Incidentally, I fixed the robots.txt file a few minutes ago, so I will be able to measure accurately how long Y! Slurp takes to start obeying it.
Pierre






