Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Feb 13 2006, 01:47 AM
Interesting stuff, Travis.
Were the forms being filled in search forms? Are they trying to get the kind of deep web content from your site described in this paper: Downloading Hidden Web Content?

Regardless, that would be a considerable change of behavior for Googlebot, which normally restricts itself to following HREF and SRC links.

Star Member
Group: 1000 Post Club
Joined: 9-January 05
Posts: 1,532
From: Perth, Western Australia
Feb 13 2006, 04:01 AM
Cheers Bill,
Googlebot has entered a new customer on this page: http://hp.empr.com.au/accountSignup.asp?m=1

This is our largest client, with over 100,000 HP parts on sale and a whiz-bang internal search engine. The code detects the type of user-agent and, if it's a robot, records the information and limits the robot's usage of the site to areas where cookies are not required.

The size of the site would be attractive to Google, and we have been fighting the search engines to keep them from flogging our SQL Server to death, so that they only trawl a volume of content we consider acceptable in proportion to our customers' usage. We don't set cookies for search engine robots, for obvious reasons, so they have a different experience from agents who do have cookies.

CODE

    ' Only classify visitors that do not have an ASP session cookie yet.
    if inStr(1, uCase(request.serverVariables("HTTP_COOKIE")), "ASPSESSIONID", vbBinaryCompare) = 0 then
        s = lCase(request.serverVariables("HTTP_USER_AGENT"))
        if inStr(s, "ask jeeves") = 0 AND inStr(s, "inktomi") = 0 AND _
           inStr(s, "netcraft") = 0 AND inStr(s, "wisenutbot") = 0 AND _
           inStr(s, "webwombat") = 0 AND inStr(s, "googlebot") = 0 AND _
           inStr(s, "slysearch") = 0 AND s <> "mozilla/3.0 (compatible)" then
            ' Get the navigation ready with the cookies and the https
        else
            ' Don't set cookies because it's a search engine. Just let it through
            ' the basic parts of the site. Record anything and everything about it
            ' and store it in a separate table.
        end if
    end if

This means that the search engine robots won't be able to access the deeper levels of the database, where the majority of parts are. They just run the pages down to Levels 3 and 4 and can go no further. That's as far as the sitemap will let them go. But in the process, we make an extensive recording of their activities and where they go. And this led us to identify the user-agent that filled in the form.
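For anyone not on classic ASP, the decision made by the check above can be sketched in Python. This is only an illustration of the logic as described, not the production code: the function names, the tuple names, and the three return labels are hypothetical, while the user-agent strings are the ones listed in the code above.

```python
# Hypothetical sketch of the robot check described above.
# The signature strings come from the post; all names are illustrative.
ROBOT_SUBSTRINGS = (
    "ask jeeves", "inktomi", "netcraft", "wisenutbot",
    "webwombat", "googlebot", "slysearch",
)
# This agent is matched exactly in the original check, not as a substring.
ROBOT_EXACT_AGENTS = ("mozilla/3.0 (compatible)",)


def is_search_robot(user_agent):
    """Return True if the user-agent matches one of the robot signatures."""
    ua = (user_agent or "").lower()
    if ua in ROBOT_EXACT_AGENTS:
        return True
    return any(signature in ua for signature in ROBOT_SUBSTRINGS)


def classify_request(cookie_header, user_agent):
    """Classify a request as 'existing_session', 'robot' or 'new_visitor'."""
    # A visitor that already carries an ASP session cookie was set up earlier.
    if "ASPSESSIONID" in (cookie_header or "").upper():
        return "existing_session"
    # No cookie yet: robots get no cookies and are logged separately.
    if is_search_robot(user_agent):
        return "robot"
    # Everyone else gets the navigation with cookies and https.
    return "new_visitor"
```

Keeping the signatures in one list makes it easier to add a new robot than extending a long AND chain. A user-agent header is easy to fake, so a "robot" result only says what the visitor claims to be.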
As far as the form is concerned, it looks like JavaScript was disabled in the process, because not all the required fields were filled in, just a selection with really generic information, not something a human would enter. It could be a spoof, so we will wait and see if that account is accessed, or if we get any more forms filled in by these robots. It's a bit like watching someone's bank records to determine where they are. It all happens after the fact.

This post has been edited by travis: Feb 13 2006, 06:47 PM

Star Member
Group: 1000 Post Club
Joined: 9-January 05
Posts: 1,532
From: Perth, Western Australia
Feb 13 2006, 05:50 AM
What is the IP address range of Google's robots?

Membership Admin & Moderator
Group: Membership Admin & Moderator
Joined: 30-September 05
Posts: 3,265
From: Some round-ish rock floating in a vacuum.
Feb 13 2006, 06:14 AM
QUOTE(travis @ Feb 13 2006, 10:50 AM)

Don't know, but if you type the address into a WHOIS lookup, you can read who it is registered to. If it is really Google, the output for the IP address would look like this:

QUOTE
OrgName: Google Inc.
OrgID: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US

Please tell us the result!
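Besides checking who the address block is registered to, a visitor claiming to be Googlebot can be checked with a forward-confirmed reverse DNS lookup: resolve the IP to a host name, check the domain, then resolve that host name back and confirm it returns the same IP. Below is a minimal Python sketch of that idea. The function name and the injectable `reverse`/`forward` parameters are hypothetical, and the accepted domain suffixes are an assumption that should be checked against Google's current crawler documentation.

```python
import socket

# Assumed host name suffixes for Google crawlers; verify before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip):
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host):
    # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); IPv4 only.
    return socket.gethostbyname_ex(host)[2]


def is_google_crawler_ip(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """Return True if `ip` passes a forward-confirmed reverse DNS check."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse record, or the lookup failed
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # host name is not under an accepted Google domain
    try:
        # The forward lookup must lead back to the IP we started with.
        return ip in forward(host)
    except OSError:
        return False
```

In use, `is_google_crawler_ip(visitor_ip)` would be called with the address recorded for the form submission. The two lookup functions are parameters only so the logic can be tried without network access.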

Star Member
Group: 1000 Post Club
Joined: 9-January 05
Posts: 1,532
From: Perth, Western Australia
Feb 13 2006, 06:47 PM
Ekstreme,
This is the form: http://hp.empr.com.au/accountSignup.asp?m=1

The difference between an employee and a robot is sorted out using that code. But more importantly, the generic nature of the data submission was the big giveaway. It also seems to have had JavaScript disabled, as not all of the required fields were filled in. We have never seen anything like it. If anyone sees that in their e-commerce signup pages, let me know.

If Googlebot actually logs in, it will be a first for any of our websites. The implications are quite substantial. Will Google actually log in? If it does log in as a customer and finds different content, which in this case it would, what will it report in the SERPs for that page? Or would it report a different page? Are there sites where people offer a free login signup, but don't want that content trawled or displayed by Google?

This post has been edited by travis: Feb 14 2006, 04:54 AM