Billions of pages gone in Google?
Started by yannis, May 25 2006 08:20 AM
10 replies to this topic
#1
Posted 25 May 2006 - 08:20 AM
If you search Google for * * it will usually return a count of all the pages in its index. It is normally 25,270,000,000. On some of the data centers, like this one, it only shows 20,960,000,000. Can this explain the disappearance of a lot of pages from a number of websites?
I would be interested to hear what Google shows in other areas of the world.
#2
Posted 25 May 2006 - 08:25 AM
Supposedly, this is the effect of the Big Daddy update. Or they are moving data between datacenters.
Here is what Matt Cutts had to say about the update.
In short, BD seems to be renewing their index and removing sites that don't have natural linkage.
I don't think there's anything for white hat SEOs to worry about anyway.
Getting quality content to get relevant incoming links and traffic seems to be the only way to stay afloat here.
Edited by A.N.Onym, 25 May 2006 - 08:27 AM.
#4
Posted 25 May 2006 - 09:05 AM
Those numbers are so approximate that I wouldn't even bet they're 20-30% correct. With Google's datacenter setup, these kinds of differences could easily happen, and BigDaddy seems to have a strong effect on many sites. They seem to have fiddled with the parameters a bit and have managed to pull some legitimate sites back in; I wouldn't be surprised if they were turning other spam-related parameters back up on some datacenters. Constantly tweaking.
John
#6
Posted 25 May 2006 - 10:31 PM
The * is used more or less like a wildcard. If you do a search for 'Search * Optimization' it will return results such as 'search engine optimization', 'search engine positioning' etc. Try this if you are searching only for blogs and it will return a Server Error! (Actually I enjoy seeing a Google Error, so please do not report it!) This immediately suggests to me that Google treats blogs differently from websites and that it uses a different algorithm for both ranking and positioning of blogs!
#7
Posted 25 May 2006 - 10:53 PM
I personally would be iffy about providing that as the evidence of the number of indexed pages. It could just as easily be the number of references, which may include stuff like 301 redirects (they still need an entry or a reference), 404 errors (they need to track URLs NOT to check anymore) and other stuff they may have entries for (like banned pages).
#8
Posted 25 May 2006 - 11:26 PM
Projectphp, thanks for the reply. What I was more interested in, and the reason I started this thread, was the large discrepancy in pages shown in the index from different Data Centers, not the actual number. I think that Google propagates their index over the 50 or more Data Centers over a period of months, not days. I could be wrong, but a 25% discrepancy between Data Centers is large. Another explanation is technical problems at certain data centers. The 25 billion or so is actually an estimate of pages in the index, and searching with * * just helped to get this estimate.
The question remains: why are there these large discrepancies?
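For anyone who wants to check the size of the gap, here is a minimal sketch that works it out from the two counts quoted in this thread. The function name and the dictionary keys are made up for illustration, and the inputs are Google's own rough estimates, so the percentages are only as good as those figures.

```python
# Counts quoted in this thread (Google's estimates, not exact page counts).
usual_count = 25_270_000_000
lower_count = 20_960_000_000


def discrepancy(high: int, low: int) -> dict:
    """Return the gap between two counts, absolute and as percentages."""
    missing = high - low
    return {
        "missing": missing,
        # Share of the larger count that is absent from the smaller one.
        "pct_of_high": round(100 * missing / high, 1),
        # The same gap expressed relative to the smaller count.
        "pct_of_low": round(100 * missing / low, 1),
    }


result = discrepancy(usual_count, lower_count)
print(result)
```

On these figures the gap is about 4.3 billion pages, which is roughly 17% of the larger count or 21% of the smaller one, so somewhat under the 25% mentioned above, though still large.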
#11
Posted 26 May 2006 - 03:25 AM
Using SEO Chat's multiple datacenter tool, these 3 datacenters produce the lower number:
64.233.167.99 - 20,960,000,000
64.233.167.147 - 20,960,000,000
64.233.167.104 - 20,960,000,000
As for what this means, who knows? Could be as simple as that they're testing an algorithm which doesn't count all of the references projectphp mentions. The SERPs don't appear to be very different - at least not in the top 10 results, so I wouldn't bother myself about it too much.