> Billions of pages gone in Google?

Star Member

Group: 1000 Post Club
Joined: 22-May 06
Posts: 1,632
post May 25 2006, 08:20 AM
If you search Google for * * it will usually return a count of all the pages in its index. It is normally 25,270,000,000. On some of the data centers, like this one, it shows only 20,960,000,000. Can this explain the disappearance of a lot of pages from a number of websites?

I would be interested to hear what Google shows in other areas of the world.

Star Member

Group: 1000 Post Club
Joined: 29-December 05
Posts: 3,291
From: Novosibirsk, Russia
post May 25 2006, 08:25 AM
Supposedly, this is the effect of the Big Daddy update. Or they are moving data between datacenters.

Here is what Matt Cutts had to say about the update.

In short, BD seems to be renewing their index and removing sites that don't have natural linkage.
I don't think there's anything for white hat SEOs to worry about anyway.
Creating quality content that earns relevant incoming links and traffic seems to be the only way to stay afloat here.

Afterthought: Just checked myself. I see 25.257 billion pages in the index. Just the datacenter, I suspect, then. Or my datacenter hasn't been updated. Either of the two :)


This post has been edited by A.N.Onym: May 25 2006, 08:27 AM

Quarter Grand Poster

Group: Members
Joined: 9-June 05
Posts: 365
From: Vulcan, MI
post May 25 2006, 09:02 AM
Hmmm, must be doing something wrong. I tried searching

* *

*.*

and still didn't get any results.

Hall of Famer

Group: Hall Of Fame
Joined: 3-November 05
Posts: 3,461
From: CHeeseland
post May 25 2006, 09:05 AM
Those numbers are so approximate that I wouldn't even bet they're accurate to within 20-30% :D. With Google's datacenter setup, these kinds of differences could easily happen, and BigDaddy seems to have a strong effect on many sites. They seem to have fiddled with the parameters a bit and have managed to pull some legitimate sites back in; I wouldn't be surprised if they were turning other spam-related parameters back up on some datacenters. Constantly tweaking :)

John

Star Member

Group: 1000 Post Club
Joined: 29-December 05
Posts: 3,291
From: Novosibirsk, Russia
post May 25 2006, 06:55 PM
Yeah, I have noticed that Google results numbers are slightly exaggerated (by about 20-30%, too).

Btw, it's ** or -site:www.google.com to see the number that displays the index size.

Star Member

Group: 1000 Post Club
Joined: 22-May 06
Posts: 1,632
post May 25 2006, 10:31 PM
The * is used more or less like a wildcard. If you do a search for 'Search * Optimization' it will return results such as 'search engine optimization', 'search engine positioning' etc. Try this if you are searching only for blogs and it will return a Server Error! (Actually I enjoy seeing a Google Error, so please do not report it!). This immediately suggests to me that Google treats blogs differently from websites and that it uses a different algorithm for both the ranking and the positioning of blogs!
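
To make the wildcard idea concrete, here is a minimal sketch in Python. It is not how Google handles queries. It assumes each * stands for exactly one word, and the helper name `wildcard_to_regex` and the sample phrases are made up for illustration.

```python
import re

def wildcard_to_regex(query: str) -> re.Pattern:
    """Build a regex from a phrase in which each '*' matches exactly one word.

    Hypothetical helper: an illustration of whole-word wildcards,
    not a description of any search engine's query parser.
    """
    parts = [r"\w+" if token == "*" else re.escape(token) for token in query.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = wildcard_to_regex("Search * Optimization")

# Made-up sample phrases, not real search results.
phrases = [
    "search engine optimization",
    "Search Marketing Optimization",
    "search optimization",  # no word in the wildcard slot, so no match
]
matches = [p for p in phrases if pattern.search(p)]
print(matches)
```

Under this strict reading, a phrase such as 'search engine positioning' would not match, so whatever Google does with * is evidently looser than a one-word placeholder.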

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post May 25 2006, 10:53 PM
I personally would be iffy about providing that as the evidence of the number of indexed pages. It could just as easily be the number of references, which may include stuff like 301 redirects (they still need an entry or a reference), 404 errors (they need to track URLs NOT to check anymore) and other stuff they may have entries for (like banned pages).

Star Member

Group: 1000 Post Club
Joined: 22-May 06
Posts: 1,632
post May 25 2006, 11:26 PM
Projectphp, thanks for the reply. What I was more interested in, and the reason I started this thread, was the large discrepancy in pages shown in the index from different data centers, not the actual number. I think that Google propagates its index over the 50 or more data centers over a period of months, not days. I could be wrong, but a discrepancy of roughly 17% between data centers is large. Another explanation is technical problems at certain data centers. The 25 billion or so is actually an estimate of the pages in the index, and searching with * * just helped to get this estimate.

The question remains: why are there such large discrepancies?

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post May 25 2006, 11:36 PM
I don't know what the number means, so why it changes is going to be even harder to guess at, wouldn't you say?

Star Member

Group: 1000 Post Club
Joined: 22-May 06
Posts: 1,632
post May 25 2006, 11:55 PM
Sure! Google est QUISQUE COMOEDUM! Off to have a nice breakfast and watch the rugby!

Technical Administrator

Group: Technical Administrators
Joined: 8-March 06
Posts: 2,650
From: Minneapolis/Saint Paul, MN
post May 26 2006, 03:25 AM
Using SEO Chat's multiple datacenter tool, these 3 datacenters produce the lower number:

64.233.167.99 - 20,960,000,000
64.233.167.147 - 20,960,000,000
64.233.167.104 - 20,960,000,000

As for what this means, who knows? Could be as simple as that they're testing an algorithm which doesn't count all of the references projectphp mentions. The SERPs don't appear to be very different - at least not in the top 10 results, so I wouldn't bother myself about it too much.
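
For a sense of scale, the gap between the two counts quoted in this thread can be worked out directly. The snippet below is only a quick sketch of that arithmetic. The variable and function names are illustrative, and the counts themselves are Google's own rough estimates.

```python
# Counts quoted in this thread for the "* *" query.
usual_count = 25_270_000_000
lower_count = 20_960_000_000

def relative_gap(high: int, low: int) -> float:
    """Return how far `low` falls below `high`, as a percentage of `high`."""
    return (high - low) / high * 100

gap_pages = usual_count - lower_count
gap_percent = relative_gap(usual_count, lower_count)
print(f"{gap_pages:,} fewer pages, about {gap_percent:.1f}% below the usual count")
```

So the three datacenters above report about 4.3 billion fewer pages, roughly 17% below the usual figure.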