Cre8asiteforums Internet Marketing
and Conversion Web Design


Billions of pages gone in Google?


10 replies to this topic

#1 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 25 May 2006 - 08:20 AM

If you do a search in Google for * * it will usually return a count of all the pages in its index. It is normally 25,270,000,000. On some of the Data Centers, like this one, it only shows
20,960,000,000. Can this explain the disappearance of a lot of pages from a number of websites?

I will be interested to hear what Google shows in other areas of the world.

#2 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 4003 posts

Posted 25 May 2006 - 08:25 AM

Supposedly, this is the effect of the Big Daddy update. Or they are moving data between datacenters.

Here is what Matt Cutts had to say about the update.

In short, BD seems to be renewing their index and removing sites that don't have natural linkage.
I don't think there's anything to worry about for white hat SEOs, anyway.
Getting quality content that attracts relevant incoming links and traffic seems to be the only way to stay afloat here.

Afterthought: Just checked myself. I see 25.257 billion pages in the index. It's just that datacenter, I suspect. Or my datacenter hasn't been updated. Either of the two :)

Edited by A.N.Onym, 25 May 2006 - 08:27 AM.


#3 FP_Guy

FP_Guy

    Mach 1 Member

  • 250 Posts Club
  • 417 posts

Posted 25 May 2006 - 09:02 AM

Hmmm, must be doing something wrong. I tried searching

* *

*.*

and still didn't get any results.

#4 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 25 May 2006 - 09:05 AM

Those numbers are so approximate that I wouldn't even bet they're 20-30% correct :). With Google's datacenter setup, these kinds of differences could easily happen - and BigDaddy seems to have a strong effect on many sites. They seem to have fiddled with the parameters a bit and have managed to pull some legitimate sites back in. I wouldn't be surprised if they were turning other spam-related parameters back up on some datacenters. Constantly tweaking :)

John

#5 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 4003 posts

Posted 25 May 2006 - 06:55 PM

Yeah, I have noticed that Google's result counts are slightly exaggerated (by about 20-30%, too).

Btw, it's * * or -site:www.google.com to see the numbers displaying the index size.

#6 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 25 May 2006 - 10:31 PM

The * is used more or less like a wildcard. If you do a search for 'Search * Optimization' it will return results such as 'search engine optimization', 'search engine positioning', etc. Try this when searching only blogs and it will return a Server Error! (Actually, I enjoy seeing a Google error, so please do not report it!) This immediately suggests to me that Google treats blogs differently from websites, and that it uses a different algorithm for both the ranking and the positioning of blogs!

#7 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3935 posts

Posted 25 May 2006 - 10:53 PM

I personally would be iffy about taking that as evidence of the number of indexed pages. It could just as easily be the number of references, which may include things like 301 redirects (they still need an entry or a reference), 404 errors (they need to track URLs NOT to check anymore), and other things they may have entries for (like banned pages).

#8 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 25 May 2006 - 11:26 PM

Projectphp, thanks for the reply. What I was more interested in, and why I started this thread, was the large discrepancy in the page counts shown by different Data Centers, not the actual number. I think Google propagates its index across the 50 or more Data Centers over a period of months, not days. I could be wrong, but a discrepancy of roughly 17% between Data Centers is large. Another explanation is technical problems at certain data centers. The 25 billion or so is actually an estimate of the pages in the index, and searching with * * just helped to get this estimate.

The question remains: why are there these large discrepancies?
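For scale, the gap between the two reported counts works out like this (a quick sketch; the two figures are simply the estimates quoted earlier in the thread):

```python
# Index-size estimates quoted in this thread.
full_count = 25_270_000_000  # typical count returned for a "* *" search
low_count = 20_960_000_000   # count shown on some datacenters

missing = full_count - low_count
pct_of_full = 100 * missing / full_count

print(f"{missing:,} fewer pages ({pct_of_full:.1f}% of the larger estimate)")
# → 4,310,000,000 fewer pages (17.1% of the larger estimate)
```

So the lower datacenters are reporting about 4.3 billion fewer pages, roughly a sixth of the usual estimate.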

#9 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3935 posts

Posted 25 May 2006 - 11:36 PM

I don't know what the number means, so why it changes is going to be even harder to guess at, wouldn't you say?

#10 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 25 May 2006 - 11:55 PM

Sure! Google is QUISQUE COMOEDUM (quite the comedian)! Off to have a nice breakfast and watch the rugby!

#11 Guest_joedolson_*

Guest_joedolson_*
  • Guests

Posted 26 May 2006 - 03:25 AM

Using SEO Chat's multiple datacenter tool, these 3 datacenters produce the lower number:

64.233.167.99 - 20,960,000,000
64.233.167.147 - 20,960,000,000
64.233.167.104 - 20,960,000,000

As for what this means, who knows? It could be as simple as them testing an algorithm that doesn't count all of the references projectphp mentions. The SERPs don't appear to be very different - at least not in the top 10 results - so I wouldn't worry about it too much.


