> Google analyzes a billion web-pages, and publishes some of the results online

Hall of Famer

Group: Hall Of Fame
Joined: 3-November 05
Posts: 3,461
From: CHeeseland
post Jan 25 2006, 07:42 PM
Wow, I really like how Google is putting some of the statistics online - even without numbers it's interesting enough. Putting a billion documents through a statistics system is something I would love to do, but sadly my downstream won't really let me do that in a reasonable time-frame (like 100 years).

Google has it here: http://code.google.com/webstats/index.html

QUOTE
Various people have, over the last few years, done studies into the popularity of authoring techniques. For example, looking at what HTML ids and classes are most common, and at how many sites validate (and yes, we know that we're not leading the way in terms of validation).

John Allsopp's study is the most recent one we're aware of, where he looked at class and id attribute values on 1315 sites. Before that, Marko Karppinen did a study in 2002, looking at which of the then 141 W3C members had sites that validated; in 2003 Evan Goer did a study into 119 Alpha Geeks' use of XHTML; and of course in 2004 François Briatte did a study covering trends of Web site design on 10 high-profile blogs. In addition, in the last year, microformats.org contributors have done a lot of research into the use of class and rel attributes, amongst other things, in their pursuit of bite-sized reusable semantics. We are also aware of some studies being done for the Mozilla project, covering thousands of pages.

We can now add to this data. In December 2005 we did an analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata. The results we found are available below. We hope this is of use!
(requires a SVG compatible browser, like Firefox 1.5)

The really interesting stuff needs to be read between the lines. Why is Google doing this? Could it be the start of real block-level content analysis? The data they show is interesting, but I can only guess at what they really wanted out of it and what they actually got out of it :D
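
For anyone who wants to try the same kind of tally on a small scale, here is a minimal sketch in Python of counting element names and class attribute values over a handful of documents. It is not Google's method (they have not published one here); the names `TagClassCounter` and `count_documents` and the sample documents are made up for illustration, and it only uses the standard library's `html.parser`.

```python
from collections import Counter
from html.parser import HTMLParser


class TagClassCounter(HTMLParser):
    """Tally element names and class attribute values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        self.elements[tag] += 1
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def count_documents(documents):
    """Feed each HTML string to one counter and return both tallies."""
    counter = TagClassCounter()
    for html in documents:
        counter.feed(html)
        counter.close()
        counter.reset()  # clears parser state only, not the tallies
    return counter.elements, counter.classes


if __name__ == "__main__":
    # Made-up sample documents, standing in for fetched pages.
    docs = [
        '<div class="footer menu"><p class="menu">hi</p></div>',
        '<div class="footer"><br/></div>',
    ]
    elements, classes = count_documents(docs)
    print(elements.most_common(3))
    print(classes.most_common(3))
```

Scaling this to anything like a billion documents is a different problem (fetching, storage, and broken markup dominate), so treat it as a toy for a few thousand pages at most.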

If anyone has more detailed information (or a publication?) with information about these statistics, I would be really glad to get a link or two.

Cheers
John

This post has been edited by softplus: Jan 25 2006, 07:43 PM

Star Member

Group: 1000 Post Club
Joined: 28-April 03
Posts: 1,489
From: UK
post Jan 25 2006, 08:07 PM
Interesting, and nice to see some examples from Google stating what they regard as "hostile" (pop-unders) and also what they consider a waste of time (like keywords meta tags etc).

Moderator

Group: Moderators
Joined: 29-August 02
Posts: 5,751
From: Bristol, UK
post Jan 26 2006, 04:26 PM
It just proves how easy it is for them to spot certain types of dodgy coding and how easy it is to ignore things like keyword-stuffed comment tags...

Just trying to look at the data now, should be interesting :)

Moderator

Group: Moderators
Joined: 6-March 03
Posts: 7,962
From: Langley, British Columbia, Canada
post Jan 26 2006, 05:03 PM
Some of the stats are very interesting, particularly on the proportion of web pages that use non-standard code. The very last section, on custom codes, is particularly intriguing.