Google has it here: http://code.google.c...tats/index.html
(requires an SVG-compatible browser, like Firefox 1.5)
Various people have, over the last few years, done studies into the popularity of authoring techniques. For example, looking at what HTML ids and classes are most common, and at how many sites validate (and yes, we know that we're not leading the way in terms of validation).
John Allsopp's study is the most recent one we're aware of, where he looked at class and id attribute values on 1315 sites. Before that, Marko Karppinen did a study in 2002, looking at which of the then 141 W3C members had sites that validated; in 2003 Evan Goer did a study into 119 Alpha Geeks' use of XHTML; and of course in 2004 François Briatte did a study covering trends of Web site design on 10 high-profile blogs. In addition, in the last year, microformats.org contributors have done a lot of research into the use of class and rel attributes, amongst other things, in their pursuit of bite-sized reusable semantics. We are also aware of some studies being done for the Mozilla project, covering thousands of pages.
We can now add to this data. In December 2005 we did an analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata. The results we found are available below. We hope this is of use!
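For anyone wondering what "extracting information about popular class names, elements, attributes" might look like on a small scale, here is a rough sketch using only Python's standard library. It is not Google's method (they haven't published one that I know of); the `MarkupCounter` class, the `count_markup` helper, and the sample pages are all made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupCounter(HTMLParser):
    """Tally element names and class names seen in one HTML document."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        self.elements[tag] += 1
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute can hold several space-separated names.
                self.classes.update(value.split())


def count_markup(documents):
    """Sum element and class-name counts over an iterable of HTML strings."""
    elements, classes = Counter(), Counter()
    for html in documents:
        parser = MarkupCounter()  # fresh parser per document
        parser.feed(html)
        parser.close()
        elements += parser.elements
        classes += parser.classes
    return elements, classes


# Hypothetical sample pages, just to exercise the code.
sample = [
    '<div class="footer menu"><p class="copyright">x</p></div>',
    '<div class="menu"><a href="#">y</a></div>',
]
elements, classes = count_markup(sample)
print(classes.most_common(3))
print(elements.most_common(3))
```

At a billion documents the counting itself would presumably be spread over many machines, but the per-page step could be about this simple.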
The really interesting stuff has to be read between the lines. Why is Google doing this? Could it be the start of real block-level content analysis? The data they show is interesting, but I can only guess at what they really wanted out of it and what they actually got out of it.
If anyone has more detailed information about these statistics (or a publication?), I would be really glad to get a link or two.
Edited by softplus, 25 January 2006 - 07:43 PM.