Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Mar 13 2006, 12:05 AM
Recently, one of Google's engineers, Urs Hoelzle, spoke at EclipseCon 2005 and talked a little about the architecture behind how Google works. He gave some details that I think many folks haven't heard before about the computer system Google uses to provide search results.

Peeking Into Google

Here are a couple of snippets:

QUOTE
Google replicates the Web pages it caches by splitting them up into pieces it calls "shards." The shards are small enough that several can fit on one machine. And they're replicated on several machines, so that if one breaks, another can serve up the information. The master index is also split up among several servers, and that set also is replicated several times. The engineers call these "chunk servers."

and

QUOTE
The company also is applying machine learning to its system to give better results. Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words. To do this, the system tries to cluster concepts into "reasonably coherent" subclusters that seem related. These clusters, some tiny and some huge, are named automatically. Then, when a query comes in, the system produces a probability score for the various clusters. This kind of machine learning has had little success in academic trials, Hoelzle said, because they didn't have enough data. "If you have enough data, you get reasonably good answers out of it."

Nice to get a quick peek under the covers now and then.
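
For anyone who wants to see the shard idea from the first quote in concrete terms, here is a rough Python sketch. It is only an illustration of the general technique (split data into shards, keep each shard on several machines, fall back to another copy when one machine is down) and not Google's actual design. The function names, the CRC32 hashing, the round-robin placement, and the machine names are all assumptions made up for the example.

```python
import zlib


def shard_for(url, num_shards):
    """Map a URL to a shard number (hypothetical scheme: CRC32 modulo shard count)."""
    return zlib.crc32(url.encode("utf-8")) % num_shards


def place_replicas(num_shards, machines, replicas):
    """Assign each shard to `replicas` different machines, round-robin.

    Several shards end up on each machine, and every shard has several copies.
    """
    if replicas > len(machines):
        raise ValueError("need at least as many machines as replicas")
    return {
        shard: [machines[(shard + r) % len(machines)] for r in range(replicas)]
        for shard in range(num_shards)
    }


def pick_machine(shard, placement, down):
    """Return the first machine holding `shard` that is not marked as down."""
    for machine in placement[shard]:
        if machine not in down:
            return machine
    raise RuntimeError("every replica of shard %d is unavailable" % shard)


# Made-up cluster: 4 machines, 8 shards, 3 copies of each shard.
machines = ["m0", "m1", "m2", "m3"]
placement = place_replicas(num_shards=8, machines=machines, replicas=3)

shard = shard_for("http://example.com/page.html", num_shards=8)
# If the first machine holding this shard breaks, another copy serves the page.
broken = {placement[shard][0]}
print("shard", shard, "served by", pick_machine(shard, placement, broken))
```

With three copies per shard, a lookup only fails when all three machines holding that shard are down at once, which is the point of the replication Hoelzle describes.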

Untested
Group: Members
Joined: 17-February 06
Posts: 7
Mar 13 2006, 03:19 PM
The source material for all this talk about shards and clustering is this University of Washington CSE Colloquium video featuring Google's Jeff Dean, PhD:
http://norfolk.cs.washington.edu/htbin-pos...56K_320x240.wmv

If you watch the entire video you discover that, at the end of the day, this is an employee recruitment session. Still, lots of good stuff. There is a lengthy demonstration of the clustering technology that begins at 35:30.

It is important to note that Dr. Dean differentiates between the way search engines work today and what the goal is. He calls the search tool a Demo and a Model, and the tool is prominently marked DEMO.

I am conjecturing that subsequent statements by Google employees are based on watching this video as part of their prep work.

Seattle, WA

This post has been edited by Komodo Tale: Mar 13 2006, 03:25 PM