
Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998



Google's architecture


6 replies to this topic

#1 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 13 March 2006 - 12:05 AM

Recently, one of Google's engineers, Urs Hoelzle, spoke at EclipseCon 2005 and talked a little about the architecture behind how Google works. He gave some details that I think many folks haven't heard before about the computer system that Google uses to provide search results.

Peeking Into Google


Here are a couple of snippets:

Google replicates the Web pages it caches by splitting them up into pieces it calls "shards." The shards are small enough that several can fit on one machine. And they're replicated on several machines, so that if one breaks, another can serve up the information. The master index is also split up among several servers, and that set also is replicated several times. The engineers call these "chunk servers."


and

The company also is applying machine learning to its system to give better results. Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words.

To do this, the system tries to cluster concepts into "reasonably coherent" subclusters that seem related. These clusters, some tiny and some huge, are named automatically. Then, when a query comes in, the system produces a probability score for the various clusters. This kind of machine learning has had little success in academic trials, Hoelzle said, because they didn't have enough data. "If you have enough data, you get reasonably good answers out of it."
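To make the idea in that snippet a little more concrete, here is a toy sketch of scoring text against concept clusters. It is only an illustration: the cluster names and word lists are hand-written assumptions (the article says Google's clusters are learned and named automatically), and the add-one scoring and the names `CLUSTERS`, `cluster_probabilities`, and `similarity` are made up for this example, not anything Google has described.

```python
import re

# Hypothetical, hand-written clusters. In the system described in the
# article these would be learned and named automatically from data.
CLUSTERS = {
    "cooking": {"cooking", "cuisine", "vegetarian", "recipe"},
    "bay_area": {"bay", "area", "berkeley", "oakland"},
    "education": {"class", "courses", "lesson"},
    "cars": {"engine", "tires", "sedan"},
}


def cluster_probabilities(text, clusters=CLUSTERS):
    """Return a score per cluster that sums to 1 across all clusters."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Add-one smoothing: every cluster keeps a small nonzero score.
    raw = {name: 1 + len(words & vocab) for name, vocab in clusters.items()}
    total = sum(raw.values())
    return {name: score / total for name, score in raw.items()}


def similarity(query, document, clusters=CLUSTERS):
    """Compare two texts by their cluster scores, not by shared words."""
    q = cluster_probabilities(query, clusters)
    d = cluster_probabilities(document, clusters)
    return sum(q[name] * d[name] for name in clusters)


query = "Bay Area cooking class"
related = "Berkeley courses: vegetarian cuisine"
unrelated = "sedan engine tires"

# The related page shares no words with the query, but it lands in the
# same clusters, so it scores higher than the unrelated page.
print(similarity(query, related) > similarity(query, unrelated))
```

The query and the Berkeley page have no words in common, yet both put most of their weight on the cooking, Bay Area, and education clusters, which is the kind of match Hoelzle describes.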


Nice to get a quick peek under the cover now and then.

#2 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 13 March 2006 - 04:04 AM

Thanks for pointing that out! This was a very interesting read - Google is clearly taking load balancing up a notch. I've known for a while about their 'chunking' practice, but never knew any details.

One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.


That's amazing! I've certainly found that Google is one of the most reliable sites I use.
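For anyone curious how sharding plus replication lets a system ride out a failure like that, here is a minimal sketch. It assumes a simple hash-based shard assignment and a round-robin replica layout; the article does not say how Google actually assigns shards or places replicas, and the function names and machine names below are hypothetical.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard number with a stable hash (an assumption)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def place_replicas(num_shards, machines, replicas):
    """Give each shard `replicas` distinct machines, assigned round-robin."""
    if replicas > len(machines):
        raise ValueError("need at least as many machines as replicas")
    return {
        shard: [machines[(shard + i) % len(machines)] for i in range(replicas)]
        for shard in range(num_shards)
    }


def pick_machine(shard, placement, down):
    """Return the first machine holding the shard that is not marked down."""
    for machine in placement[shard]:
        if machine not in down:
            return machine
    raise RuntimeError("all replicas of shard %d are down" % shard)


machines = ["m0", "m1", "m2"]
placement = place_replicas(num_shards=4, machines=machines, replicas=2)

shard = shard_for("example.com/page", num_shards=4)
# Even with one machine down, the shard is still served by its other replica.
print(pick_machine(shard, placement, down={"m0"}))
```

Because each shard lives on more than one machine, losing a single machine only changes which replica answers the request.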

#3 eKstreme

eKstreme

    Hall of Fame

  • 1000 Post Club
  • 3399 posts

Posted 13 March 2006 - 04:30 AM

That's a great find Bill. Thanks for it :D

One thing that keeps striking me about Google is that they use the cheapest hardware available ($1000 a pop) and glue it all together using very clever software. All this is custom-made, which makes it harder for competitors to imitate Google. Sure, other SEs can deal with large datasets, but they have to re-invent the basic tools. In business jargon, the "barriers to entry" are quite high.

On a related question: why don't Yahoo and MSN do such presentations? They are excellent for marketing!

#4 earlpearl

earlpearl

    Light Speed Member

  • Members
  • 753 posts
  • Twitter:localoptimizer

Posted 13 March 2006 - 09:55 AM

Bill: You are an incredible source of information. Thanks.

As to clusters of phrases: I've seen some tools that suggest analogous words, but I suspect Google is building that based on its enormous database. It's interesting to hear that developing that requires enormous amounts of data. They are certainly the group to analyse data, and they are hiring enough sharp people to pick through this valuable information.

BTW: I noticed that Matt Cutts wanted to meet you at SES NYC. Did you get together? Your excellent work is spreading across the web.

#5 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 13 March 2006 - 12:08 PM

I did manage to meet up quickly with Matt at the end of one of the sessions that he spoke at, but only for a few minutes. We chatted briefly, and he told me that he liked what I was doing on my blog, which was nice to hear. ;-)

About the clusters of data, this is what I thought was interesting:

This kind of machine learning has had little success in academic trials, Hoelzle said, because they didn't have enough data. "If you have enough data, you get reasonably good answers out of it."


Makes you wonder how much data is enough.

There are some other tidbits of information about Google's architecture on the web, including a number of video seminars. I haven't seen much about the actual architecture that houses the searches for Yahoo! or MSN or Ask. Might be interesting to look around to see if there is anything about those online.

#6 Komodo Tale

Komodo Tale

    Unlurked Energy

  • Members
  • 7 posts

Posted 13 March 2006 - 03:19 PM

The source material for all this talk about shards and clustering is this University of Washington CSE Colloquium video featuring Google PhD Jeff Dean:

http://norfolk.cs.wa...56K_320x240.wmv

If you watch the entire video you discover that, at the end of the day, this is an employee recruitment session. Still, lots of good stuff.

There is a lengthy demonstration of the clustering technology that begins at 35:30.

It is important to note that Dr. Dean differentiates between the way search engines work today and what the goal is. He calls the search tool a Demo and a Model, and the tool is prominently marked DEMO.

I am conjecturing that subsequent statements by Google employees are based on watching this video as part of their prep work.

:ph34r: The Komodo Tale
Seattle, WA

Edited by Komodo Tale, 13 March 2006 - 03:25 PM.


#7 bwelford

bwelford

    Eyes Like Hawk Moderator

  • Moderators
  • 8894 posts
  • Twitter:http://twitter.com/BWelford
  • Facebook:http://www.facebook.com/bwelford

Posted 13 March 2006 - 03:42 PM

I only had time to watch part of that, Komodo Tale, but it was fascinating. Welcome to the Forums. :wave:

You've just got to tell us a little more about that cute name you have. :(


