Andrei Broder Joins Yahoo!

Posted 18 November 2005 - 09:58 AM

The official Yahoo! press release is here:


Andrei Broder, who was chief search scientist at AltaVista from 1999 to 2002 and has since been one of the technology heads at IBM, has joined Yahoo! as vice president of emerging search technology.

There are a lot of search patents and white papers with Andrei Broder's name on them. He has written papers and patents in collaboration with search engineers presently working at Google and Yahoo!, including Monika Henzinger and Krishna Bharat of Google.

Posted 19 November 2005 - 02:31 PM

That's something great for Yahoo! to cheer about :)

Posted 19 November 2005 - 04:43 PM

It is indeed.

I think having Andrei Broder at Yahoo! will not only add to their knowledge base of things related to search, but also help attract other folks to Yahoo!.

Chris Sherman and Gary Price had a post at the Search Engine Watch Blog that lists a number of the papers Andrei Broder has worked on. See:


Here are a few of the patents on which he was named as a co-inventor while at AltaVista:

Connectivity server for locating linkage information between Web pages

This patent describes a way of indexing the links between pages. Here's a snippet from the document:

The invention provides linkage information for a significant portion of the Web. The information can be used by programs that rank Web pages according to their connectivity, for instance, pages with many connections could be considered authoritative pages, or "hubs." The information can be used to build Web visualization and navigation tools. The information can be used in conjunction with search engine results to lead users to portions of the Web that store content which may be of interest. In addition, the invention can be used to optimize the design and implementation of web crawlers based on statistics derived from the in and out degrees of nodes.
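
To make the idea of "linkage information" a little more concrete, here is a minimal sketch of an in-memory link index that can answer in-degree and out-degree questions. It is only an illustration of the general idea in the snippet above, not the data structure the patent describes; the class name LinkIndex and its methods are made up for this example.

```python
from collections import defaultdict


class LinkIndex:
    """Toy index of links between pages (hypothetical, for illustration only)."""

    def __init__(self):
        self.out_links = defaultdict(set)  # page -> pages it links to
        self.in_links = defaultdict(set)   # page -> pages that link to it

    def add_link(self, source, target):
        """Record one link in both directions so either lookup is cheap."""
        self.out_links[source].add(target)
        self.in_links[target].add(source)

    def in_degree(self, page):
        """Number of distinct pages linking to this page."""
        return len(self.in_links.get(page, ()))

    def out_degree(self, page):
        """Number of distinct pages this page links to."""
        return len(self.out_links.get(page, ()))

    def most_linked(self, n=3):
        """Pages with the highest in-degree, ties broken by name."""
        return sorted(self.in_links, key=lambda p: (-len(self.in_links[p]), p))[:n]


index = LinkIndex()
index.add_link("a.html", "c.html")
index.add_link("b.html", "c.html")
index.add_link("c.html", "d.html")
print(index.in_degree("c.html"), index.out_degree("c.html"), index.most_linked(1))
```

Keeping both directions of each link is what lets a ranking program or a crawler ask "who links here?" as cheaply as "where does this page link?".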

Method and apparatus for finding mirrored hosts by analyzing connectivity and IP addresses

The reason for the process described in this patent:

Often search engines index only one copy of a mirrored page. In the process, they may fetch replicas and discard them. If mirroring information were available, a search engine could avoid fetching replicas from known mirrored hosts. The search engine could also distribute fetches of the remaining pages between the mirrors for load balancing, or choose the best mirror in terms of response time
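
Here is a rough sketch of how a crawler might use mirroring information once it has it, along the lines of the passage above: skip replicas from known mirrors and fetch from the mirror with the best response time. It does not show how the patent finds mirrors. The function names, the mirror_groups structure, and the response_ms timings are all assumptions made up for this example.

```python
def choose_fetch_host(host, mirror_groups, response_ms):
    """Pick which host to fetch from, given known mirror groups (hypothetical).

    mirror_groups: list of sets of host names believed to mirror each other.
    response_ms:   dict of host name -> measured response time in milliseconds.
    """
    for group in mirror_groups:
        if host in group:
            timed = [h for h in group if h in response_ms]
            if timed:
                # Prefer the fastest mirror; break ties by name for stability.
                return min(timed, key=lambda h: (response_ms[h], h))
            return min(group)  # no timings known: pick a stable default
    return host  # not a known mirror, fetch as-is


def plan_fetches(urls, mirror_groups, response_ms):
    """Drop replicas: each (host, path) is fetched once, from the chosen mirror."""
    seen = set()
    plan = []
    for host, path in urls:
        target = (choose_fetch_host(host, mirror_groups, response_ms), path)
        if target not in seen:
            seen.add(target)
            plan.append(target)
    return plan


groups = [{"a.example", "b.example"}]
timings = {"a.example": 120, "b.example": 80}
urls = [("a.example", "/x"), ("b.example", "/x"), ("c.example", "/y")]
print(plan_fetches(urls, groups, timings))
```

Load balancing across mirrors, which the passage also mentions, would need a different selection rule (for example rotating through the group) and is left out here.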

Method for determining the resemblance of documents

This one looks at parts of a pair of documents to see how similar they are, and could even be used to build clusters of similar documents.
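
As a rough illustration of comparing parts of two documents, here is a minimal sketch that breaks each document into overlapping word sequences (shingles) and measures how many they share. The shingle size of three words, the function names, and the use of set overlap divided by set union as the similarity score are assumptions for this example rather than details taken from the patent.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one piece.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(doc_a, doc_b, k=3):
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    a, b = shingles(doc_a, k), shingles(doc_b, k)
    if not a and not b:
        return 1.0  # two empty documents are trivially identical
    return len(a & b) / len(a | b)


print(resemblance("a b c d", "a b c d"))
print(resemblance("a b c d", "a b c e"))
print(resemblance("a b c", "x y z"))
```

Two documents that differ by a single word still share most of their shingles, which is why this kind of score is useful for spotting near-duplicates rather than only exact copies.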

Method for clustering closely resembling data objects

This one builds upon the previous patent, and shows a way for a search engine to index only one copy of a document when a page being considered for indexing is very similar to one it already has. The concept of shingles, or fingerprints, appears in both patents, and is explained more fully, and in more practical terms when it comes to indexing pages, in this one.
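
A minimal sketch of the clustering idea, assuming each document is reduced to a set of hashed word-sequence fingerprints and that two documents belong together when enough of those fingerprints overlap. The 0.5 threshold, the three-word shingle size, the use of SHA-1, and every function name here are choices made for this example, not details from the patent.

```python
import hashlib
from itertools import combinations


def fingerprints(text, k=3):
    """Hash each overlapping k-word sequence into a short fingerprint."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))]
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams if g}


def cluster_near_duplicates(docs, threshold=0.5, k=3):
    """Group document names whose fingerprint overlap meets the threshold."""
    prints = {name: fingerprints(text, k) for name, text in docs.items()}
    parent = {name: name for name in docs}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        union = prints[a] | prints[b]
        if union and len(prints[a] & prints[b]) / len(union) >= threshold:
            parent[find(b)] = find(a)  # merge the two clusters

    clusters = {}
    for name in docs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy cat",
    "c": "completely different text about search engine patents",
}
clusters = cluster_near_duplicates(docs)
print(clusters)
print([members[0] for members in clusters])  # one copy to index per cluster
```

Comparing every pair of documents is fine for a handful of pages but would not scale to a whole web index; a real system would need some way to avoid most of those comparisons.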

There are a number of others in the US Patent and Trademark Office database listing him as inventor or co-inventor. Those are worth taking a look at if you are interested in exploring his work some more.

