Have you looked into tracking the work by person (as listed on the patents, for whatever that's worth)? It would be interesting to see how these ideas "grow." Marissa Mayer has done a lot with UI and AI, so this combination doesn't sound very far-fetched to me.
As in seeing what else Krishna Bharat, Benedict Gomes, Georges R. Harik, and Marissa Mayer have worked on? It's a good approach, and worth doing. Most of the work I've seen them do at Google has been interesting. Here are the Google patent filings I've seen that they've been involved with:
Krishna Bharat:
- 20050222977 Query rewriting with entity detection
- 20050165743 Systems and methods for personalizing aggregated news content
- 20050149576 Systems and methods for direct navigation to specific portion of target document
- 20050131762 Generating user information for use in targeted advertising
- 20050114299 Method and apparatus for query-specific bookmarking and data collection
- 20050060312 Systems and methods for improving the ranking of news articles
- 20040267723 Rendering advertisements with documents having one or more topics using user topic interest information
- 20020123988 Methods and apparatus for employing usage statistics in document retrieval
- 6,725,259 Ranking search results by reranking the results based on local inter-connectivity
Benedict Gomes:
- 20050149576 Systems and methods for direct navigation to specific portion of target document
- 20050027691 System and method for providing a user interface with search query broadening
- 20020123988 Methods and apparatus for employing usage statistics in document retrieval
- 6,941,293 Methods and apparatus for determining equivalent descriptions for an information need
- 6,615,209 Detecting query-specific duplicate documents
Georges R. Harik:
- 20060080238 Micro-payment system architecture
- 20060059044 Method and system to provide advertisements based on wireless access points
- 20060059043 Method and system to provide wireless access at a reduced rate
- 20050228797 Suggesting and/or providing targeting criteria for advertisements
- 20050114198 Using concepts for ad targeting
- 20050065806 Generating information for online advertisements from Internet data and traditional media data
- 20040267725 Serving advertisements using a search of advertiser Web information
- 20040167928 Serving content-relevant advertisements with client-side device support
- 20040093327 Serving advertisements based on content
- 20040068697 Method and apparatus for characterizing documents based on clusters of related words
- 20040059712 Serving advertisements using information associated with e-mail
- 20040059708 Methods and apparatus for serving relevant advertisements
- 20020123988 Methods and apparatus for employing usage statistics in document retrieval
- 6,941,293 Methods and apparatus for determining equivalent descriptions for an information need
- 6,754,873 Techniques for finding related hyperlinked documents using link-based analysis
Marissa Mayer:
- 20050222977 Query rewriting with entity detection
- 20050165744 Interface for a universal search
- 20050165743 Systems and methods for personalizing aggregated news content
I would have linked to these, but we sometimes seem to have problems with links to the USPTO database.
What is the general consensus on these patents? Are they filed or issued after the ideas are put into use, or do companies file them ahead of time "just to be sure," even if they never decide to use them?
It's difficult to tell. Some do appear to be put into use. Others make you wonder. For instance, Anna Patterson wrote on the Google blog last year that the size of Google's index had been expanded tremendously. She wrote a beta search engine a couple of years ago that the Internet Archive used for a while, and at least one news story mentioned that she was offering to license it. Was it sold to Google?
Since 2003, Google has purchased at least two projects hatched at Stanford--personalization search tool Kaltix and a project from Anna Patterson, a Stanford computer science research associate.
None of the following patent applications from her indicates that it is assigned to Google, though at least one of them appears in the USPTO assignment database as assigned to the company:
- 20060106792 Multiple index based information retrieval system
- 20060031195 Phrase-based searching in an information retrieval system
- 20060020607 Phrase-based indexing in an information retrieval system
- 20060020571 Phrase-based generation of document descriptions
- 20060018551 Phrase identification in an information retrieval system
Many things described in those filings don't appear to have been incorporated into Google, but it's possible that the index size increase she mentions, the use of a supplemental index, and some of the other features hinted at in descriptions of her beta search engine are described in those patent filings.
I've been looking at the AOL data a lot lately, and when tracking users' search queries it is often easy (for a human) to see whether they are refining their search or clicking through many sites in the search results without finding the "right" one. When you try the queries yourself, it is easy to imagine how a bit of background knowledge could help the search engine retrieve better results.
I wrote about a new AOL patent application that was recently published here: What Do You Do With a Database of AOL User Queries? It describes one use for looking at queries and classifying them: letting the search engine search only smaller, more specialized databases, such as a local search database, a definition database, a news database, an advertising database, or more than one of these. I believe more filings from AOL on using query information in other ways are coming, but they haven't been published yet.
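To make the routing idea a little more concrete, here is a minimal sketch of classifying a query and sending it to one or more specialized databases. It is not the method in the AOL application; the database names, the trigger-word table, and the function `route_query` are all hypothetical, and a real system would more likely learn its classifier from logged queries than use hand-written keyword rules.

```python
from typing import Dict, List, Set

# Hypothetical trigger terms for each specialized database.
# This hand-written table is an illustrative assumption only.
TRIGGERS: Dict[str, Set[str]] = {
    "ads": {"buy", "cheap", "price"},
    "definition": {"define", "definition", "meaning"},
    "local": {"near", "nearby", "restaurant", "directions"},
    "news": {"news", "latest", "headlines"},
}


def route_query(query: str, default: str = "web") -> List[str]:
    """Return the specialized databases a query should be sent to.

    A query may match several databases; if it matches none,
    it falls back to the general (default) index.
    """
    terms = set(query.lower().split())
    # Keep every database whose trigger set shares a term with the query.
    targets = sorted(db for db, words in TRIGGERS.items() if terms & words)
    return targets or [default]
```

For example, `route_query("latest news near me")` would go to both the local and news databases, while a query with no trigger words, such as `route_query("python tutorial")`, would fall back to the general index.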
You raise a lot of good points about the potential uses of query information and user behavior, and there are some interesting possibilities. We've seen hints and mentions of those kinds of uses in white papers and patent filings, along with the possible use of AI. Much of what I've seen in blogs and the news about the AOL query data focuses on what those queries reveal about the people behind them, rather than on how a search engine could use them to improve search results. But there has been research, as well as papers and patent applications, looking at things like spelling correction, refining queries based on deletions and additions to searches within user sessions, and more.
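As a rough illustration of looking at deletions and additions within a session, the sketch below compares each query with the next one from the same user and records which terms were added or removed. The names `Reformulation` and `session_reformulations` are hypothetical, and the rule that consecutive queries sharing no terms mark a topic change (rather than a refinement) is a simplifying assumption, not something taken from any of the papers or filings mentioned above.

```python
from typing import List, NamedTuple, Set


class Reformulation(NamedTuple):
    before: str
    after: str
    added: Set[str]    # terms present in the later query only
    removed: Set[str]  # terms present in the earlier query only


def session_reformulations(queries: List[str]) -> List[Reformulation]:
    """Compare consecutive queries in one session and record term edits.

    Assumption: pairs with identical term sets are repeats, and pairs
    sharing no terms are treated as a new topic; both are skipped.
    """
    result = []
    for before, after in zip(queries, queries[1:]):
        old = set(before.lower().split())
        new = set(after.lower().split())
        if old == new or not (old & new):
            continue
        result.append(Reformulation(before, after, new - old, old - new))
    return result
```

For a session like `["jaguar", "jaguar car", "jaguar car price", "weather boston"]`, this records two refinements (adding "car", then "price") and treats the jump to "weather boston" as a new topic.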