
Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998



New Google Patent


7 replies to this topic

#1 egain


    Gravity Master Member

  • Members
  • 121 posts

Posted 30 August 2006 - 02:37 PM

One for Bragadocchio

http://patft.uspto.g...p;RS=PN/7096214

System and method for supporting editorial opinion in the ranking of search results


Abstract
A server improves the ranking of search results. The server includes a processor and a memory that stores instructions and a group of query themes. The processor receives a search query containing at least one search term, retrieves one or more objects based on the at least one search term and determines whether the search query corresponds to at least one of the group of query themes. The processor then ranks the one or more objects based on whether the search query corresponds to at least one of the group of query themes and provides the ranked one or more objects to a user.
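To make the abstract a little more concrete, here is a minimal Python sketch of the flow it describes: retrieve objects by search term, check the query against a group of query themes, and rank accordingly. The `QueryTheme` and `Doc` structures, the substring test for theme matching, and the flat `boost` multiplier for editorially favored sites are all assumptions for illustration. The patent doesn't spell out a scoring formula like this.

```python
from dataclasses import dataclass, field

@dataclass
class QueryTheme:
    # Hypothetical representation: the theme applies when every phrase in
    # `required` occurs in the query; `favored_sites` stands in for editorial opinion.
    name: str
    required: tuple
    favored_sites: frozenset = field(default_factory=frozenset)

@dataclass
class Doc:
    site: str
    text: str
    base_score: float  # whatever relevance score the engine already computed

def retrieve(query, corpus):
    """Return docs containing at least one search term (naive substring match)."""
    terms = query.lower().split()
    return [d for d in corpus if any(t in d.text.lower() for t in terms)]

def matching_themes(query, themes):
    """Return the themes whose required phrases all appear in the query."""
    q = query.lower()
    return [t for t in themes if all(p in q for p in t.required)]

def rank(query, corpus, themes, boost=2.0):
    """Rank retrieved docs, multiplying the score of favored sites by `boost`
    when the query corresponds to at least one theme (assumed scoring rule)."""
    favored = set()
    for theme in matching_themes(query, themes):
        favored |= theme.favored_sites
    def score(d):
        return d.base_score * (boost if d.site in favored else 1.0)
    return sorted(retrieve(query, corpus), key=score, reverse=True)
```

With a theme that requires "free" and "software" and favors a made-up `download.example`, a query like "free software downloads" can lift that site above a page with a higher base score, while unrelated queries keep the original ordering.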


--------------------------------------------------------------------------------

Could make things interesting

Edited by egain, 30 August 2006 - 02:38 PM.


#2 bragadocchio


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 30 August 2006 - 03:05 PM

Thanks, egain

I've written a little about this, but I think that it's worth discussing here.

There are some interesting aspects to what is described in the patent:

1. In addition to on-page and off-page factors determining relevance for a site, this gives a glimpse of how user behavior (as seen in search log files, and perhaps elsewhere) can influence the ranking and reranking of sites.

2. Note that I said sites, rather than pages. Some language in the patent describes how this process looks at hosts (or, for a host like GeoCities that contains many sites, at sites on a subdirectory level) rather than determining ranking factors on a page-by-page basis.

3. It may be the clearest indication from Google that human judgment could play a significant role in how some pages are ranked.

4. The process for determining which topics a query fits within is illustrated with a fairly simple example of a heuristic:

topic = sites that offer free software downloads

rule = sites that include the words or phrases "free" and "software downloads"

But the heuristic could be much more complicated than what is described, especially in light of more recent patent applications and white papers that describe looking at user sessions instead of individual queries, and at user behavior such as clicks on results, time spent on a site, how far down a page someone scrolled, mouse pointer movements on a results page, etc.

5. Determination of favored sites could be done by human editors, by inclusion in something like DMOZ or the Yahoo directory, or through some more advanced automated process. I'm guessing that directory inclusion may not be a large part of the process.
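Points 2 and 4 can be sketched together. The topic rule below is the patent's "free" / "software downloads" example as a naive phrase test, and `site_key` groups URLs by host, or by the first path segment on a multi-site host like GeoCities. The `MULTI_SITE_HOSTS` set and the choice of the first path segment as the site boundary are my assumptions, not something the patent specifies.

```python
from urllib.parse import urlsplit

# Hypothetical: hosts that hold many independent sites, where the first
# path segment marks the site boundary (as on GeoCities).
MULTI_SITE_HOSTS = {"geocities.com", "www.geocities.com"}

def site_key(url):
    """Group a URL by host, or by host plus first directory on multi-site hosts."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if host in MULTI_SITE_HOSTS:
        first = parts.path.strip("/").split("/")[0]
        if first:
            return f"{host}/{first}"
    return host

def fits_free_downloads_topic(page_text):
    """The example rule: text mentions "free" and "software downloads".
    Naive substring test, so "freeware" would also count as "free"."""
    text = page_text.lower()
    return "free" in text and "software downloads" in text
```

Keying on sites rather than pages means a ranking adjustment learned for one GeoCities member's pages wouldn't spill over onto every other site on the same host.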

I tried to restate some parts of the patent in plainer language than the document itself uses here:

Google looks at Query Themes and Reranking Based upon Editorial Opinion

Is this something that you think Google might be using?

I'm sort of leaning towards thinking that they've possibly tried this already, and may have moved on, or increased its complexity many times. The basics of it (incorporating user behavior and the study of search logs into ranking, and finding metrics that favor some sites over others for satisfying queries) could be quite helpful.

#3 egain


    Gravity Master Member

  • Members
  • 121 posts

Posted 31 August 2006 - 03:50 AM

Personally, I think there may be aspects that they have "tinkered" with. They already seem to be moving towards some of these things in some capacity, or have already implemented them.

To what degree aspects of this patent have been implemented, though, remains to be seen. I can't help thinking there is more to come with this.

#4 JohnMu


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 31 August 2006 - 06:07 AM

Have you looked into tracking the work by person (as mentioned on the patents, for whatever that's worth)? It would be interesting to see how these ideas "grow". Marissa Mayer has done a lot with UI and AI, so this combination doesn't sound very far-fetched to me.

What is the general consensus on these patents -- are they filed / issued after they are put into use, or do they file them ahead of time "just to be sure", even if they never decide to use them?

I've been looking a lot at the AOL data lately, and when tracking search queries by the users it is often easy (as a human) to see how they are either refining their search or going through a lot of sites in the search results (not finding the "right" site). When you try the queries yourself it is easy to imagine that a bit of background knowledge could help the search engine to retrieve better results.

I have a strong feeling that these types of statistics are often used to test search quality -- and with this data, Google can easily compare datacenters with slightly tuned algorithms. Determine the original query and count the number of result pages used (= low quality results), the number of sites clicked through (= a sign of low quality sites, or sites irrelevant to the user's query) and the number of query refinements (= the user recognized that the query was in need of adjustment; something the search engine could suggest in the future). That could give some neat metrics to play with (I want to try it with the AOL data, once I find the time...)
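The per-session metrics described above could be computed roughly like this. The event format (`query`, `page`, `click`) is hypothetical, not the AOL data's actual columns, and counting every query after the first as a refinement is a simplification.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Event:
    kind: str   # hypothetical: "query", "page" (another results page) or "click"
    value: str

def session_metrics(events):
    """Summarize one search session."""
    queries = [e.value for e in events if e.kind == "query"]
    return {
        "original_query": queries[0] if queries else None,
        # extra results pages viewed: a hint of low quality results
        "result_pages": sum(1 for e in events if e.kind == "page"),
        # sites clicked through: many clicks may mean the right site was hard to find
        "clicks": sum(1 for e in events if e.kind == "click"),
        # every query after the first is counted as a refinement
        "refinements": max(len(queries) - 1, 0),
    }

def compare_variants(sessions_by_variant):
    """Average each metric per algorithm variant (e.g. per datacenter);
    lower averages suggest people found what they wanted sooner."""
    keys = ("result_pages", "clicks", "refinements")
    summary = {}
    for variant, sessions in sessions_by_variant.items():
        metrics = [session_metrics(s) for s in sessions]
        summary[variant] = {k: mean(m[k] for m in metrics) for k in keys}
    return summary
```

A real comparison would need many sessions per variant and some care that the same kinds of queries reach each one; the averages here are just the simplest possible summary.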

Imagine that the engine could, in reverse, use your existing queries to adjust its results. It could possibly extract your desired query theme (or even just the desired "color", e.g. technical or opinionated? modern vs. classical (new or old pages)? etc.); it could extract your preferred sources (based on your final clicks in the results); and it could extract the editorial opinion you value most or relate to most (e.g. tends to value sites from the ODP, tends to value sites recommended by cre8asite, tends to value sites similarly to Matt Cutts, prefers sites from our spam blacklist). Using trends like those from your search history, Google (or any search engine) could "easily" (ha!) give you personalized results based on personal ranking factors. (Bye bye ranking checks :-), again.)
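Extracting "preferred sources" from final clicks might look something like this minimal sketch. Treating the last click in a session as the satisfying one, normalizing by the number of sessions, and adding a fixed `weight` to the existing score are all assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

def preferred_sources(sessions):
    """sessions: list of click lists (URLs in click order), one per session.
    Assumes the final click in each session is the one that satisfied the user."""
    counts = Counter(urlsplit(clicks[-1]).hostname for clicks in sessions if clicks)
    total = sum(counts.values())
    return {host: n / total for host, n in counts.items()} if total else {}

def personalize(results, prefs, weight=0.5):
    """results: list of (url, score) pairs. Adds weight * preference share."""
    def adjusted(item):
        url, score = item
        return score + weight * prefs.get(urlsplit(url).hostname, 0.0)
    return sorted(results, key=adjusted, reverse=True)
```

An additive boost keeps the engine's own score in charge and only nudges close calls toward the sources a person tends to end up on; a multiplicative boost would be the more aggressive choice.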

Looking at the AOL data, they must already have terabytes of user interaction data, not to mention the information they have from visitors with search history enabled. I imagine they can take algorithm updates and test them against a corpus of known search history statistics like that, to see if the "final click" can be moved closer to the initial query.

What I find interesting in all of this is that they are "still" using simple algorithms to do it. I always imagined that AI would be able to help solve a problem like that: lots of data, known valuation functions, solve for the optimal "query order" (simplified). It's easy to teach a small neural network how to play tic-tac-toe (based on known data), and other forms of AI can learn to recognize letters in obscure binary arrays (bitmaps), so it must be possible to extract "purpose" and "meaning" from a query by a "known user" :huh:. (Just dreaming, I guess.)

John

#5 bragadocchio


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 31 August 2006 - 08:12 AM

Have you looked into tracking the work by person (as mentioned on the patents, for whatever that's worth)? It would be interesting to see how these ideas "grow". Marissa Mayer has done a lot with UI and AI, so this combination doesn't sound very far-fetched to me.


As in seeing what other things that Krishna Bharat, Benedict Gomes, Georges R. Harik, and Marissa Mayer have worked on? It's a good approach, and worth doing. Most of what I've seen in terms of work that they've done with Google has been interesting. Here are Google patent filings I've seen that they've been involved with:

Krishna Bharat:
  • 20050222977 Query rewriting with entity detection
  • 20050165743 Systems and methods for personalizing aggregated news content
  • 20050149576 Systems and methods for direct navigation to specific portion of target document
  • 20050131762 Generating user information for use in targeted advertising
  • 20050114299 Method and apparatus for query-specific bookmarking and data collection
  • 20050060312 Systems and methods for improving the ranking of news articles
  • 20040267723 Rendering advertisements with documents having one or more topics using user topic interest information
  • 20020123988 Methods and apparatus for employing usage statistics in document retrieval
  • 6,725,259 Ranking search results by reranking the results based on local inter-connectivity
Benedict Gomes:
  • 20050149576 Systems and methods for direct navigation to specific portion of target document
  • 20050027691 System and method for providing a user interface with search query broadening
  • 20020123988 Methods and apparatus for employing usage statistics in document retrieval
  • 6,941,293 Methods and apparatus for determining equivalent descriptions for an information need
  • 6,615,209 Detecting query-specific duplicate documents
Georges R. Harik:
  • 20060080238 Micro-payment system architecture
  • 20060059044 Method and system to provide advertisements based on wireless access points
  • 20060059043 Method and system to provide wireless access at a reduced rate
  • 20050228797 Suggesting and/or providing targeting criteria for advertisements
  • 20050114198 Using concepts for ad targeting
  • 20050065806 Generating information for online advertisements from Internet data and traditional media data
  • 20040267725 Serving advertisements using a search of advertiser Web information
  • 20040167928 Serving content-relevant advertisements with client-side device support
  • 20040093327 Serving advertisements based on content
  • 20040068697 Method and apparatus for characterizing documents based on clusters of related words
  • 20040059712 Serving advertisements using information associated with e-mail
  • 20040059708 Methods and apparatus for serving relevant advertisements
  • 20020123988 Methods and apparatus for employing usage statistics in document retrieval
  • 6,941,293 Methods and apparatus for determining equivalent descriptions for an information need
  • 6,754,873 Techniques for finding related hyperlinked documents using link-based analysis
Marissa Mayer:
  • 20050222977 Query rewriting with entity detection
  • 20050165744 Interface for a universal search
  • 20050165743 Systems and methods for personalizing aggregated news content


I would have linked to these, but we seem to have an issue at times with links to the USPTO database.

What is the general consensus on these patents -- are they filed / issued after they are put into use, or do they file them ahead of time "just to be sure", even if they never decide to use them?


It's difficult to tell. Some do appear to be put into use. Others make you wonder. For instance, Anna Patterson wrote in the Google blog last year that the Google Index size had been expanded tremendously. She wrote a beta search engine a couple of years ago that the Internet Archive was using for a while, and at least one news story mentioned that she was offering licensing for it. Was it sold to Google?

Since 2003, Google has purchased at least two projects hatched at Stanford--personalization search tool Kaltix and a project from Anna Patterson, a Stanford computer science research associate.


None of the following patent applications from her note that they are assigned to Google, though at least one of them appears in the USPTO assignment database as having been assigned to the company:
  • 20060106792 Multiple index based information retrieval system
  • 20060031195 Phrase-based searching in an information retrieval system
  • 20060020607 Phrase-based indexing in an information retrieval system
  • 20060020571 Phrase-based generation of document descriptions
  • 20060018551 Phrase identification in an information retrieval system
There are many things described in those that don't appear to have been incorporated into Google, but it's a possibility that the size increase she mentions, the use of a supplemental index, and some of the other features hinted at in descriptions of her beta search engine may be described in those patent filings.

I've been looking a lot at the AOL data lately, and when tracking search queries by the users it is often easy (as a human) to see how they are either refining their search or going through a lot of sites in the search results (not finding the "right" site). When you try the queries yourself it is easy to imagine that a bit of background knowledge could help the search engine to retrieve better results.



I wrote about a new AOL patent application that was recently released here: What Do You Do With a Database of AOL User Queries? It describes one use for looking at queries and classifying them: letting the search engine look only at smaller, more specialized databases such as a local search database, a definition database, a news database, an advertising database, or more than one of these. I believe there are more filings coming from AOL on how to use query information in other ways, which haven't been published yet.
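Query classification for routing could start out as simple as this sketch. The cue patterns and database names are hypothetical; the AOL application may classify queries quite differently, and a real system would more likely learn its rules from the query logs themselves. Note that a query can be routed to more than one database.

```python
import re

# Hypothetical cue patterns for a few specialized databases.
ROUTES = {
    "definition": re.compile(r"^(define|definition of|what is|meaning of)\b"),
    "local": re.compile(r"\b(near me|directions to|zip code \d{5})\b"),
    "news": re.compile(r"\b(news|latest|breaking|today)\b"),
}

def route_query(query, default="web"):
    """Return every specialized database whose cues match, else the default."""
    q = query.lower().strip()
    targets = [name for name, pattern in ROUTES.items() if pattern.search(q)]
    return targets or [default]
```

For example, "what is the latest news" would go to both the definition and news databases under these rules, which shows how crude keyword cues can be.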

You raise a lot of good points regarding the potential uses of query information and user behavior, and there are some interesting possibilities. We've seen hints and mentions of those types of uses in white papers and patent filings, as well as the possible use of AI. A lot of what I've seen written in blogs and the news about the AOL query data focuses on what those queries can tell us about the people behind them, instead of how the data could be used by a search engine to improve search results. But there have been research papers and patent applications that look at things like spelling correction, refining queries based on the deletions and additions people make to their searches within a user session, and others.
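One way to label consecutive queries within a session as additions, deletions, or likely spelling corrections. Word-level differences and `difflib.SequenceMatcher` with an arbitrary 0.8 similarity threshold are stand-ins chosen for illustration, not the method from any particular paper or filing.

```python
from difflib import SequenceMatcher

def classify_refinement(prev, curr, spelling_threshold=0.8):
    """Label how `curr` changes `prev` within one session (word-level)."""
    a, b = prev.lower().split(), curr.lower().split()
    added = [t for t in b if t not in a]
    removed = [t for t in a if t not in b]
    if not added and not removed:
        return ("same", [])
    if added and not removed:
        return ("addition", added)
    if removed and not added:
        return ("deletion", removed)
    # One word swapped for a similar-looking one: a likely spelling correction.
    if len(added) == 1 and len(removed) == 1:
        if SequenceMatcher(None, removed[0], added[0]).ratio() >= spelling_threshold:
            return ("spelling", (removed[0], added[0]))
    return ("rewrite", (removed, added))
```

Aggregated over many sessions, pairs labeled "spelling" could feed a correction dictionary, and frequent "addition" terms could become query refinement suggestions.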

#6 marianne


    Ready To Fly Member

  • Members
  • 20 posts

Posted 31 August 2006 - 11:39 AM

Fascinating thread and many thanks for starting it. I've requested the patent and will take a look at it later today. I like the fact that Google is moving towards more "human mediation" in the presentation of search results. Your comment on AI in the last posting speaks to a presentation idea that I am germinating. :(

I think that we're moving towards Web 3.0 and that it will be AI. Web 1.0 was all about publishing: anytime, anywhere, anyone. Web 2.0 is about the platform, the Web as an operating system. Web 3.0 is about sentience: machine learning that results in a base-level "understanding" of context as well as content. Patents such as these, the IBM patent filed in July that identifies narrative from code, and the Google patent that uses visual segmentation to segment documents are intriguing steps in this direction. Latent Semantic Indexing [LSI] was the smokescreen. I'm thinking the new batch of patents might be closer to the real thing.

#7 bragadocchio


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 31 August 2006 - 01:02 PM

Hi Marianne,

Welcome to the forums. I'd like to hear that presentation. :(

Patents such as these, the IBM patent filed in July that identifies narrative from code, and the Google patent that uses visual segmentation to segment documents are intriguing steps in this direction. Latent Semantic Indexing [LSI] was the smokescreen. I'm thinking the new batch of patents might be closer to the real thing.


IBM's Detecting content-rich text and Google's Document segmentation based on visual gaps did show some interesting ideas on how a search engine might look at content on pages. There are a lot of fun ideas circulating in patent filings. IBM came out with two last November, which compared chatter amongst blogs and other sites to infectious diseases, and looked at capturing the burstiness of that information:

System, method, and service for inducing a pattern of communication among various parties (20050256949)

A communication pattern inducing system focuses on the propagation of topics amongst a plurality of nodes based on the text of the node rather than hyperlinks of the node. A node could represent a weblog or any other source of information such as person, a conversation, images, etc. The system utilizes a model for information diffusion, wherein the parameters of the model capture how a new topic spreads from node to node. The system further comprises a process to learn the parameters of the model based on real data and to apply the process to real (or synthetic) node data. Consequently, the system is able to identify particular individuals that are highly effective at contributing to the spread of topics.
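The abstract doesn't say how the diffusion parameters are learned. As a deliberately crude stand-in, this sketch estimates a transmission rate for each pair of nodes by counting how often one node picks up a topic within a few days after another, then sums those rates to rank nodes by influence. The input format and the `window` parameter are assumptions, and the IBM filing's actual learning procedure is likely more involved.

```python
from collections import defaultdict

def estimate_transmission(mentions, window=3):
    """mentions: {topic: {node: first_day_mentioned}}.
    Returns p[(u, v)]: the fraction of u's topics that v picked up within
    `window` days after u mentioned them."""
    followed = defaultdict(int)
    topics_by_node = defaultdict(int)
    for nodes in mentions.values():
        for u, tu in nodes.items():
            topics_by_node[u] += 1
            for v, tv in nodes.items():
                if v != u and 0 < tv - tu <= window:
                    followed[(u, v)] += 1
    return {(u, v): n / topics_by_node[u] for (u, v), n in followed.items()}

def influence(p):
    """Sum each node's outgoing transmission rates; higher means more spreading."""
    score = defaultdict(float)
    for (u, _v), prob in p.items():
        score[u] += prob
    return dict(score)
```

Counting co-occurrence in time like this can't tell influence from two blogs independently reacting to the same outside event, which is one reason a proper model learns parameters from the data rather than just tallying.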


System, method, and service for segmenting a topic into chatter and subtopics (20050256905)

A topic segmenting system segments a topic into chatter and subtopics. The system decomposes a conversation into topics, producing a time-based structure for topics and subtopics in the conversation. The system extracts a large number of topics at all levels of granularity. Some of the topics extracted correspond to broad topics and some correspond to "spiky" topics or subtopics. The system comprises a process for automatically detecting spiky regions of a topic. For each possible broad topic, the present system finds regions where coverage of the broad topic overlaps significantly with the spiky region of another topic. The system then removes the spiky subtopic from the conversation. Processing is repeated until all discernable topics have been identified and removed from the conversation, yielding random topics of little duration or intensity.
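Similarly, the abstract doesn't define what makes a region "spiky". A simple stand-in is to flag runs of days where a topic's mention count exceeds the mean plus `k` standard deviations of its own series; the threshold rule and the default `k=1.5` are assumptions, not taken from the filing.

```python
from statistics import mean, pstdev

def spiky_regions(counts, k=1.5):
    """Return inclusive (start, end) index pairs for runs of days whose
    mention count exceeds mean + k * population standard deviation."""
    threshold = mean(counts) + k * pstdev(counts)
    regions, start = [], None
    for i, c in enumerate(counts):
        if c > threshold and start is None:
            start = i
        elif c <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(counts) - 1))
    return regions
```

Because the spikes themselves inflate the mean and standard deviation, a single global threshold can miss smaller bursts in a busy topic; a sliding-window baseline is a common alternative.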



I think that's something that we will see more of, since queries and topics related to them change over time, and sometimes that rate of change can be significant by itself - something that the new Google patent doesn't address.

In my lists of patent filings above, there is one that shows up under three of those inventors' names which looks like a cousin to the patent on editorial opinion, focusing upon user behavior instead of queries used:

20020123988 Methods and apparatus for employing usage statistics in document retrieval

#8 marianne


    Ready To Fly Member

  • Members
  • 20 posts

Posted 04 September 2006 - 05:50 PM

Hello All,

I'm thinking that I'm going to have to do it then. :) I cannot seem to shake this theory. I find it fascinating that we are coming full circle in the evolution of search technology, back to human mediation as a factor in relevance. Yahoo! started it all with its human-constructed directory. Faceted classification was all the rage until Google came along with the social network relevance model of PageRank. Once hardware, software, and expertise demonstrated the ability to "game" that technology, latent semantic indexing came to the forefront and search engines were "trained" to make semantic connections. The patents that Bragadocchio mentions [likely a strong part of the latest Google and Yahoo! updates] signal a shift back to human editorial judgement as critical to "understanding" the merit of the content.

I'm wondering if they will next have to develop a form of "inverse word frequency" to compensate for high quality [by the other factors] content that does not receive a lot of reviews due to lack of traffic or other outside factors?

marianne


