Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998


Anticipating Users Queries


5 replies to this topic

#1 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 23 December 2005 - 01:44 AM

If you go to Google Suggest, and start typing in a query, it's interesting what they will try to suggest for you to search for.

A search starting with "p" brings up the following:

Paris Hilton
prom dresses
poems
paypal
pc world
people search
pubmed
pizza hut
putty
periodic table

If I add an "a" to that, so that my search is now "pa" I get the following list:

Paris Hilton
Paypal
Panasonic
Pamela Anderson
passport
pacman
pamela turner
Paris
papa john
parishilton

Adding a "t" to that, so that my search is now "pat" I get the following list:

patriots
Patagonia
Patriot Act
patent
patio furniture
patents
patterns
Patricia Heaton
patent office
Patrick Henry

Why does that search provide those results?

The most popular searches? Topics that are recently in the news? The last things that someone searched for that began with those letters? The most used query in a cache at Google?

The answer is probably a mix of all four.

There's a new patent application that was just released from Google this week:

Anticipated query generation and processing in a search engine

The focus of the patent application is on making a search engine more efficient, making it seem more relevant, and on adding some personalization to search results.

I broke down the patent application at my blog, Can Google Read Your Mind? Processing Predictive Queries, and noted how similar what is described in the patent is to what is presently in place at Google Suggest. I describe what the patent application said over there, but didn't go through the steps of testing out Google Suggest, to see if it might behave in ways that correspond to what the patent application describes.

One of the ways I wanted to try testing it was to see what was appearing on the front page of Google News, and see if Google Suggest might suggest some terms that seem to be prominently placed on the News page.

I noticed that the "Patriot Act" is one of the top stories for me, so I wondered if it would show up in the queries suggested. As you can see above, it did, but not until I got three levels deep.

I tried s-a-n-t-a, expecting the word "santa" to be suggested, since it might be a very popular search this time of year. But it wasn't until I spelled the whole word out that it was suggested.

A quick run through the alphabet, for the top query suggested for each letter:

a - amazon
b - bbc
c - currency converter
d - dictionary
e - ebay
f - firefox
g - gmail
h - hotmail
i - ikea
j - jokes
k - kelly blue book
l - lyrics
m - mapquest
n - news
o - orbitz
p - paris hilton
q - quotes
r - ryanair
s - spybot
t - target
u - ups
v - valentine's day
w - weather
x - xbox
y - yahoo
z - zipcodes


Most of those seem like they could be the most popular search for words that begin with those letters. The only one that stood out to me was "ryanair." A search at Google News shows that it has been the subject of more than 800 news articles recently.

This very quick, and very rough search shows Google returning queries which are likely the most popular in terms of people searching for them, and in the instance of ryanair, a word that is very topical right now.
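To make that guess concrete, here is a minimal sketch of how a suggestion list could mix raw search popularity with news topicality. Everything in it is an assumption for illustration: the `suggest` function, the `news_weight` parameter, and all of the counts are invented, and nothing here is taken from the patent application or from how Google Suggest is actually built.

```python
from typing import Dict, List


def suggest(prefix: str,
            search_counts: Dict[str, int],
            news_counts: Dict[str, int],
            news_weight: float = 1.0,
            k: int = 10) -> List[str]:
    """Return up to k queries starting with prefix, highest score first.

    The score is a hypothetical blend: search volume plus a weighted
    count of recent news articles mentioning the query.
    """
    prefix = prefix.lower()
    scored = []
    for query, searches in search_counts.items():
        if query.lower().startswith(prefix):
            score = searches + news_weight * news_counts.get(query, 0)
            scored.append((score, query))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in scored[:k]]


# Invented numbers, purely illustrative.
SEARCH_COUNTS = {
    "paris hilton": 900,
    "paypal": 800,
    "patriots": 300,
    "patent": 250,
    "patriot act": 100,
}
NEWS_COUNTS = {"patriot act": 500}

print(suggest("pat", SEARCH_COUNTS, NEWS_COUNTS))
print(suggest("pat", SEARCH_COUNTS, NEWS_COUNTS, news_weight=0.0))
```

With the news boost switched on, a topical query can climb past queries that are searched more often; with `news_weight=0.0` the list falls back to plain popularity. That would be one way to get a "ryanair" at the top of a letter.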

Does the patent application match the way that Google Suggest works? It appears to, at a quick glance. I don't know if the features of Google Suggest will be incorporated into Google's personalized search, but I'd guess that they will be, if they aren't now.

#2 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 23 December 2005 - 03:49 AM

What strikes me as strange is that so many of those items are almost the full domain names of the companies, ie - I would "assume" that people would just type it in the address bar instead of going through Google. What could you possibly learn from Google by searching for "amazon", "ebay", "gmail", "hotmail", "yahoo"?

Or does one of the browsers do a Google "I feel lucky" search when entering just a name like "amazon" into the address bar?

Another thought, why should Google have to "anticipate" popular queries? Wouldn't it make more sense to just cache the results of the top queries once a user asks for them (eg either cache all queries for 10 minutes - every time someone asks or just cache queries that were in the top 1% from yesterday).
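The "cache all queries for 10 minutes" idea could be sketched roughly like this. It is only an illustration of a time-to-live cache: the `TTLResultCache` class, the `fetch` callback and the injectable `clock` are hypothetical names, not anything from the patent application or from Google.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLResultCache:
    """Cache search results per query and reuse them until they expire."""

    def __init__(self,
                 fetch: Callable[[str], List[str]],
                 ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical backend that runs the real search
        self._ttl = ttl_seconds  # 600 seconds = the 10 minutes suggested above
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, query: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: no backend work needed
        results = self._fetch(query)
        self._entries[query] = (now, results)
        return results
```

The difference from "anticipating" is that a cache like this only helps from the second request onward, while a predictive approach would try to have the results ready before the user finishes typing.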

Personalisation is another story, and I wouldn't be surprised if in a year or more we would all have our own serps based on our past decisions on what we like (to click on). That will make it really hard on the SEOs, I'm sure, but it is almost the only way to get even better results on a per-person basis. If I like technical articles, then I don't want Google to serve me a rant about politics in a technical company :-). Things will get interesting though, as soon as I want a little lunch-time politics-reading, instead of those technical articles...

John

#3 travis

travis

    Sonic Boom Member

  • 1000 Post Club
  • 1532 posts

Posted 23 December 2005 - 04:52 AM

Wow,

Thanks Bill,

That, if accurate, answers a lot of questions we were having with plural vs singular versions of words.

#4 bwelford

bwelford

    Eyes Like Hawk Moderator

  • Moderators
  • 8894 posts
  • Twitter:http://twitter.com/BWelford
  • Facebook:http://www.facebook.com/bwelford

Posted 23 December 2005 - 08:33 AM

This is a superb thread you've started, Bill. I think this is one of the most intriguing patents you've researched and probably has a number of important implications.

I was intrigued by your comment, travis, since I agree this aspect is important but am not sure how to interpret the results. One of the important changes at the time of the Florida update was that Google switched from a blanket statement that it did not parse words to a more fuzzy statement about semantic analysis and meaning. A test search I used to run at the time was one for 'horse'. This was a word where the plural and the singular almost seemed to be different words without any very close association.

If I do a Google Suggest search starting with horse, it shows me the following at the top of the list:
horses 20,000,000 results
horse 55,100,000 results
As you then type in after 'horse' either a space or an s, clearly the subsequent lists change. So presumably the Suggest search is comparing the tree of all searches that start with 'horse ' with the tree of all searches that start with 'horses'. Forgetting about the news topicality issue that Bill has suggested and looking only at the word structures, this tree comparison seems to be very heavyweight and goes far beyond the C-analysis, etc. that our friend orion understands so well.

It's all somewhat baffling. :)

#5 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 24 December 2005 - 04:20 AM

Thank you. :)

There are some interesting things going on in this patent application.

I was a little surprised to see singular and plural in the list, but I think that while stemming goes on, the order of results may vary based upon whether a query term is singular or plural.

Another thought, why should Google have to "anticipate" popular queries? Wouldn't it make more sense to just cache the results of the top queries once a user asks for them (eg either cache all queries for 10 minutes - every time someone asks or just cache queries that were in the top 1% from yesterday).


The popularity issues are intriguing. It seems that there's more of a chance for the popular to become more popular the way this works. By adding from popular news stories, that may be one way to try to keep results from becoming stale.

Interesting differences seen between the FAQ for Google Suggest, and what the patent says, on the topic of personalization. Here's the FAQ:

Google Suggest does not base its suggestions on your personal searches, although it does use information about the relative popularity of common searches to rank its suggestions.


And the patent application (in the abstract):

This process may take into account prior queries submitted by a community of users, and may take into account a user profile.


There's more on personalization in the application, too. It will be interesting to see if they roll this into the personalized search, since people should know there that their profile is being used to help searches work.

The way the document approaches efficiency in how the search engine operates is worth thinking about, too. As is the incremental approach to forming query results when subsequent terms are added to the first word in a query. Does that skew results in odd ways?

Personalization is going to change where pages appear in search results. Maybe that is one of the best reasons to think about "popularity."

What strikes me as strange is that so many of those items are almost the full domain names of the companies, ie - I would "assume" that people would just type it in the address bar instead of going through Google.


Some things to think about there. A certain percentage of searches are navigational searches, where people are looking for a specific site.

Some searches were intended to be typed into the address bar, instead of the Google search box, but if people have Google as a home page, and start typing immediately after the search box "grabs" focus, they end up typing an address in the searchbox. Those searches may be of the following type:

The toolbar has a realnames feature on it, allowing people to browse by name. How many searches with a product's name are intended to be typed in that address bar? Probably a very small percentage, but probably big enough to throw this off.

#6 Guest_orion_*

Guest_orion_*
  • Guests

Posted 14 January 2006 - 12:08 AM

...this tree comparison seems to be very heavyweight and goes far beyond the C-analysis,....



I don't think so.

Google Suggest is not a measure of co-occurrence and just does dynamic sorting of search volume, triggered letter by letter. Big deal. Reminds me of Aho-Corasick trie analysis, Knuth-Morris-Pratt, and the Boyer-Moore family of sequential searching described by Baeza-Yates in chapter 8 of Modern Information Retrieval.
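As a rough sketch of why this does not need to be heavyweight, a plain prefix trie that stores a search volume at the end of each query is enough to produce a letter-by-letter sorted list. The `QueryTrie` class and the volumes below are made up for illustration; this is not a claim about how Google Suggest is implemented.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.volume = 0  # greater than 0 only where a complete query ends


class QueryTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, query: str, volume: int) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.volume += volume

    def suggest(self, prefix: str, k: int = 10) -> List[str]:
        # Walk down to the node for the typed prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored query below that node.
        found: List[Tuple[int, str]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.volume > 0:
                found.append((current.volume, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Sort by volume, highest first; ties alphabetically.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in found[:k]]


trie = QueryTrie()
for query, volume in [("horse", 55), ("horses", 20), ("horse racing", 30),
                      ("horses for sale", 25), ("horse names", 10)]:
    trie.add(query, volume)  # invented volumes

print(trie.suggest("horse"))
print(trie.suggest("horse "))
print(trie.suggest("horses"))
```

Typing a space or an "s" after "horse" simply moves to a different subtree, which is why the lists change. No comparison between the two trees is needed.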


A "c-analysis" can be used to measure first and higher-order co-occurrence from both document volume (search results) and search volume (query frequency). You cannot do this with Google Suggest.
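For readers who have not seen a first-order co-occurrence measure, here is a minimal sketch. It assumes a Jaccard-style ratio: the count for both terms together, divided by the count for either term. That formula, the function name and the numbers are assumptions for illustration, since the measure is not defined in this thread.

```python
def cooccurrence_index(n1: int, n2: int, n12: int) -> float:
    """Jaccard-style co-occurrence ratio (an assumed form, for illustration).

    n1  = count for term 1 alone (documents or queries)
    n2  = count for term 2 alone
    n12 = count for both terms together
    """
    if n12 < 0 or n12 > min(n1, n2):
        raise ValueError("n12 must be between 0 and min(n1, n2)")
    union = n1 + n2 - n12
    if union == 0:
        raise ValueError("no occurrences of either term")
    return n12 / union


# Invented counts: 100 hits for term 1, 50 for term 2, 25 for both.
print(cooccurrence_index(100, 50, 25))
```

The point is that it needs three counts per term pair. A suggestion list only exposes a ranking by prefix, so those counts cannot be read off it.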


BTW, regarding current search volume services, here is something that questions the validity of such services: database fusion, mixing and lack of discrimination of query modes.
Temporal Co-Occurrence: How does a Developing Event Affects Search Results?


