Cre8asiteforums Internet Marketing
and Conversion Web Design


Deletion Probabilities for Better Ads

3 replies to this topic

#1 BillSlawski


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15667 posts

Posted 19 June 2006 - 01:26 AM

When you perform a search using two terms, it's possible that one of the two terms may be more relevant than the other for what you are trying to find.

If no one has bid on both words in their advertising, then being able to tell which term is more relevant may enable a search engine to show ads that fit the more relevant term, rather than the less relevant one.

This could be done by looking at two word searches from users, and seeing if they might delete one of the words in a follow-up search. Search engineers might be able to set something up to find such deletions, and create a "deletion probability score" for terms.

A patent application from Yahoo explains how this might be done:

System and methods for ranking the relative value of terms in a multi-term search query using deletion prediction -

Search using document number 20060129534 here. (I'm having problems linking to the patent application directly, from the forums)

Here's an example of how a deletion probability score might be created.
  • Look at the search engine log files for queries that use two search terms, and pull aside all of the ones that share a term like “Honda.”
  • The other word/term could be anything.
  • See if there are any follow-up searches from the searchers who used these two term queries, and see if those subsequent searches involve deleting either “Honda” or the other term.
  • If so, calculate the deletion probability score for “Honda” by:
    • Count the number of times any word is deleted in a follow-up search after a two-word query that includes Honda. Let's say that, in this instance, that happened 6059 times.
    • Look at how many times Honda was the term deleted. In this example, that might have been 1874 times.
    • Take the number of times Honda was deleted, divided by the number of times any word was deleted. Here, that would be 1874/6059, or about 0.31. That's the deletion probability score for Honda, for a two-term query.
  • That deletion probability score for Honda would then be added to the list of deletion probability scores for other terms.
Some phrases that have unique meanings as phrases would be excluded from this method. For example, it wouldn't be used for a search on "New Mexico," because that search isn't about things that are new and things that are from Mexico, but rather about things that involve the State of New Mexico.
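
The steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the method from the patent application: the function name `deletion_probabilities`, the (original query, follow-up query) log format, and the sample log are all made up for the example, and phrases such as "New Mexico" are assumed to have been filtered out beforehand.

```python
from collections import Counter

def deletion_probabilities(query_pairs):
    """Estimate a deletion probability score per word.

    query_pairs: iterable of (original_query, follow_up_query) strings.
    Only two-word queries followed by a one-word query that keeps
    exactly one of the original words are counted.
    """
    deleted = Counter()   # times a word was the one dropped
    eligible = Counter()  # times a word was in a query where some word was dropped
    for original, follow_up in query_pairs:
        words = original.lower().split()
        rest = follow_up.lower().split()
        if len(words) != 2 or len(rest) != 1 or words[0] == words[1]:
            continue  # not a two-word query followed by a one-word query
        if rest[0] not in words:
            continue  # the follow-up is a different search, not a deletion
        kept = rest[0]
        dropped = words[0] if words[1] == kept else words[1]
        deleted[dropped] += 1
        eligible[kept] += 1
        eligible[dropped] += 1
    return {word: deleted[word] / eligible[word] for word in eligible}

# Hypothetical log entries, for illustration only.
log = [
    ("honda test", "honda"),    # "test" deleted
    ("honda civic", "civic"),   # "honda" deleted
    ("honda parts", "honda"),   # "parts" deleted
    ("honda test", "toyota"),   # ignored: not a deletion
]
scores = deletion_probabilities(log)
print(scores["honda"])
```

With the counts from the example in the post, the same division gives 1874 / 6059, or roughly 0.31.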

It's an interesting way to track user behavior and try to use it to improve search results and show more relevant ads to users.

#2 A.N.Onym


    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 4003 posts

Posted 19 June 2006 - 01:34 AM

Thanks for bringing it up, Bill <_<

Though it's an interesting concept, it inevitably has room for error.
Was the most often deleted word really irrelevant, or was it just too common and therefore obscuring the results?

For instance: 'backup dvd program'. If we delete the word 'program', we get 3 million fewer results, though the phrase has obviously become more general. Will the algorithm take this into account?

(Though the point here is not the number of sites returned for the phrase, but the quality of the SERPs. The word 'program' in the example only seems to add to the number of sites without changing the SERPs much.)

Hope that made some sense.

Also, how good will they be at detecting the phrases such as "New Mexico State"?

Sorry, didn't read the patent - they are just too mind-boggling for me to bear.

Thanks again :)

Edited by A.N.Onym, 19 June 2006 - 01:42 AM.

#3 BillSlawski


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15667 posts

Posted 19 June 2006 - 02:08 AM

Also, how good will they be at detecting the phrases such as "New Mexico State"?

The patent application describes this kind of instance where there are two words that are a recognizable concept or phrase (or unit). "Backup program" might also be considered a recognizable concept, so it's a similar kind of example.

The algorithm might look at the deletion probability of "dvd" versus the deletion probability of "backup program." The same could be true with "New Mexico" and "State," though it might also see "New Mexico State" as a concept, in which case this deletion probability may not be helpful.

This application doesn't really get into the concept of recognizing phrases. Yahoo has a series of at least five patents and patent applications that try to determine whether keywords are "units." For instance, when you search for "ice cream" you aren't looking for all the documents that have "ice" and all the documents that have "cream." You want the ones that have "ice cream."

More on that, here:

Yahoo! Superunits: of signatures and co-occurrence.

One example that they include in the patent where this deletion probability score might be used is the following pair of search terms:

Honda test

If the idea behind this is to show relevant ads for a pair of terms that someone is searching for (as opposed to a concept that might be recognized through one of the unit or superunit processes), I would guess that they hope that more people tend to delete "test" than delete "honda" when they use either word in a two-word search, and repeat the search after deleting one of the words.

I say this because it is unlikely that someone will be paying to display ads for "honda test" and they might want ads to display that are more relevant to "honda" than to "test."

I'll add a snippet from the patent that involves the idea of units, because that might help make what I just said clearer (I hope).

In another embodiment, the exemplary search query has three words, two of which are a unit.* This means that the exemplary search query of three words has two terms. The term that is a unit will be classified as a term in step 206, the word that is not part of the term will be classified as a term in step 210. It is determined in step 212 that there are two terms and step 214 asks whether one term has a deletion probability score. If so, the term is assigned its score in 218. If not, the term is assigned a default score in step 216. Then the second term is assigned its deletion probability score in step 218, if the second term has a deletion probability score and if not is assigned a default deletion probability score. The absolute differences between these scores are calculated in step 222, and if in step 224 the absolute difference is greater than the threshold, an exact match for the term among the ad listings is sought in step 226, and, if found, the ad is returned along with the search results and placed in the designated space on the page, and the process stops. However, if no exact match in step 226 or the absolute difference in the deletion probability score was less than the threshold, the process would have stopped.

* My emphasis.
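
The flow in that snippet could be sketched roughly like this in Python. It is a loose illustration, not the patent's implementation: the names `pick_ad`, `DEFAULT_SCORE`, and `THRESHOLD`, their values, the score of 0.74 for "test", and the ad listings are all hypothetical. It also assumes the ad is matched against the term with the lower deletion probability (the one searchers are less likely to drop), which the snippet doesn't spell out, and it treats a difference exactly equal to the threshold as "stop."

```python
DEFAULT_SCORE = 0.5  # hypothetical default for terms with no score
THRESHOLD = 0.2      # hypothetical minimum gap between the two scores

def pick_ad(terms, scores, ads, default=DEFAULT_SCORE, threshold=THRESHOLD):
    """Return an ad for the term less likely to be deleted, or None.

    terms:  the two terms of the query (a unit counts as one term)
    scores: dict of term -> deletion probability score
    ads:    dict of term -> ad listing (exact match only)
    """
    if len(terms) != 2:
        return None
    first, second = terms
    score_first = scores.get(first, default)    # assign score or default
    score_second = scores.get(second, default)
    if abs(score_first - score_second) <= threshold:
        return None  # scores too close to prefer one term over the other
    # Assumption: keep the term with the lower deletion probability.
    keep = first if score_first < score_second else second
    return ads.get(keep)  # None when there is no exact match

# Made-up scores and ad listings, for illustration only.
scores = {"honda": 0.31, "test": 0.74}
ads = {"honda": "Honda dealer ad"}

print(pick_ad(("honda", "test"), scores, ads))
print(pick_ad(("new mexico", "hotels"), scores, ads))
```

In the second call, "new mexico" is passed as a single term, the way a recognized unit would be, and since neither term has a score both fall back to the default and no ad is chosen.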

#4 BillSlawski


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15667 posts

Posted 21 June 2006 - 10:23 PM

The inventors behind this patent wrote a very short paper (2 pages) on it as a poster for the 2003 SIGIR.

It's written in language that's a lot easier to read and understand than the patent, and brings out a number of different ideas. It's at:

Query word deletion prediction

Here's a snippet:

In leftmost deletion we predict that the leftmost word of a query is deleted. This scheme is intuitive, if queries are mostly in English, and consist of well-formed noun-phrases, with optional adjectives appearing on the left. In addition, it is applicable to rare queries, for which we have no information about the likelihood of deletion of individual words. Rightmost deletion is analogous to leftmost deletion and makes sense if queries are built up from left to right, with the user inputting the most important terms first, then adding less important terms towards the end.
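
As a quick illustration of those two baselines, here is a small Python sketch. The function names and the sample observations are my own, not from the paper, and the `accuracy` helper is just one hypothetical way to compare a baseline against deletions seen in a log.

```python
def leftmost_deletion(query):
    """Predict that the leftmost word is the one deleted."""
    words = query.split()
    return words[0] if len(words) > 1 else None

def rightmost_deletion(query):
    """Predict that the rightmost word is the one deleted."""
    words = query.split()
    return words[-1] if len(words) > 1 else None

def accuracy(predict, observed):
    """Share of observed deletions that a baseline predicts correctly.

    observed: list of (query, word_actually_deleted) pairs.
    """
    if not observed:
        return 0.0
    hits = sum(1 for query, dropped in observed if predict(query) == dropped)
    return hits / len(observed)

# Made-up observations, for illustration only.
observed = [
    ("cheap honda parts", "cheap"),
    ("honda test", "test"),
    ("used honda civic", "used"),
]
print(accuracy(leftmost_deletion, observed))
print(accuracy(rightmost_deletion, observed))
```

Neither baseline needs any per-word statistics, which is why the paper notes they still apply to rare queries.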
