> Stopwords On Milk Cartons, Compression and Multi-Staged Query Processing Implicated

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jan 18 2008, 06:25 PM
It used to be that when you did a search for the following phrase, you triggered a number of "Stop Word" warnings, and you received some odd results:

To be, or not to be


Now, you get a selection of search results featuring Shakespeare's Hamlet. No Stop Word warnings shown.

What happened to the Stop Word Warnings?

A stop word is a very common word, such as "a," "to," or "the," that search engines would try to avoid indexing. These words appear so often that indexing them would take up too much space, and including them in searches would use up too much computational power.

It wasn't unusual to search for (without the quotation marks) something like "A room with a view" and receive results for "a room to view" or "a room I did view" or other variations, because the stop words in the query were ignored.
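
To make that concrete, here's a minimal sketch of the old behavior in Python. The stop word list and the function name are my own assumptions for illustration, not anything a particular search engine published:

```python
# Hypothetical stop word list; real engines maintained their own lists.
STOP_WORDS = {"a", "an", "the", "to", "be", "or", "not", "with", "of", "in"}

def strip_stop_words(query):
    """Return the query terms that remain after dropping stop words."""
    # Lowercase each word and trim simple punctuation before comparing.
    terms = [t.strip(",.!?").lower() for t in query.split()]
    return [t for t in terms if t and t not in STOP_WORDS]

print(strip_stop_words("To be, or not to be"))  # []
print(strip_stop_words("A room with a view"))   # ['room', 'view']
print(strip_stop_words("A room to view"))       # ['room', 'view']
```

With a filter like this, the Hamlet query has nothing left to search on, and the two "room" queries become indistinguishable.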

Dan Thies wrote about the missing stopwords in:

Stop Words Are Dead! Did I Miss Another Memo?

So, what happened to the Stop Words?

Part of the answer might be found in a new indexing method from Google, described in a recently granted patent:

Document compression scheme that supports searching and partial decompression

The patent organizes data in a different fashion, which means a smaller index.

It also allows only the needed parts of a document to be decompressed. Those parts can then be checked to see whether they contain the least frequently occurring terms in a query, and then whether the more frequently used terms (the stop words) appear near them.
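
Here's a toy sketch of that general idea in Python. To be clear, this is not the scheme from the patent: the block size, the use of zlib, the stop word list, and the function names are all my own assumptions. It compresses a document in small blocks, keeps a tiny index of which blocks hold each non-stop word, and decompresses only the blocks that contain the rarest query term before checking whether the full phrase, stop words included, is there:

```python
import zlib

STOP_WORDS = {"a", "the", "to", "be", "or", "not", "with"}  # hypothetical list
BLOCK_SIZE = 8  # words per block; an arbitrary choice for this toy

def build(text):
    """Compress the text block by block and index the blocks holding each rare word."""
    words = text.lower().split()
    blocks, index = [], {}
    for start in range(0, len(words), BLOCK_SIZE):
        chunk = words[start:start + BLOCK_SIZE]
        for w in chunk:
            if w not in STOP_WORDS:
                index.setdefault(w, set()).add(len(blocks))
        blocks.append(zlib.compress(" ".join(chunk).encode("utf-8")))
    return blocks, index

def phrase_search(blocks, index, phrase):
    """Return (matched, number_of_blocks_decompressed)."""
    terms = phrase.lower().split()
    rare = [t for t in terms if t not in STOP_WORDS]
    if not rare:
        candidates = set(range(len(blocks)))  # nothing rare to narrow the search
    else:
        # Start from the rare term that appears in the fewest blocks.
        anchor = min(rare, key=lambda t: len(index.get(t, ())))
        candidates = index.get(anchor, set())
    opened = 0
    for b in sorted(candidates):
        chunk = zlib.decompress(blocks[b]).decode("utf-8").split()
        opened += 1
        # Check for the whole phrase, stop words included, inside this block.
        for i in range(len(chunk) - len(terms) + 1):
            if chunk[i:i + len(terms)] == terms:
                return True, opened
    return False, opened

text = ("she rented a room with a view of the sea and the fox "
        "went back to the den to sleep all day long")
blocks, index = build(text)
print(len(blocks))                                         # 3
print(phrase_search(blocks, index, "a room with a view"))  # (True, 1)
print(phrase_search(blocks, index, "a room to view"))      # (False, 1)
```

Only one of the three blocks gets decompressed for either query, and the stop words now decide whether the phrase matches. One known gap in this toy: a phrase that straddles two blocks is missed.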

And, Multi-Staged Query Processing, too

Another part of this indexing system may include a new way of handling queries when someone searches.

A patent application that is listed as related to the one dealing with compression explores a way to handle queries that's somewhat different:

Multi-stage query processing system and method for use with tokenspace repository

QUOTE
A multi-stage query processing system and method enables multi-stage query scoring, including "snippet" generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme.

At one or more stages of a multi-stage query processing system a set of relevancy scores are used to select a subset of documents for presentation as an ordered list to a user.

The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system.

In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and using information from each pass to expand the user query for use in a subsequent pass to improve the relevancy of documents in the ordered list.


What might this mean for an internet marketer?

It might mean a lot.

The multi-staged part may include finding query expansion terms and looking at stemming of query terms in one stage.

It might include a phrase-based indexing reranking of results within another stage. Here's a picture from the patent application which shows some of the possible stages and some of what may happen at each:

http://www.seobythesea.com/wp-images/token5.GIF

(Edited so that I'm linking to the picture instead of displaying it)
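
For anyone who wants a feel for what "multi-staged" could look like, here's a rough two-stage sketch in Python. It is only my own illustration of the ideas in the quoted abstract (a cheap first pass, query expansion from the first results, and a final score that reuses the earlier score). The scoring function, the stop word list, and every name in it are assumptions, not the method in the patent application:

```python
from collections import Counter

# Hypothetical stop word list, used here only to pick expansion terms.
STOP_WORDS = {"a", "the", "to", "be", "or", "not", "of", "by", "is", "and", "in"}

def tokens(text):
    """Lowercase words with simple punctuation trimmed."""
    return [w.strip(",.!?:;").lower() for w in text.split()]

def score(doc_tokens, terms):
    """Count how many times any of the terms occurs in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in terms)

def multi_stage(docs, query, keep=2, extra_terms=1):
    q = set(tokens(query))
    tokenized = {name: tokens(text) for name, text in docs.items()}
    # Stage 1: cheap score over every document, keep only the best few.
    first = {name: score(toks, q) for name, toks in tokenized.items()}
    shortlist = sorted(first, key=lambda n: (-first[n], n))[:keep]
    # Stage 2: expand the query with common non-stop words from the shortlist.
    pool = Counter(w for n in shortlist for w in tokenized[n]
                   if w and w not in STOP_WORDS and w not in q)
    expanded = q | {w for w, _ in pool.most_common(extra_terms)}
    # Final score combines the stage 1 score with the expanded-query score.
    final = {n: first[n] + score(tokenized[n], expanded) for n in shortlist}
    return sorted(final, key=lambda n: (-final[n], n))

docs = {
    "hamlet": "Hamlet soliloquy by Shakespeare: to be, or not to be",
    "play": "Shakespeare wrote the play Hamlet",
    "cooking": "How to cook pasta",
    "garden": "Garden tips for spring",
}
print(multi_stage(docs, "hamlet soliloquy"))  # ['hamlet', 'play']
```

In this toy run, the second stage picks up "shakespeare" as an expansion term because it shows up in both shortlisted documents.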

Moderator Alumni

Group: Hall Of Fame
Joined: 11-February 04
Posts: 5,892
From: Los Angeles, CA
post Jan 18 2008, 06:46 PM
(Off topic)
You might want to resize the image a bit. I'm getting a bad scroll right problem, making the post difficult to read.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jan 18 2008, 09:56 PM
Thanks, Garrick.

Resizing it would have made it too small to read, so I linked to it instead. ;)