
Cre8asiteforums Internet Marketing
and Conversion Web Design



Stopwords On Milk Cartons


2 replies to this topic

#1 bragadocchio


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 18 January 2008 - 06:25 PM

It used to be that when you searched for the following phrase, you triggered a number of "Stop Word" warnings and received some odd results:

To be, or not to be


Now you get a selection of search results featuring Shakespeare's Hamlet, and no Stop Word warnings are shown.

What happened to the Stop Word Warnings?

A stop word is a word that appears so frequently that search engines tried to avoid indexing it. There were so many instances of these words that indexing them would take up too much space, and including them in searches would use up too much computational power.
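To make the "frequently appearing" part concrete, here's a minimal Python sketch that flags stop words by document frequency, meaning the share of documents a word shows up in. The tiny corpus, the `find_stop_words` name, and the 50% cutoff are all hypothetical; nothing here describes how any search engine actually chose its list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_stop_words(documents, max_df=0.5):
    """Return terms that appear in more than `max_df` (a fraction) of the documents.

    `max_df` is a hypothetical cutoff chosen for illustration only.
    """
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term at most once per document
    n = len(documents)
    return {term for term, df in doc_freq.items() if df / n > max_df}


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a room with a view of the sea",
    "the view from the hill",
]
print(sorted(find_stop_words(docs)))               # ['the']
print(sorted(find_stop_words(docs, max_df=0.25)))  # ['cat', 'the', 'view']
```

Lowering the cutoff also sweeps up content words like "cat" and "view", which shows how blunt a pure frequency rule can be.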

It wasn't unusual to search for something like "A room with a view" (without the quotation marks) and receive results for "a room to view", "a room I did view", or other variations.
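Here's a minimal sketch of why those variations showed up, assuming an engine that simply drops every word on a stop list before matching. The `STOP_WORDS` set and the `index_key` helper are hypothetical, made up for this illustration.

```python
import re

# Hypothetical stop list for illustration only; not any search engine's actual list.
STOP_WORDS = {"a", "an", "the", "to", "be", "or", "not", "with", "i", "did", "of"}


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def index_key(text):
    """Keep only the non-stop words, in order: all a stop-word-dropping engine can match on."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]


for query in ["A room with a view", "a room to view", "a room I did view", "To be, or not to be"]:
    print(repr(query), "->", index_key(query))
```

Once the stop words are gone, the first three phrases all reduce to `['room', 'view']`, so the engine can't tell them apart, and Hamlet's line reduces to an empty list.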

Dan Thies wrote about the missing stopwords in:

Stop Words Are Dead! Did I Miss Another Memo?

So, what happened to the Stop Words?

Part of the answer might be found in a new indexing method from Google, described in a recently granted patent:

Document compression scheme that supports searching and partial decompression

The patent describes organizing data in a different fashion, which makes for a smaller index.
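The post doesn't spell out the patent's encoding, so here's a generic illustration (not the patent's scheme) of one common way an index gets smaller: give each word an integer ID, with the most frequent words getting the smallest IDs, and store those IDs in a variable-byte format where small numbers take a single byte. The function names are hypothetical, and the sketch ignores the cost of storing the word-to-ID table itself, which only pays off across many documents.

```python
from collections import Counter


def build_lexicon(tokens):
    """Give each distinct word an integer ID, most frequent words first (smallest IDs)."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))  # ties broken alphabetically
    return {tok: i for i, tok in enumerate(ordered)}


def varint(n):
    """Variable-byte encode a non-negative int: 7 data bits per byte, high bit = more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Inverse of varint() applied to a concatenated byte string."""
    values, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(n)
            n, shift = 0, 0
    return values


tokens = "to be or not to be that is the question".split()
lexicon = build_lexicon(tokens)
encoded = b"".join(varint(lexicon[t]) for t in tokens)
print(len(" ".join(tokens).encode("utf-8")), "bytes as text")  # 39 bytes as text
print(len(encoded), "bytes as word IDs")                        # 10 bytes as word IDs

id_to_word = {i: w for w, i in lexicon.items()}
assert [id_to_word[i] for i in decode_varints(encoded)] == tokens  # lossless round trip
```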

The scheme also allows only the parts of a document that are actually needed to be decompressed. Those parts can then be checked for the least frequently occurring terms in a query, and then for whether the more frequently used terms (the stop words) appear near them.
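A minimal sketch of that idea, under loudly stated assumptions: the document is split into small fixed-size blocks that are compressed independently (zlib stands in for whatever the patent uses), and a small map records which blocks contain each term. `BLOCK_SIZE`, `build_blocks`, and `find_phrase` are hypothetical names. The search anchors on the query term found in the fewest blocks, then decompresses only the blocks a candidate phrase would span to see whether the common words sit next to it.

```python
import zlib

BLOCK_SIZE = 4  # tokens per independently compressed block (tiny, hypothetical value for the demo)


def build_blocks(tokens):
    """Compress the token stream in fixed-size blocks and map each term to the blocks holding it."""
    blocks, term_blocks = [], {}
    for start in range(0, len(tokens), BLOCK_SIZE):
        chunk = tokens[start:start + BLOCK_SIZE]
        blocks.append(zlib.compress(" ".join(chunk).encode("utf-8")))
        for term in chunk:
            term_blocks.setdefault(term, set()).add(start // BLOCK_SIZE)
    return blocks, term_blocks


def find_phrase(phrase, blocks, term_blocks):
    """Return (start positions of `phrase`, number of blocks decompressed)."""
    if not phrase or any(t not in term_blocks for t in phrase):
        return [], 0  # a missing term means no match, and nothing needs decompressing
    # Anchor on the rarest term (found in the fewest blocks); k is its offset in the phrase.
    k = min(range(len(phrase)), key=lambda i: len(term_blocks[phrase[i]]))
    cache = {}

    def block(i):
        """Decompress block i at most once."""
        if i not in cache:
            cache[i] = zlib.decompress(blocks[i]).decode("utf-8").split(" ")
        return cache[i]

    hits = []
    for b in sorted(term_blocks[phrase[k]]):
        for offset, tok in enumerate(block(b)):
            if tok != phrase[k]:
                continue
            start = b * BLOCK_SIZE + offset - k  # where the whole phrase would have to begin
            last = start + len(phrase) - 1
            if start < 0 or last // BLOCK_SIZE >= len(blocks):
                continue
            # Decompress only the blocks this candidate phrase spans.
            window = []
            for i in range(start // BLOCK_SIZE, last // BLOCK_SIZE + 1):
                window.extend(block(i))
            lo = start % BLOCK_SIZE
            if window[lo:lo + len(phrase)] == phrase:
                hits.append(start)
    return hits, len(cache)


doc = "the hotel offered a room with a view of the old harbor and the rooftops beyond it".split()
blocks, term_blocks = build_blocks(doc)
hits, opened = find_phrase("a room with a view".split(), blocks, term_blocks)
print(hits, f"{opened} of {len(blocks)} blocks decompressed")  # [3] 2 of 5 blocks decompressed
```

Here "room" is the anchor, so only the two blocks around it are opened; the blocks holding the rest of the sentence are never decompressed.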

And Multi-Staged Query Processing, Too

Another part of this indexing system may include a new way of handling queries when someone searches.

A patent application listed as related to the compression patent explores a somewhat different way to handle queries:

Multi-stage query processing system and method for use with tokenspace repository

A multi-stage query processing system and method enables multi-stage query scoring, including "snippet" generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme.

At one or more stages of a multi-stage query processing system a set of relevancy scores are used to select a subset of documents for presentation as an ordered list to a user.

The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system.

In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and using information from each pass to expand the user query for use in a subsequent pass to improve the relevancy of documents in the ordered list.
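To picture what "multi-stage scoring" with scores carried forward might look like, here's a minimal two-stage sketch. It is not the patent's scoring: the cheap first stage just counts matching query terms in every document, only the top `keep` documents survive, and the second stage adds a bonus for query words that appear next to each other in the same order, building on the stage-one score. All names, weights, and documents are hypothetical.

```python
def term_score(query, text):
    """Stage 1 (cheap): how many distinct query terms appear in the document."""
    words = set(text.split())
    return sum(1 for t in set(query) if t in words)


def phrase_bonus(query, text):
    """Stage 2 (costlier): count adjacent query-term pairs that are also adjacent in the text."""
    words = text.split()
    doc_pairs = set(zip(words, words[1:]))
    return sum(1 for pair in zip(query, query[1:]) if pair in doc_pairs)


def multi_stage_rank(query, docs, keep=2):
    """Score everything cheaply, keep the best `keep`, then refine only those survivors."""
    # Stage 1: every document gets the cheap score.
    stage1 = {doc_id: term_score(query, text) for doc_id, text in docs.items()}
    survivors = sorted(stage1, key=lambda d: (-stage1[d], d))[:keep]
    # Stage 2: the final score is derived in part from the stage-1 score.
    stage2 = {d: stage1[d] + phrase_bonus(query, docs[d]) for d in survivors}
    return sorted(survivors, key=lambda d: (-stage2[d], d))


docs = {
    "a": "a view with a room",
    "b": "room with a view",
    "c": "room service menu",
}
print(multi_stage_rank("room with a view".split(), docs))  # ['b', 'a']
```

Documents "a" and "b" tie in stage one and "c" is dropped without ever being rescored; the phrase check in stage two then puts "b" first.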


What might this mean for an internet marketer?

It might mean a lot.  

The multi-stage process may include, in one stage, finding query expansion terms and looking at stemmed forms of the query terms.
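The post doesn't say how stemming would be done in that stage, so here's a toy sketch of stemming-based query expansion: each query word is expanded to every word in the vocabulary that reduces to the same stem. The suffix list in `crude_stem` is a hypothetical stand-in for a real stemmer such as Porter's, and the vocabulary is made up.

```python
def crude_stem(word):
    """Strip one common English suffix. A toy stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query_terms, vocabulary):
    """Map each query term to every vocabulary word that shares its stem."""
    by_stem = {}
    for word in vocabulary:
        by_stem.setdefault(crude_stem(word), set()).add(word)
    return {t: sorted(by_stem.get(crude_stem(t), set()) | {t}) for t in query_terms}


vocabulary = ["view", "views", "viewed", "viewing", "room", "rooms", "roomy", "review"]
print(expand_query(["view", "room"], vocabulary))
# {'view': ['view', 'viewed', 'viewing', 'views'], 'room': ['room', 'rooms']}
```

Note that "roomy" and "review" are left out; a cruder rule could wrongly pull them in, which is one reason stemming tends to get its own careful stage.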

Another stage might rerank results using phrase-based indexing. Here's a picture from the patent application that shows some of the possible stages and some of what may happen at each:

http://www.seobythes...ages/token5.GIF

(Edited so that I'm linking to the picture instead of displaying it)

#2 Respree


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 5901 posts

Posted 18 January 2008 - 06:46 PM

:offtopic:
You might want to resize the image a bit. I'm getting a bad horizontal scrolling problem, which makes the post difficult to read.

#3 bragadocchio


    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 18 January 2008 - 09:56 PM

Thanks, Garrick.

Resizing it would have made it too small to read, so I linked to it instead. :)


