> Systems and methods for improving search quality

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Posted Jul 10 2005, 04:54 AM
I had a chance to spend some time on one of the newer patent applications from this last week:

Systems and methods for improving search quality
United States Patent Application 20050149499
Inventors: Alexander M. Franz, Monika Henzinger
Assignee: Google Inc.
Filed: December 30, 2003
Published: July 7, 2005

The method claimed in this patent application involves receiving a query, and deciding if it uses:

A. compound query terms, or alternative representations of those compound query terms;
B. query terms that can be found in a set of inflectional forms. If it does, expanding the query to include others from that set; and
C. query terms that can be found in a set of alternative spellings. If it does, expanding the query to include others from that set;

Then searching a database using the expanded query; and returning results to a user.

Making such a decision may mean that the search engine would have to create those types of associations when indexing documents.

Compound words would include hyphenated words, and the search engine would create associations between pages that use hyphenated words and pages that use the corresponding non-hyphenated words.

The patent application states why this set of methods is being considered for use in coming up with search results:

QUOTE
In an information retrieval system, a user typically enters a query and receives a list of documents that contain the query terms. Documents that do not contain the query terms are ignored. Such systems thus place a premium on proper query formulation. What is needed are systems and methods for improving queries such that they are more likely to yield useful search results.



While several examples are for the German language, the general principles can be applied to other languages, too. And while they are applied to web pages, documents scanned into electronic form may also be searched.


Compounds

Example: "fernsehprogramm" (meaning television program) can be written as "fernsehprogramm" or "fernseh-programm." If you search for one, you will not receive documents that use only the other. The use of a dictionary, or a dynamic search over a body of documents to create a list of compound terms, may improve results by letting them include both.

Whether an item appears on that list as an alternative might depend upon the total number of times the joined, hyphenated, or paired words appear on the web. If they show up often enough, they could be used to expand the query.

In some document formats, such as PostScript or PDF, the hyphenation may come from the file format itself, because words are hyphenated at the ends of lines.
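
To make the frequency idea concrete, here is a minimal Python sketch. It is not the method from the patent application. The function names, the `min_count` threshold, and the toy token list are all assumptions made up for illustration. It treats a hyphenated form and its joined form as alternatives only when both show up often enough in a body of text.

```python
from collections import Counter

def compound_variants(corpus_tokens, min_count=2):
    """Group joined and hyphenated spellings that both occur often enough.

    min_count is a hypothetical threshold; the patent application does not
    give a specific number.
    """
    counts = Counter(corpus_tokens)
    variants = {}
    for token, n in counts.items():
        if "-" not in token or n < min_count:
            continue
        joined = token.replace("-", "")
        # Counter returns 0 for unseen tokens, so rare joined forms are skipped.
        if counts[joined] >= min_count:
            variants.setdefault(joined, {joined}).add(token)
    return variants

def expand_compound(term, variants):
    """Return the term plus any accepted alternative representations."""
    key = term.replace("-", "")
    return sorted(variants.get(key, {term}))

tokens = ["fernsehprogramm"] * 3 + ["fernseh-programm"] * 2 + ["e-mail"]
variants = compound_variants(tokens)
print(expand_compound("fernseh-programm", variants))
print(expand_compound("e-mail", variants))
```

A term like "e-mail" that appears only once in the toy data is left unexpanded, which is one way a filter could keep line-end hyphenation noise out of the list.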


Inflections

Many words have different inflectional forms to express grammatical relationships: case, gender, number, person, tense, or mood.

Examples in English: plurals of nouns, past tenses of verbs, and changes to the base word itself, e.g., speak, spoke, spoken.

The use of a dictionary, or a dynamic search over a body of documents to create a list of inflectional forms, may improve results by letting them include the other forms of a word. For German, the inflection sets could use a language analysis or a word form analyzer with a relatively large lexicon of root forms.

Whether an item appears on that list as alternatives might depend upon the total number of inflectional forms created from a body of web pages, where a word form analyzer was used to map inflected words and roots. This would be filtered by a suitable number or percentage of appearances of the inflectional forms.

Example: a search for "auto spiel" could become a search for "(auto OR autos) (spiel OR spiele OR spielen OR spieles OR spiels)." That expanded query could then be used for the search.

Example: a search for "abisolieren" could be expanded to "abisolieren OR abisolierten OR abisolierte."
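
The "auto spiel" example can be sketched in a few lines of Python. This is only an illustration: the hand-written `INFLECTION_SETS` list stands in for whatever dictionary or word form analyzer a real system would use, and the function names are hypothetical.

```python
# Hand-written stand-in for sets produced by a dictionary or word form analyzer.
INFLECTION_SETS = [
    ["auto", "autos"],
    ["spiel", "spiele", "spielen", "spieles", "spiels"],
    ["abisolieren", "abisolierten", "abisolierte"],
]

def build_lookup(sets):
    """Map every inflected form to the full set it belongs to."""
    lookup = {}
    for forms in sets:
        for form in forms:
            lookup[form] = forms
    return lookup

def expand_query(query, lookup):
    """Rewrite each query term as an OR group of its inflectional forms."""
    parts = []
    for term in query.lower().split():
        # dict.fromkeys drops repeated forms while keeping their order.
        forms = list(dict.fromkeys(lookup.get(term, [term])))
        if len(forms) > 1:
            parts.append("(" + " OR ".join(forms) + ")")
        else:
            parts.append(forms[0])
    return " ".join(parts)

lookup = build_lookup(INFLECTION_SETS)
print(expand_query("auto spiel", lookup))
# (auto OR autos) (spiel OR spiele OR spielen OR spieles OR spiels)
```

Terms that are not in any set pass through unchanged, so the expansion never removes anything the user typed.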


Orthographic Variations

Many languages include words that can be spelled in different ways.

The use of a dictionary to create a list of spelling variations may improve results by letting them include the alternative spellings of a word. For German, the spelling variations could use a language analysis or a word form analyzer with a relatively large lexicon of root forms.
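
As a rough sketch of the dictionary approach, the Python below groups spelling pairs into variant sets and looks a query term up in them. The word pairs are illustrative examples chosen for this sketch, not taken from the patent application, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative dictionary entries: (one spelling, an alternative spelling).
SPELLING_PAIRS = [
    ("foto", "photo"),
    ("delfin", "delphin"),
    ("colour", "color"),
]

def build_variant_sets(pairs):
    """Group spellings so that every word maps to its full set of variants."""
    sets = defaultdict(set)
    for first, second in pairs:
        # Merge any groups these spellings already belong to.
        group = sets[first] | sets[second] | {first, second}
        for word in group:
            sets[word] = group
    return dict(sets)

def expand_spelling(term, variant_sets):
    """Return the term together with its known alternative spellings."""
    term = term.lower()
    return sorted(variant_sets.get(term, {term}))

variant_sets = build_variant_sets(SPELLING_PAIRS)
print(expand_spelling("Photo", variant_sets))
print(expand_spelling("haus", variant_sets))
```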


Other possibilities

The techniques above could be applied together as necessary, and possibly with other techniques.

Examples:

spelling correction,
synonym and/or related-word expansion,
language translation,
spam reduction,
others.


Additional considerations

To enhance results, multiple searches could be performed.

For example, a search using the original query could be followed by one or more searches using expanded or rewritten versions of that query. Results could be evaluated based upon the user's preferences and search history, and the results determined to be most likely to be useful could be returned. These could come from either set alone, or from a mix where lesser weights were given to the expanded set.
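
One way to picture the "mix with lesser weights" idea is the Python sketch below. The scores, the `expanded_weight` value of 0.5, and the rule of keeping the higher score per document are all assumptions for illustration. The patent application does not spell out a formula, and this sketch leaves out user preferences and search history entirely.

```python
def merge_results(original, expanded, expanded_weight=0.5):
    """Blend two result sets, down-weighting hits from the expanded query.

    original, expanded: dicts mapping a document id to a relevance score.
    expanded_weight: hypothetical discount applied to expanded-query scores.
    """
    merged = dict(original)
    for doc_id, score in expanded.items():
        weighted = score * expanded_weight
        # Keep whichever score is higher for a document found by both searches.
        merged[doc_id] = max(merged.get(doc_id, 0.0), weighted)
    # Highest score first; ties broken by document id for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

original_hits = {"a": 0.9, "b": 0.4}
expanded_hits = {"b": 1.0, "c": 0.6}
for doc_id, score in merge_results(original_hits, expanded_hits):
    print(doc_id, score)
```

Setting `expanded_weight` to 0 gives back only the original scores, while 1.0 treats both searches equally.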

A variation of that approach could involve expanding the index beforehand instead of at the time of the query, or a combination of index expansion and query expansion.
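
Index expansion could look something like the following Python sketch, where each document is filed under every variant of the words it contains, so the query itself needs no rewriting. The tiny inverted index, the `variant_lookup` table, and the sample documents are hypothetical and only meant to show the idea.

```python
from collections import defaultdict

def build_expanded_index(docs, variant_lookup):
    """Index each document under every known variant of its tokens."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for form in variant_lookup.get(token, [token]):
                index[form].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that match every query term (no expansion)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

forms = ["fernsehprogramm", "fernseh-programm"]
variant_lookup = {form: forms for form in forms}
docs = {
    1: "fernseh-programm heute",
    2: "fernsehprogramm morgen",
    3: "autos",
}
index = build_expanded_index(docs, variant_lookup)
print(search(index, "fernsehprogramm"))
print(search(index, "fernsehprogramm heute"))
```

The trade-off is a larger index in exchange for less work at query time.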

While many examples were in the context of the German language, the techniques described can be applied to other languages.