List of papers and patents?
Started by eKstreme, Feb 15 2006 03:48 AM
28 replies to this topic
#3
Posted 15 February 2006 - 12:50 PM
Some SEO Books, Papers, and Patents
Earlier Works
As We May Think
Nice early work on computing and citations. Included here more for historical perspective than anything, but worth reading.
From Resource Discovery to Knowledge Discovery on the Internet
This one starts with a discussion of "As We May Think" and builds upon it, looking at such things as inverted indexes and term weights. The two documents taken together are a nice jumping off point to the next document which discusses earlier search engines and optimization.
What is a tall poppy among web pages?
One of the better looks at SEO and Search Engines in the days before Google and pagerank. Views on SEO have changed since 1998, when this was published, but it gives a good sense of what the practice was like back then.
Pagerank
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Spend some time going over this one, and the next one. There's more interesting stuff in these papers than just the information about pagerank.
The PageRank Citation Ranking: Bringing Order to the Web
Then, I would recommend paying the $20 and getting this document from Yahoo!'s Pavel Berkhin:
A Survey on PageRank Computing
The list of cited sources at the end of the document could keep you busy for months, but the document itself is an excellent overview of different approaches to pagerank, and different ways of using it, amending it, altering it, and so on.
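For anyone who wants a feel for the basic idea before diving into those papers, here is a minimal sketch of PageRank computed by simple repeated iteration. It is not taken from any of the documents above and it is not how Google implements anything. The function name `pagerank`, the toy link graph, the fixed iteration count, and the even spreading of rank from pages with no outgoing links are all assumptions made for illustration. The damping factor of 0.85 is the value the Anatomy paper says is usually used, and the formula here is the common normalized variant in which all the scores sum to 1.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by repeated iteration (illustrative sketch only).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share on each pass.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", nothing links to "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the page with the most incoming links ("c") ends up on top, and the page it links to ("a") comes second because it inherits most of that rank. The papers above cover what a sketch like this leaves out, such as handling a web-scale graph and the many variations on the basic calculation.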
Books
There's some nice history and thoughtful analysis in John Battelle's The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture
If you want a good sense of how search engines work, this text by Soumen Chakrabarti is a fine introduction to the topic: Mining the Web: Analysis of Hypertext and Semi Structured Data (The Morgan Kaufmann Series in Data Management Systems)
Crawling the Web
There are more than a few papers about how search engines gather the information they bring into their indexes, and they are useful for understanding why some pages of a site might get indexed while others don't. These two papers established some of the standard language that people use when describing how crawlers work, and they have been expanded upon since. If you want to find out more about crawling, you might find some good papers by searching CiteSeer or Google Scholar for papers that cite the first paper I list below:
Efficient Crawling Through URL Ordering
Crawling the Web: Discovery and Maintenance of Large Scale Web Data (pdf)
Junghoo Cho, one of the authors of the first paper in this section and now a Google search engine scientist, continued his research on crawlers and wrote this paper as his doctoral thesis under the supervision of Hector Garcia-Molina. It expands upon the first paper, and spends some pages looking at freshness on the web. Together, these two are good starting points for later research.
Spam, Spam, and More Spam
Doesn't hurt to know some of the academic papers about spam:
Web Spam Taxonomy
Web Spam, Propaganda and Trust
There are a number more, and they are worth reading through.
Some patents and patent applications assigned to Google
I've focused upon the patents and patent applications that involve search rather than advertising, mapping, cell phone cases, email, and hardware. Hopefully, these will give you a good start. Newer documents first, in each section.
Some Patent Applications assigned to Google
Phrase-based searching in an information retrieval system
Enhanced document browsing with automatically generated links based on user information and context
Methods and systems for endorsing local search results
Systems and methods for spell correction of non-roman characters and words
Nonstandard text entry
Visually-oriented driving directions in digital mapping system
Variable length snippet generation
Systems and methods for determining user actions
Profile based capture component
Personalization of placed content ordering in search results
Named URL entry
Methods and systems for interfacing applications with a search engine
Methods and systems for information capture and retrieval
Systems and methods for weighting a search query result
Query rewriting with entity detection
Query rewriting with entity detection (different publication number)
Systems and methods for translating chinese pinyin to chinese characters
Assigning geographic location identifiers to web pages
Interface for a universal search
Systems and methods for personalizing aggregated news content
Generating hyperlinks and anchor text in HTML and non-HTML documents
Systems and methods for direct navigation to specific portion of target document
Systems and methods for unification of search results
Systems and methods for improving search quality
Methods and systems for improving a search ranking using article information
Systems and methods for determining document freshness
Methods and systems for personalized network searching
Methods and systems for information extraction
Information retrieval based on historical data
Personalization of web search
Systems and methods for clustering search results
Methods and systems for improving a search ranking using location awareness
Systems and methods for improving the ranking of news articles
Methods and systems for improving a search ranking using related queries
Methods and systems for improving a search ranking using population information
System and method for providing search query refinements
Systems and methods for determining user actions
System and method for presenting multiple sets of search results for a single query
System and method for providing a user interface with search query broadening
Systems and methods for searching using queries written in a different character-set and/or language from the target pages
Search query categorization for business listings search
Document search engine including highlighting of confident results
System and method for providing preferred country biasing of search results
Methods and systems for determining a meaning of a document to match the document to content
Methods and systems for understanding a meaning of a knowledge item using information associated with the knowledge item
System and method for providing definitions
Methods and systems for editing a network of interconnected concepts
System and method for providing preferred language ordering of search results
System and method for selecting content for displaying over the internet based upon some user input
Method for searching media
Method and apparatus for characterizing documents based on clusters of related words
System and method for selecting content for displaying over the internet based upon some user input
Methods and apparatus for providing search results in response to an ambiguous search query
Methods and apparatus for employing usage statistics in document retrieval
Interface and system for providing persistent contextual relevance for commerce activities in a networked environment
Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query
System and method for searching and recommending objects from a categorically organized information repository
Some Patents Assigned to Google
Methods and apparatus for determining equivalent descriptions for an information need
Address geocoding
Systems and methods for highlighting search results
Techniques for finding related hyperlinked documents using link-based analysis
System and method for selecting content for displaying over the internet based upon some user input
Information extraction from a database
Detecting duplicate and near-duplicate files
Detecting query-specific duplicate documents
Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query
Ranking search results by reranking the results based on local inter-connectivity
Real-time document collection search engine with phrase indexing
Methods for iteratively and interactively performing collection selection in full text searches
Performing automated document collection and selection by providing a meta-index with meta-index values indentifying corresponding document collections
Real-time document collection search engine with phrase indexing
Method for automatically selecting collections to search in full text searches
Document retrieval over networks wherein ranking and relevance scores are computed at the client for multiple database documents
<edit - fixed URL>
Edited by bragadocchio, 15 February 2006 - 04:45 PM.
#8
Posted 15 February 2006 - 01:07 PM
Thanks.
Some fun stuff in there.
Get through those, and I'll have some more for you.
Though of course, if you make your way through all of those, I'll expect that you'll have quite a list of things that you may want to research yourself.
And I'd recommend seeing if you can uncover how Yahoo!, MSN, and ASK/Teoma might be trying to do the same, or similar things.
#10
Posted 15 February 2006 - 02:01 PM
Thanks, Dave
Have you tested any of these to see how well they apply?
I've done some testing, and written about a number of these both here, and on my blog.
Part of the fun behind researching these is to try to figure out how to test them, or see how they might fit into some of the moves that the search engine might be making.
For instance, the patent application at the top of the list on phrase searching would make Google bombing as we know it obsolete. But I can simply test that by doing a search for the phrase "miserable failure" and seeing that Google Bombing still works. The other interesting thing about that one is that it overlaps some territory described in the infoseek patents at the bottom of the list that were assigned to Google, yet is different in a number of ways.
#11
Posted 15 February 2006 - 02:27 PM
I've only looked hard at the local applications.
Couple of quick observations. Last February, during the Super Bowl update, local sites with good local identification information on the site moved up in rankings for relevant searches. I noted that, as did some other webmasters. We discussed it and applied different reasoning to it. Having seen your description of the patent on your August '05 post, I think that the locator patent went into effect last February.
A second observation on the local patent.
I linked into a friend's site that went up in June of '05. It is somewhat associated with my business/site. I gave him an anchor text link for his service with a phrase for the service covering 2 states and a nearby city. The site has a nice geo description of its location with address and phone number. The service phrase part of the anchor text is not competitive.
The site has 1 good link (from my business) and some scraper bls.
Last I looked, the site ranked first for the phrase/ each state and city in Y and MSN. For Google the site had #1 allinanchor rankings for the phrase/geo area for all 3 areas.
G had it ranked #1 for the state in which the business is located for that phrase and it was ranked in the 200's in the other state and city. BTW the site is ranked #13 for G allinanchor for the phrase (alone).
Seems like application of the geographic locator patent applies a relevancy "filter" that can negate the "newness" sandbox filter.
Of further interest, the site ranks first for the business phrase with the town name where the business is located...but can't be found for nearby towns with the business phrase. The geo relevancy on the location is limited to the named town and state. There is no "closeness" benefit.
Thanks for providing this information.
Dave
#14
Posted 15 February 2006 - 04:31 PM
BTW:
That's a great advertisement for your blog. Now I gotta go there and see about your testing.
I'm lazy
In my limited testing on those patents that I've read, I've seen some areas where the patent(s) seem to have been applied and some where they haven't, as you noted above.
This is very fertile ground.
I don't think I can say thank you enough.
Dave
#19
Posted 15 February 2006 - 07:49 PM
No intent to intimidate, Debora
An excellent starting point is John Battelle's book, because it's written for a mass audience, but it does a great job of supplying information about the history and people, and the ideas behind how search engines work. And it's pretty readable.
The "what is a tall poppy" paper is also a good place to begin, and then the two pagerank ones. The first paper on crawling (ignore the math and try to understand the concepts) made a big difference to my understanding of how a search engine functions too.
Then take a look at the first two papers listed. The first one was written in 1945, and it's a favorite of mine. It's also a very nice lead in to the second paper. Once you have those down, the spam ones are good to look at.
You're welcome, eKstreme.