
Cre8asiteforums

Discussing Web Design & Marketing Since 1998

eKstreme

List of papers and patents?


Hi

 

I'm looking for a list of papers and patents related to SEO. I'm bored and in need of a good reading list.

 

Know where I can get hold of one? Cheers!

 

Pierre


Wait for Bill (bragadocchio) to read this thread. I'm sure he's got what you need :blink:


 

Some SEO Books, Papers, and Patents

 

 

 

Earlier Works

 

As We May Think

Nice early work on computing and citations. Included here more for historical perspective than anything, but worth reading.

 

From Resource Discovery to Knowledge Discovery on the Internet

This one starts with a discussion of "As We May Think" and builds upon it, looking at such things as inverted indexes and term weights. The two documents taken together are a nice jumping-off point for the next document, which discusses earlier search engines and optimization.
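
For anyone who hasn't met inverted indexes and term weights before, here is a minimal sketch of the idea. It is not taken from the paper: the function name, the toy documents, and the choice of a plain tf-idf weight are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: weight}, using a simple tf-idf weight.

    docs is a dict of doc_id -> text. The weighting scheme is an
    illustrative assumption, not the one used by any particular engine.
    """
    # Count how often each term appears in each document.
    term_counts = {doc_id: Counter(text.lower().split())
                   for doc_id, text in docs.items()}

    # Count how many documents contain each term.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            # Terms that appear in fewer documents get a larger weight.
            idf = math.log(n_docs / doc_freq[term])
            index[term][doc_id] = tf * idf
    return index

# Hypothetical toy collection.
docs = {
    "d1": "web search engine",
    "d2": "web crawler",
    "d3": "search search index",
}
index = build_inverted_index(docs)
```

Looking up `index["search"]` gives the documents that contain the term, each with its weight, which is the basic lookup a search engine performs before any ranking.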

 

What is a tall poppy among web pages?

One of the better looks at SEO and search engines in the days before Google and PageRank. Views on SEO have changed since 1998, when this was published, but it gives a good sense of what the practice was like back then.

 

 

 

PageRank

 

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Spend some time going over this one, and the next one. There's more interesting stuff in these papers than just the information about PageRank.

 

The PageRank Citation Ranking: Bringing Order to the Web

 

Then, I would recommend paying the $20 and getting this document from Yahoo!'s Pavel Berkhin:

 

A Survey on PageRank Computing

 

The list of cited sources at the end of the document could keep you busy for months, but the document itself is an excellent overview of different approaches to PageRank, and different ways of using it, amending it, altering it, and so on.
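
If it helps to see the basic idea before reading the papers, here is a minimal sketch of PageRank computed by simple iteration. It is an illustration only: the function name, the toy link graph, the 0.85 damping factor, and the way pages without outgoing links are handled are assumptions, and the version here is normalized so the scores sum to 1.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeated redistribution of rank.

    links is a dict of page -> list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads
                # its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "c" has the most inbound links, "d" has none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most inbound links ends up with the highest score and the page nobody links to ends up with the lowest, which is the intuition the papers build on.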

 

 

Books

 

There's some nice history and thoughtful analysis in John Battelle's The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture

 

If you want a good sense of how search engines work, this text by Soumen Chakrabarti is a fine introduction to the topic: Mining the Web: Analysis of Hypertext and Semi Structured Data (The Morgan Kaufmann Series in Data Management Systems)

 

 

Crawling the Web

 

There are more than a few papers about how search engines gather information to bring into their indexes, and they are useful for understanding why some pages of a site get indexed and others don't. These two papers established some of the standard language that people use when describing how crawlers work, and they have been expanded upon since. If you want to find out more about crawling, you might find some good papers by searching CiteSeer or Google Scholar for papers that cite the first paper I list below:

 

Efficient Crawling Through URL Ordering

 

Crawling the Web: Discovery and Maintenance of Large Scale Web Data (pdf)

One of the authors of the first paper in this section, Junghoo Cho, who is now a Google search engine scientist, continued his research on crawlers and wrote this paper as his doctoral thesis under the supervision of Hector Garcia-Molina. It expands upon the first paper and spends some pages looking at freshness on the web. Together, these two are good starting points for later research.

 

 

Spam, Spam, and More Spam

 

Doesn't hurt to know some of the academic papers about spam:

 

Web Spam Taxonomy

 

Web Spam, Propaganda and Trust

 

There are a number of others, and they are worth reading through.

 

 

Some patents and patent applications assigned to Google

 

I've focused upon the patents and patent applications that involve search rather than advertising, mapping, cell phone cases, email, and hardware. Hopefully, these will give you a good start. Newer documents first, in each section.

 

 

 

Some Patent Applications assigned to Google

 

Phrase-based searching in an information retrieval system

 

Enhanced document browsing with automatically generated links based on user information and context

 

Methods and systems for endorsing local search results

 

Systems and methods for spell correction of non-roman characters and words

 

Nonstandard text entry

 

Visually-oriented driving directions in digital mapping system

 

Variable length snippet generation

 

Systems and methods for determining user actions

 

Profile based capture component

 

Personalization of placed content ordering in search results

 

Named URL entry

 

Methods and systems for interfacing applications with a search engine

 

Methods and systems for information capture and retrieval

 

Systems and methods for weighting a search query result

 

Query rewriting with entity detection

 

Query rewriting with entity detection (different publication number)

 

Systems and methods for translating chinese pinyin to chinese characters

 

Assigning geographic location identifiers to web pages

 

Interface for a universal search

 

Systems and methods for personalizing aggregated news content

 

Generating hyperlinks and anchor text in HTML and non-HTML documents

 

Systems and methods for direct navigation to specific portion of target document

 

Systems and methods for unification of search results

 

Systems and methods for improving search quality

 

Methods and systems for improving a search ranking using article information

 

Systems and methods for determining document freshness

 

Methods and systems for personalized network searching

 

Methods and systems for information extraction

 

Information retrieval based on historical data

 

Personalization of web search

 

Systems and methods for clustering search results

 

Methods and systems for improving a search ranking using location awareness

 

Systems and methods for improving the ranking of news articles

 

Methods and systems for improving a search ranking using related queries

 

Methods and systems for improving a search ranking using population information

 

System and method for providing search query refinements

 

Systems and methods for determining user actions

 

System and method for presenting multiple sets of search results for a single query

 

System and method for providing a user interface with search query broadening

 

Systems and methods for searching using queries written in a different character-set and/or language from the target pages

 

Search query categorization for business listings search

 

Document search engine including highlighting of confident results

 

System and method for providing preferred country biasing of search results

 

Methods and systems for determining a meaning of a document to match the document to content

 

Methods and systems for understanding a meaning of a knowledge item using information associated with the knowledge item

 

System and method for providing definitions

 

Methods and systems for editing a network of interconnected concepts

 

System and method for providing preferred language ordering of search results

 

System and method for selecting content for displaying over the internet based upon some user input

 

Method for searching media

 

Method and apparatus for characterizing documents based on clusters of related words

 

System and method for selecting content for displaying over the internet based upon some user input

 

Methods and apparatus for providing search results in response to an ambiguous search query

 

Methods and apparatus for employing usage statistics in document retrieval

 

Interface and system for providing persistent contextual relevance for commerce activities in a networked environment

 

Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query

 

System and method for searching and recommending objects from a categorically organized information repository

 

 

 

Some Patents Assigned to Google

 

Methods and apparatus for determining equivalent descriptions for an information need

 

Address geocoding

 

Systems and methods for highlighting search results

 

Techniques for finding related hyperlinked documents using link-based analysis

 

System and method for selecting content for displaying over the internet based upon some user input

 

Information extraction from a database

 

Detecting duplicate and near-duplicate files

 

Detecting query-specific duplicate documents

 

Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query

 

Ranking search results by reranking the results based on local inter-connectivity

 

Real-time document collection search engine with phrase indexing

 

Methods for iteratively and interactively performing collection selection in full text searches

 

Performing automated document collection and selection by providing a meta-index with meta-index values indentifying corresponding document collections

 

Real-time document collection search engine with phrase indexing

 

Method for automatically selecting collections to search in full text searches

 

Document retrieval over networks wherein ranking and relevance scores are computed at the client for multiple database documents

 

<edit - fixed URL>

Edited by bragadocchio


That should cure your boredom for a while, Pierre.

 

Awesome list, Bill. You never cease to amaze me with your thoroughness. :applause:


Thanks.

 

Some fun stuff in there.

 

Get through those, and I'll have some more for you. :)

 

Though of course, if you make your way through all of those, I'll expect that you'll have quite a list of things that you may want to research yourself.

 

And I'd recommend seeing if you can uncover how Yahoo!, MSN, and ASK/Teoma might be trying to do the same, or similar things.


Bill:

 

have you tested any of these to see how well they apply?

 

I've looked hard through some of the geographic/local stuff and see some significant applications with regard to local search.

 

 

BTW: That is an awesome list!!!!!!

Dave


Thanks, Dave

 

"have you tested any of these to see how well they apply?"

 

I've done some testing, and written about a number of these both here and on my blog.

 

Part of the fun behind researching these is to try to figure out how to test them, or see how they might fit into some of the moves that the search engine might be making.

 

For instance, the patent application at the top of the list on phrase searching would make Google bombing as we know it obsolete. But I can simply test that by doing a search for the phrase "miserable failure" and seeing that Google bombing still works. The other interesting thing about that one is that it overlaps some territory described in the Infoseek patents at the bottom of the list that were assigned to Google, yet is different in a number of ways.


I've only looked hard at the local applications.

 

A couple of quick observations. Last February, during the Super Bowl update, local sites with good local identification information on the site moved up in rankings for relevant searches. I noted that, as did some other webmasters. We discussed it and applied different reasoning to it. Having seen your description of the patent in your August '05 post, I think that the locator patent went into effect last February.

 

A second observation on the local patent.

 

I linked into a friend's site that went up in June of '05. It is somewhat associated with my business/site. I gave him an anchor text link for his service with a phrase for the service covering 2 states and a nearby city. The site has a nice geo description of its location with address and phone number. The service phrase part of the anchor text is not competitive.

 

The site has 1 good link (from my business) and some scraper backlinks.

 

Last I looked, the site ranked first for the phrase plus each state and city in Yahoo! and MSN. In Google, the site had #1 allinanchor rankings for the phrase plus geo area for all 3 areas.

 

Google had it ranked #1 for that phrase in the state in which the business is located, and it was ranked in the 200s in the other state and city. BTW, the site is ranked #13 in Google allinanchor for the phrase alone.

 

It seems that the geographic locator patent applies a relevancy "filter" that can negate the "newness" sandbox filter.

 

Of further interest, the site ranks first for the business phrase with the town name where the business is located...but can't be found for nearby towns with the business phrase. The geo relevancy on the location is limited to the named town and state. There is no "closeness" benefit.

 

Thanks for providing this information.

 

Dave


Good information, Dave.

 

Thank you.

 

That's an excellent example of how to test one of these patent applications.


Wow! That's one loooong list. Thanks Bill! Now I have lots to read :)

 

For fun, I decided to measure how many "scrolls" it will take to scroll down it with my wheel: 10 scrolls.


BTW:

 

That's a great advertisement for your blog. Now I gotta go there and see about your testing.

 

I'm lazy :)

 

In my limited testing on those patents that I've read, I've seen some areas where the patent(s) seem to have been applied and some where they haven't, as you noted above.

 

This is very fertile ground.

 

I don't think I can say thank you enough.

 

Dave


Great list - thanks!

 

And... uh... are both the spam links supposed to go to the same place?

 

I really liked that paper and was eagerly looking forward to the second one...

 

(Not that I spam. I think it's a good idea to learn about these things.)


Thanks,

 

I've edited the URL so that it goes to the right page - same site, different page. :)


Man, Bill. I'm reading through some of these articles, and I can't say thanks enough. This list should be stickied or added to the newbie help pages. Thanks again!

"This list should be stickied or added to the newbie help pages. Thanks again!"

 

 

 

 

This list is darn scary for a newbie, LOL!!!! Just when I was beginning not to feel so dumb... Thanks Bill ;-) .


No intent to intimidate, Debora

 

An excellent starting point is John Battelle's book, because it's written for a mass audience, but it does a great job of supplying information about the history and people, and the ideas behind how search engines work. And it's pretty readable.

 

The "what is a tall poppy" paper is also a good place to begin, and then the two PageRank ones. The first paper on crawling (ignore the math and try to understand the concepts) made a big difference to my understanding of how a search engine functions too.

 

Then take a look at the first two papers listed. The first one was written in 1945, and it's a favorite of mine. It's also a very nice lead in to the second paper. Once you have those down, the spam ones are good to look at.

 

You're welcome, eKstreme. :)


Thanks much Bill :).

 

"The first paper on crawling (ignore the math and try to understand the concepts) ..."

 

 

 

 

Ironically, the math isn't the scary part for me :) .

 

Love the sense of history you have Bill, this is a great list, and I will tiptoe into it.

"An excellent starting point is John Battelle's book, because it's written for a mass audience ..."

I read the book last month, and it is pretty good. There were a few minor things I remember a bit differently, but those aside, it was fascinating to see that era of Internet history from a different perspective. Battelle also includes a chapter on the Future of Search which found me shaking my head far more often than nodding. Then again, I once predicted Goto.com would crash and burn, too, so I think maybe I'll read that chapter a second time just to be safe. :unsure:


Hi Todd. Good to see you here.

 

"Then again, I once predicted Goto.com would crash and burn, too, so I think maybe I'll read that chapter a second time just to be safe."

 

Never quite did understand their name change, but I guess you can't argue too much with a $1.6 billion selling price.


Actually, Bill, I think the name change is discussed in Battelle's book, though he doesn't quite tie cause to effect as directly as he probably could.

 

GoTo.com was originally designed as a search portal, in direct competition with all the other engines of that era, but quickly developed business relationships with many of them, like AltaVista and AOL. Just as a manufacturer might want to avoid competing with its dealers for fear of losing them, GoTo wanted to avoid the appearance of stepping on toes. I think the name change from goto.com to overture.com was meant to clearly signal the change of direction and priority.


Good points, and a good description, Ron.

 

Back then when they went through their name change, I thought that they were running from something, instead of towards something.


I remember having that same impression. I'd say 2 things contributed to that:

 

1. Bad marketing/PR

2. Everything else online was collapsing


I thought I'd bring this back up, just because these resources are so valuable. :applause:

 

We've been looking at one of the patents over at seorefugee in this thread.

 

Reviewing the patent and discussing the issue has helped me tighten my understanding of the patent and its application. (In fact, I made a mistake in my initial post, and it's still there, but you have to read the patent or Bill's discussion of it to find it ;))

 

If you can find a few gems by reading through these patent discussions and dissecting them, it's got to help your efforts.

 

Dave


Nice discussion over there, Dave.

 

I spent some time last night on the Google Definitions patent application, and wrote about that one on my blog: Looking at Google Definitions.

 

 

1. Bad marketing/PR

2. Everything else online was collapsing

 

To add a third, I remember thinking that they just didn't have what I thought at the time might be the most important piece - the search engine.

 

Nice that there are some articles online from back then to fill in some pieces of the picture. Here's one:

 

GoTo gambles with new name
