Cre8asiteforums Internet Marketing
and Conversion Web Design


List of papers and patents?


28 replies to this topic

#1 eKstreme

    Hall of Fame

  • 1000 Post Club
  • 3399 posts

Posted 15 February 2006 - 03:48 AM

Hi

I'm looking for a list of papers and patents related to SEO. I'm bored and in need of a good reading list.

Know where I can get hold of one? Cheers!

Pierre

#2 Nadir

    Light Speed Member

  • Members
  • 976 posts

Posted 15 February 2006 - 11:09 AM

Wait for Bill (bragadocchio) to read this thread; I'm sure he's got what you need :blink:

#3 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 15 February 2006 - 12:50 PM

Some SEO Books, Papers, and Patents




Earlier Works



As We May Think
Nice early work on computing and citations. Included here more for historical perspective than anything, but worth reading.

From Resource Discovery to Knowledge Discovery on the Internet
This one starts with a discussion of "As We May Think" and builds upon it, looking at such things as inverted indexes and term weights. Taken together, the two documents are a nice jumping-off point for the next one, which discusses earlier search engines and optimization.

What is a tall poppy among web pages?
One of the better looks at SEO and search engines in the days before Google and pagerank. Views on SEO have changed since 1998, when this was published, but it gives a good sense of what the practice was like back then.


Pagerank



The Anatomy of a Large-Scale Hypertextual Web Search Engine
Spend some time going over this one, and the next one. There's more interesting stuff in these papers than just the information about pagerank.

The PageRank Citation Ranking: Bringing Order to the Web

Then, I would recommend paying the $20 and getting this document from Yahoo!'s Pavel Berkhin:

A Survey on PageRank Computing

The list of cited sources at the end of the document could keep you busy for months, but the document itself is an excellent overview of different approaches to pagerank, and different ways of using it, amending it, altering it, and so on.
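The survey covers many variants, but the core computation described in the original PageRank paper is a power iteration: every page repeatedly passes a share of its score along its outlinks, plus a small "random jump" share. Here is a minimal sketch, assuming the damping factor of 0.85 mentioned in the Anatomy paper and a tiny hypothetical four-page graph. The function name, the graph, and the choice to spread the score of dangling pages (pages with no outlinks) evenly across all pages are illustrative assumptions, not Google's implementation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank sketch.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Assumption: score held by dangling pages is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped score equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the total change between iterations is tiny.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical web: A links to B and C, B links to C, C links to A, D links nowhere.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": []}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.4f}")  # C ranks highest, D lowest
```

Most of the variants in the survey change one of these pieces: how dangling pages are handled, what the random jump distribution looks like, or how the iteration is accelerated.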

Books



There's some nice history and thoughtful analysis in John Battelle's The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture

If you want a good sense of how search engines work, this text by Soumen Chakrabarti is a fine introduction to the topic: Mining the Web: Analysis of Hypertext and Semi Structured Data (The Morgan Kaufmann Series in Data Management Systems)

Crawling the Web



There are more than a few papers about how search engines gather information to bring into their index, and they are useful for understanding why some pages of a site might get indexed while others don't. These two papers established some of the standard language people use when describing how crawlers work, and both have been expanded upon since. If you want to find out more about crawling, you might find some good papers by searching CiteSeer or Google Scholar for papers that cite the first paper I list below:

Efficient Crawling Through URL Ordering

Crawling the Web: Discovery and Maintenance of Large Scale Web Data (pdf)
One of the authors of the first paper in this section, Junghoo Cho, who is now a Google search engine scientist, continued his research on crawlers and wrote this paper as his doctoral thesis under the supervision of Hector Garcia-Molina. It expands upon the first paper and spends some pages looking at freshness on the web. Together, these two are good starting points for later research.
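To make the "URL ordering" idea a little more concrete, here is a minimal sketch, assuming a tiny hypothetical site held in memory instead of real HTTP fetches. It orders the crawl frontier by the number of backlinks discovered so far among crawled pages, which is one of the importance metrics the first paper compares, and adds a simple freshness measure (the share of stored copies that still match the live page) in the spirit of the thesis. The function names, the site, and the page contents are illustrative, not code from either paper.

```python
import heapq
from itertools import count


def crawl_by_backlinks(web, seeds, limit):
    """Visit up to `limit` pages, always fetching next the known but
    uncrawled URL with the most backlinks seen so far.

    `web` is a hypothetical in-memory stand-in for the Web: URL -> outlinks.
    """
    backlinks = {}      # URL -> number of crawled pages that link to it
    crawled = []        # URLs in the order they were fetched
    seen = set()        # fast membership test for crawled URLs
    tiebreak = count()  # equal counts are served first-come, first-served
    frontier = []       # min-heap of (-backlink count, tiebreak, URL)

    for url in seeds:
        backlinks.setdefault(url, 0)
        heapq.heappush(frontier, (0, next(tiebreak), url))

    while frontier and len(crawled) < limit:
        neg_count, _, url = heapq.heappop(frontier)
        # Skip URLs already fetched and heap entries made stale by a later push.
        if url in seen or -neg_count != backlinks[url]:
            continue
        crawled.append(url)
        seen.add(url)
        for target in web.get(url, []):
            backlinks[target] = backlinks.get(target, 0) + 1
            if target not in seen:
                heapq.heappush(frontier, (-backlinks[target], next(tiebreak), target))
    return crawled


def freshness(local_copies, live_pages):
    """Share of stored pages whose content still matches the live page
    (1.0 = everything fresh, 0.0 = everything stale)."""
    if not local_copies:
        return 0.0
    fresh = sum(1 for url, body in local_copies.items() if live_pages.get(url) == body)
    return fresh / len(local_copies)


# Hypothetical five-page site.
site = {
    "home": ["a", "b", "c"],
    "a": ["c", "d"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}
print(crawl_by_backlinks(site, ["home"], limit=5))  # ['home', 'a', 'c', 'd', 'b']
print(freshness({"home": "v1", "a": "v1"}, {"home": "v1", "a": "v2"}))  # 0.5
```

A plain breadth-first crawl of the same site would fetch b before c and d. Ordering by backlinks pulls the better-linked pages forward, which is the kind of effect that matters when a crawler can only get to part of the Web.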

Spam, Spam, and More Spam



Doesn't hurt to know some of the academic papers about spam:

Web Spam Taxonomy

Web Spam, Propaganda and Trust

There are a number of others, and they are worth reading through.

Some patents and patent applications assigned to Google



I've focused upon the patents and patent applications that involve search rather than advertising, mapping, cell phone cases, email, and hardware. Hopefully, these will give you a good start. Newer documents first, in each section.


Some Patent Applications assigned to Google



Phrase-based searching in an information retrieval system

Enhanced document browsing with automatically generated links based on user information and context

Methods and systems for endorsing local search results

Systems and methods for spell correction of non-roman characters and words

Nonstandard text entry

Visually-oriented driving directions in digital mapping system

Variable length snippet generation

Systems and methods for determining user actions

Profile based capture component

Personalization of placed content ordering in search results

Named URL entry

Methods and systems for interfacing applications with a search engine

Methods and systems for information capture and retrieval

Systems and methods for weighting a search query result

Query rewriting with entity detection

Query rewriting with entity detection (different publication number)

Systems and methods for translating Chinese pinyin to Chinese characters

Assigning geographic location identifiers to web pages

Interface for a universal search

Systems and methods for personalizing aggregated news content

Generating hyperlinks and anchor text in HTML and non-HTML documents

Systems and methods for direct navigation to specific portion of target document

Systems and methods for unification of search results

Systems and methods for improving search quality

Methods and systems for improving a search ranking using article information

Systems and methods for determining document freshness

Methods and systems for personalized network searching

Methods and systems for information extraction

Information retrieval based on historical data

Personalization of web search

Systems and methods for clustering search results

Methods and systems for improving a search ranking using location awareness

Systems and methods for improving the ranking of news articles

Methods and systems for improving a search ranking using related queries

Methods and systems for improving a search ranking using population information

System and method for providing search query refinements

Systems and methods for determining user actions

System and method for presenting multiple sets of search results for a single query

System and method for providing a user interface with search query broadening

Systems and methods for searching using queries written in a different character-set and/or language from the target pages

Search query categorization for business listings search

Document search engine including highlighting of confident results

System and method for providing preferred country biasing of search results

Methods and systems for determining a meaning of a document to match the document to content

Methods and systems for understanding a meaning of a knowledge item using information associated with the knowledge item

System and method for providing definitions

Methods and systems for editing a network of interconnected concepts

System and method for providing preferred language ordering of search results

System and method for selecting content for displaying over the internet based upon some user input

Method for searching media

Method and apparatus for characterizing documents based on clusters of related words

System and method for selecting content for displaying over the internet based upon some user input

Methods and apparatus for providing search results in response to an ambiguous search query

Methods and apparatus for employing usage statistics in document retrieval

Interface and system for providing persistent contextual relevance for commerce activities in a networked environment

Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query

System and method for searching and recommending objects from a categorically organized information repository


Some Patents Assigned to Google



Methods and apparatus for determining equivalent descriptions for an information need

Address geocoding

Systems and methods for highlighting search results

Techniques for finding related hyperlinked documents using link-based analysis

System and method for selecting content for displaying over the internet based upon some user input

Information extraction from a database

Detecting duplicate and near-duplicate files

Detecting query-specific duplicate documents

Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query

Ranking search results by reranking the results based on local inter-connectivity

Real-time document collection search engine with phrase indexing

Methods for iteratively and interactively performing collection selection in full text searches

Performing automated document collection and selection by providing a meta-index with meta-index values indentifying corresponding document collections

Real-time document collection search engine with phrase indexing

Method for automatically selecting collections to search in full text searches

Document retrieval over networks wherein ranking and relevance scores are computed at the client for multiple database documents

<edit - fixed URL>

Edited by bragadocchio, 15 February 2006 - 04:45 PM.


#4 Nadir

    Light Speed Member

  • Members
  • 976 posts

Posted 15 February 2006 - 12:56 PM

I told you! ;-)

#5 Respree

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 5901 posts

Posted 15 February 2006 - 12:57 PM

That should cure your boredom for a while, Pierre.

Awesome list, Bill. You never cease to amaze me with your thoroughness. :applause:

#6 kensplace

    Time Traveler Member

  • 1000 Post Club
  • 1497 posts

Posted 15 February 2006 - 12:57 PM

Wow, that's quite a list!

#7 Nadir

    Light Speed Member

  • Members
  • 976 posts

Posted 15 February 2006 - 01:05 PM

I think it's worth a sticky.

#8 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 15 February 2006 - 01:07 PM

Thanks.

Some fun stuff in there.

Get through those, and I'll have some more for you. :)

Though of course, if you make your way through all of those, I'll expect that you'll have quite a list of things that you may want to research yourself.

And I'd recommend seeing if you can uncover how Yahoo!, MSN, and ASK/Teoma might be trying to do the same, or similar things.

#9 earlpearl

    Hall of Fame

  • 1000 Post Club
  • 1341 posts

Posted 15 February 2006 - 01:21 PM

Bill:

Have you tested any of these to see how well they apply?

I've looked hard through some of the geographic/local stuff and see some significant applications with regard to local search.


BTW: That is an awesome list!!!!!!
Dave

#10 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 15 February 2006 - 02:01 PM

Thanks, Dave

earlpearl wrote: "Have you tested any of these to see how well they apply?"


I've done some testing, and written about a number of these both here, and on my blog.

Part of the fun behind researching these is to try to figure out how to test them, or see how they might fit into some of the moves that the search engine might be making.

For instance, the patent application at the top of the list, on phrase-based searching, would make Google bombing as we know it obsolete. But I can test that simply by searching for the phrase "miserable failure" and seeing that Google bombing still works. The other interesting thing about that one is that it overlaps some territory described in the Infoseek patents at the bottom of the list that were assigned to Google, yet is different in a number of ways.

#11 earlpearl

    Hall of Fame

  • 1000 Post Club
  • 1341 posts

Posted 15 February 2006 - 02:27 PM

I've only looked hard at the local applications.

Couple of quick observations. Last February, during the Super Bowl update, local sites with good local identification information on the site moved up in rankings for relevant searches. I noted that, as did some other webmasters. We discussed it and applied different reasoning to it. Having seen your description of the patent in your August '05 post, I think that the locator patent went into effect last February.

A second observation on the local patent.

I linked to a friend's site that went up in June '05. It is somewhat associated with my business/site. I gave him an anchor text link for his service, using a phrase for the service covering 2 states and a nearby city. The site has a nice geo description of its location, with address and phone number. The service phrase part of the anchor text is not competitive.

The site has 1 good link (from my business) and some links from scraper blogs.

Last I looked, the site ranked first in Yahoo! and MSN for the phrase plus each state and the city. In Google, the site had #1 allinanchor rankings for the phrase plus geo area for all 3 areas.

Google had it ranked #1 for that phrase in the state where the business is located, but it ranked in the 200s for the other state and the city. BTW, the site ranks #13 in Google's allinanchor results for the phrase alone.

Seems like application of the geographic locator patent applies a relevancy "filter" that can negate the "newness" sandbox filter.

Of further interest, the site ranks first for the business phrase combined with the name of the town where the business is located, but can't be found for nearby towns with the business phrase. The geo relevancy is limited to the named town and state; there is no "closeness" benefit.

Thanks for providing this information.

Dave

#12 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 15 February 2006 - 02:54 PM

Good information, Dave.

Thank you.

That's an excellent example of how to test one of these patent applications.

#13 eKstreme

    Hall of Fame

  • 1000 Post Club
  • 3399 posts

Posted 15 February 2006 - 03:16 PM

Wow! That's one loooong list. Thanks Bill! Now I have lots to read :)

For fun, I decided to measure how many "scrolls" it would take to get to the bottom with my mouse wheel: 10 scrolls.

#14 earlpearl

    Hall of Fame

  • 1000 Post Club
  • 1341 posts

Posted 15 February 2006 - 04:31 PM

BTW:

That's a great advertisement for your blog. Now I gotta go there and see about your testing.

I'm lazy :)

In my limited testing of the patents I've read, I've seen some areas where the patent(s) seem to have been applied and some where they haven't, as you noted above.

This is very fertile ground.

I don't think I can say thank you enough.

Dave

#15 rmccarley

    Light Speed Member

  • Members
  • 642 posts

Posted 15 February 2006 - 04:32 PM

Great list - thanks!

And... uh... are both the spam links supposed to go to the same place?

I really liked that paper and was eagerly looking forward to the second one...

(Not that I spam. I think it's a good idea to learn about these things.)

#16 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 15 February 2006 - 04:46 PM

Thanks,

I've edited the URL so that it goes to the right page - same site, different page. :)

#17 eKstreme

    Hall of Fame

  • 1000 Post Club
  • 3399 posts

Posted 15 February 2006 - 04:49 PM

Man, Bill. I'm reading through some of these articles, and I can't say thanks enough. This list should be stickied or added to the newbie help pages. Thanks again!

#18 dgeary9

    Mach 1 Member

  • Members
  • 334 posts

Posted 15 February 2006 - 07:31 PM

eKstreme wrote: "This list should be stickied or added to the newbie help pages. Thanks again!"

This list is darn scary for a newbie, LOL!!!! Just when I was beginning not to feel so dumb... Thanks Bill ;-) .

#19 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 15 February 2006 - 07:49 PM

No intent to intimidate, Debora.

An excellent starting point is John Battelle's book, because it's written for a mass audience, but it does a great job of supplying information about the history and people, and the ideas behind how search engines work. And it's pretty readable.

The "what is a tall poppy" paper is also a good place to begin, and then the two pagerank ones. The first paper on crawling (ignore the math and try to understand the concepts) made a big difference to my understanding of how a search engine functions too.

Then take a look at the first two papers listed. The first one was written in 1945, and it's a favorite of mine. It's also a very nice lead in to the second paper. Once you have those down, the spam ones are good to look at.

You're welcome, eKstreme. :)

#20 projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3935 posts

Posted 15 February 2006 - 09:04 PM


Wow Bill, you always amaze me!

#21 dgeary9

    Mach 1 Member

  • Members
  • 334 posts

Posted 15 February 2006 - 10:24 PM

Thanks much Bill :).

bragadocchio wrote: "The first paper on crawling (ignore the math and try to understand the concepts) ..."

Ironically, the math isn't the scary part for me :) .

Love the sense of history you have Bill, this is a great list, and I will tiptoe into it.

#22 Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 16 February 2006 - 06:52 AM

bragadocchio wrote: "An excellent starting point is John Battelle's book, because it's written for a mass audience ..."

I read the book last month, and it is pretty good. There were a few minor things I remember a bit differently, but those aside, it was fascinating to see that era of Internet history from a different perspective. Battelle also includes a chapter on the Future of Search which found me shaking my head far more often than nodding. Then again, I once predicted Goto.com would crash and burn, too, so I think maybe I'll read that chapter a second time just to be safe. :unsure:

#23 stuntdubl

    Unlurked Energy

  • Members
  • 4 posts

Posted 16 February 2006 - 01:05 PM

Wow. Very impressive Bill. :applause:

Edited by stuntdubl, 16 February 2006 - 01:06 PM.


#24 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 16 February 2006 - 03:10 PM

Hi Todd. Good to see you here.

Ron Carnell wrote: "Then again, I once predicted Goto.com would crash and burn, too, so I think maybe I'll read that chapter a second time just to be safe."

I never quite understood their name change, but I guess you can't argue too much with a $1.6 billion selling price.

#25 Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 16 February 2006 - 06:31 PM

Actually, Bill, I think the name change is discussed in Battelle's book, though he doesn't quite tie cause to effect as directly as he probably could.

GoTo.com was originally designed as a search portal, in direct competition with all the other engines of that era, but quickly developed business relationships with many of them, like AV and AOL. Just as a manufacturer might want to avoid competing with its dealers for fear of losing them, Goto wanted to avoid the appearance of stepping on toes. I think the name change from goto.com to overture.com was meant to clearly signal the change of direction and priority.

#26 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 17 February 2006 - 03:02 AM

Good points, and a good description, Ron.

Back then when they went through their name change, I thought that they were running from something, instead of towards something.

#27 rmccarley

    Light Speed Member

  • Members
  • 642 posts

Posted 17 February 2006 - 02:06 PM

I remember having that same impression. I'd say 2 things contributed to that:

1. Bad marketing/PR
2. Everything else online was collapsing

#28 earlpearl

    Hall of Fame

  • 1000 Post Club
  • 1341 posts

Posted 20 February 2006 - 08:36 PM

I thought I'd bring this back up, just because these resources are so valuable. :applause:

We've been looking at one of the patents over at seorefugee in this thread.

Reviewing the patent and discussing the issue has helped me tighten my understanding of the patent and its application. (In fact, I made a mistake in my initial post, and it's still there, but you have to read the patent or Bill's discussion of it to find it ;))

If you can find a few gems by reading through these patent discussions and dissecting them, it's got to help your efforts.

Dave

#29 bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 21 February 2006 - 04:23 PM

Nice discussion over there, Dave.

I spent some time last night on the Google Definitions patent application, and wrote about that one on my blog: Looking at Google Definitions.


rmccarley wrote:
1. Bad marketing/PR
2. Everything else online was collapsing


To add a third, I remember thinking that they just didn't have what I thought at the time might be the most important piece - the search engine.

Nice that there are some articles online from back then to fill in some pieces of the picture. Here's one:

GoTo gambles with new name


