> Freshness of Pages Revisited by Google

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jun 30 2005, 06:50 AM
There's been a recent revival of interest in the Historical Data patent application that we discussed in a thread a while back called:

Of Sandboxes and Toolbars: Google's New Patent Application

There's a continuation of that patent application that was released this morning by the US Patent Office in a new patent application called:

Systems and methods for determining document freshness

The inventor is listed as Monika Henzinger, and the original file date is June 30, 2004. The publication date is June 30, 2005.

The abstract:

QUOTE
Abstract

A system determines a freshness of a first document. The system determines whether a freshness attribute is associated with the first document. The system identifies, based on the determination, a set of second documents that each contain a link to the first document. The system assigns a freshness score to the first document based on a freshness attribute associated with each document of the set of second documents or the freshness attribute associated with the first document


One of the problems that it addresses is that a "last-modified-since" on a document is often incorrect. That means that it cannot be used to accurately gauge the freshness of a document.
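To make the abstract a little more concrete, here is a minimal Python sketch of one possible reading of it. Everything in it is an assumption for illustration: the `Doc` class, the `last_modified` field standing in for the "freshness attribute", and the choice to average the ages of the linking documents are hypothetical and are not taken from the patent application.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Doc:
    url: str
    # Hypothetical stand-in for the abstract's "freshness attribute".
    last_modified: Optional[date] = None


def freshness_age_days(doc: Doc, linking_docs: List[Doc], today: date) -> Optional[float]:
    """Return an age in days (lower means fresher), or None if nothing is known.

    Assumption: use the document's own attribute when it has one, otherwise
    fall back to the average age of the documents that link to it.
    """
    if doc.last_modified is not None:
        return (today - doc.last_modified).days
    ages = [
        (today - d.last_modified).days
        for d in linking_docs
        if d.last_modified is not None
    ]
    if not ages:
        return None
    return sum(ages) / len(ages)
```

A real system would presumably also have to decide when to distrust the page's own date, which is the problem described above; this sketch does not attempt that.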

It's a fairly short document, but it addresses some advice I've seen in articles on the historical data patent application, which recommend republishing pages on a site occasionally to make the search engines think that the pages are new.

Star Member

Group: Members
Joined: 24-February 05
Posts: 517
post Jun 30 2005, 01:16 PM
Ladies and Gentlemen, I give you: FRESHRANK.

Yes, I believe I understand some of the recent changes a little better now. While this patent may not be in effect, it does appear to explain why some pages that are updated frequently don't look fresh to Google, and why some pages do.

I would not recommend simply republishing pages. You can do the same thing with "touch" in UNIX. It doesn't seem to affect Google in the least.

They want to see updates to content, even if only minor ones.
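As a rough illustration of why republishing or "touch" alone changes nothing of substance, the Python sketch below changes a file's modification time with `os.utime` (the same effect as "touch") and checks that a hash of the content stays identical. The file name, the timestamps, and the idea that a crawler compares content hashes are assumptions made for the example, not anything Google has confirmed.

```python
import hashlib
import os
import tempfile


def content_hash(path):
    """Hash the file's bytes; this ignores the modification time entirely."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo_touch():
    """Return (mtime_changed, content_changed) after a touch-style update."""
    with tempfile.TemporaryDirectory() as tmp:
        page = os.path.join(tmp, "page.html")  # hypothetical page
        with open(page, "w") as f:
            f.write("<p>Same old content</p>")

        # Set an initial modification time, then record the state.
        os.utime(page, (1_000_000_000, 1_000_000_000))
        old_mtime, old_hash = os.stat(page).st_mtime, content_hash(page)

        # "Touch" the file: only the timestamps move forward.
        os.utime(page, (1_100_000_000, 1_100_000_000))
        new_mtime, new_hash = os.stat(page).st_mtime, content_hash(page)

        return new_mtime != old_mtime, new_hash != old_hash
```

Calling `demo_touch()` shows the date moving while the content fingerprint does not, which is the distinction being drawn here.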

Cloakers can have a field day with this.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jul 2 2005, 06:22 PM
It was really hard writing that post, Michael, and not using the phrase "freshrank." :)

This patent application did help clarify some of the issues raised in the original application, with somewhat plainer language and explanation.

Star Member

Group: 1000 Post Club
Joined: 9-January 05
Posts: 1,532
From: Perth, Western Australia
post Jul 4 2005, 06:43 AM
I think it's great.

Search Engines want to provide the latest and most up to date information.

There is a fair amount of material on the web that gets created in that exciting burst of energy at the beginning and then goes cold. It's around 80%.

But there are some websites which are designed to stay the same.

The list includes

(a) Scientific formulas and standards.
(b) Some Geographical & Historical based websites
(c) Government websites
(d) Business sections such as Terms & Conditions and Privacy Policy.

Does anyone update their privacy policy ? They probably copied it off someone else to begin with.

It would be interesting to see if they consider what people are updating as significant.

For example, if someone just sits around changing their backlink and internal link text, as opposed to someone who is updating a news item ?

Mate, keep up the list of new patents. They do provide a valuable insight into the way these companies think.

Star Member

Group: Members
Joined: 24-February 05
Posts: 517
post Jul 6 2005, 01:11 PM
There is a rumor that Google is planning one more major update this summer, and then they'll leave things alone for a while. If that turns out to be true, we should know by the end of the year how much FreshRank is being employed.

My hard-drive just crashed and as we bring the new one online, we're uploading old copies of static data and then updating them. We may get a boost in freshness.

Not the kind of test I prefer to run, but I'll mention something here if I see a sudden boost in Google rankings for some older content over the next few weeks.

Star Member

Group: 1000 Post Club
Joined: 4-September 02
Posts: 1,141
From: Europe
post Jul 7 2005, 02:45 AM
Think RSS feed + freshrank + affiliate spam = people looking elsewhere for their information.

Moderator Alumni

Group: Hall Of Fame
Joined: 1-September 02
Posts: 9,213
From: UK
post Jul 7 2005, 04:49 AM
QUOTE(Travis)
But there are some websites which are designed to stay the same.  

The list includes  

(a) Scientific formulas and standards.  
(b) Some Geographical & Historical based websites
(c) Government websites
(d) Business sections such as Terms & Conditions and Privacy Policy.

The interesting part about these is that all but the last will continue to gain fresh links and fresh citations from external sources. The last will almost certainly only gain fresh links from new pages added to the site that the privacy policy applies to.

In other words, I really don't believe that it is all that important whether a page itself updates regularly, so much as gauging whether the page continues to be a currently relevant citation. The age of the links will be far more important to gauging that than whether or not the page itself has been updated (except in regard to spotting ye olde Bait-n-Switch technique).

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jul 7 2005, 06:56 AM
QUOTE
In other words, I really don't believe that it is all that important as to whether a page itself updates regularly, so much as gauging whether the page continues to be a currently relevant citation.


Excellent point, Ammon.

The historical data patent application that this one updates mentioned that. It stated that while they are trying to determine freshness and staleness for pages, they recognize that for some types of pages freshness isn't necessarily a positive.

In response to a query about this year's World Series champion, they want links to point to the newest team, and not one that is from a year or two ago. But a historical document doesn't change over time, and the same is true of some other types of documents.

QUOTE
Mate, keep up the list of new patents. They do provide a valuable insight into the way these companies think.


More for you, from today:


Systems and methods for improving search quality

QUOTE
Abstract
Systems and methods are disclosed for improving search quality. Search queries are expanded using a variety of linguistic techniques. For example, the words in a query can be supplemented with related words obtained from a database of compound words, inflectional forms, and/or orthographic variations. The expanded queries can be used to perform searches for responsive documents. A document index can be expanded using similar techniques.




Systems and methods for direct navigation to specific portion of target document


QUOTE
Abstract
Systems and methods for direct navigation to and/or highlighting a specific portion of a target document such as query-relevant portion of the document are disclosed. The method may include generating a search result link to a search result document and generating an instruction to a client document browser to navigate directly to an intra-document portion related to the query within the search result document. The search result may include a snippet extracted from the search result document such that the instruction causes navigation directly to at least a portion of the snippet. The instruction may be an artificial anchor undefined in the search result document, e.g., designated by a preassigned artificial anchor designator. The client browser may have an artificial anchor module installed to execute the instruction to navigate directly to and optionally highlight the intra-document portion within the target document in response to the document link being selected.


Generating hyperlinks and anchor text in HTML and non-HTML documents

QUOTE
Abstract
Systems and methods for generation of hyperlinks and anchor text from data such as reference text in HTML and in non-HTML documents are disclosed. The method generally includes locating a text reference in a source document, searching using a search engine for a target document relating to the text reference, computing anchor text from the text reference, generating a hyperlink to the target document, and associating the hyperlink with the computed anchor text. The locating and/or computing may be based on a respective statistical model of text formatting and/or lexical cues. The text reference may be parsed into pieces such that the searching, computing, generating, and associating are performed for each piece of text. The source document may be an HTML or non-HTML document. The text reference may be a reference to, for example, a paper, article, company, institution, product, search engine, image, object, and geographical location.



Systems and methods for unification of search results

QUOTE
Abstract
Systems and methods for the unification of search results are described. In one described system, a program, such as a search engine, executing on a client device receives a search query. The search engine executes the search on a local index and receives a first result set, which is relevant to the query entered by the user. The search query is also executed against a global index. The search engine receives a second result set from the global index. Once the search engine has received both result sets, the search engine combines the result sets to create a combined result set. The search engine may cause the combined result set to be displayed or otherwise output to a user.


I haven't had time to read through those carefully, or to dig through the whole corpus of patent applications this morning, but they look interesting on the surface. There are a couple of interesting looking ones from Microsoft, too.

Personalization of web page search rankings

System and method for blending the results of a classifier and a search engine

Member

Group: Members
Joined: 18-October 04
Posts: 47
From: Australia
post Jul 8 2005, 09:04 AM
QUOTE(travis)

Does anyone update their privacy policy ? They probably copied it off someone else to begin with.


Haha
good call.
Sorry guys just found that amusing.....
I'm not a very productive person, am I?

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jul 8 2005, 12:49 PM
I did, too. :)

I missed at least one more from Google yesterday morning.

Methods and systems for improving a search ranking using article information

Here's the abstract from that one:


QUOTE

Abstract

Systems and methods that improve client-side searching are described. In one aspect, a system and method for receiving a search query, determining a relevant article associated with the search query, and determining a ranking score for the relevant article based at least in part on client-side behavior data associated with the relevant article is described.

Star Member

Group: 1000 Post Club
Joined: 9-January 05
Posts: 1,532
From: Perth, Western Australia
post Jul 8 2005, 05:57 PM
Good point Ammon.

It's certainly something that some styles of sites have to worry about.

Some businesses have natural update periods, monthly, quarterly, yearly, and not at all.

I was just wondering how they would survive such an algorithm change with the onslaught of bloggers.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jul 10 2005, 07:32 PM
That's a good question, Travis.

I believe I've been seeing Google spidering regularly updated blogs and news sites at least daily. I know that's enabled those sites to rank well for long tail terms, and for topics that are timely and popular. So in practice, it's possibly not unusual for blogs to do well in rankings.

Ammon's list is a good one, but perhaps there is a benefit to commercial sites to include sections that are updated on a regular basis, such as press release areas, or even blogs.

Sure, privacy policies aren't updated on a regular basis, but they aren't often a part of a site where ranking in search engines is a priority to most businesses.

And pages that are great places to find historical documents and scientific formulas are likely to earn links from people because of the quality of the way they present that material.