> Patent Application On Google Mobile Sitemaps, Nice insights into Google Sitemaps for Web, too

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Mar 1 2007, 12:24 PM
Google published a patent application this morning that provides the most detailed look I've seen into how Google Sitemaps work.

It was published in the context of Google's Mobile Sitemaps, but the majority of the "Detailed Description" of the patent application describes how their Sitemaps work in general, and how they look at sitemap information.

The document is:

Mobile sitemaps

The most interesting line in the document to me was this one:

QUOTE
In some embodiments, information from the sitemaps may be incorporated into the computation of the page importance score.

Member

Group: Members
Joined: 27-January 06
Posts: 24
post Mar 1 2007, 06:44 PM
To help you rank, hurt rank or both!!

Hall of Famer

Group: Hall Of Fame
Joined: 3-November 05
Posts: 3,461
From: CHeeseland
post Mar 1 2007, 07:01 PM
Great find, Bill. I've been a Sitemaps-Freak from the start. It's neat to see that some of my guesses were right :). Some of the interesting things are still a bit unknown (like when the priority attribute is actually used).

It's neat to see some local names on the patent :).

Can you tell me if I'm getting this section right (I can't see the images in my browser, grrr):
QUOTE
[0102] The list may also include document popularity information derived from the access logs. The document popularity information may be determined based on the numbers of accesses each non-error URL has. The document popularity information serves as an additional hint of which documents are to be given a higher priority during crawling (e.g., scheduled to be crawled first, or more likely to be crawled than lower priority documents), based on which documents are in high demand (i.e., are accessed more often).

Are they talking about the sitemap file as the "list" (i.e., webmaster-specified information), or is this an internal list (information extracted from Google's logs)? If the latter, would that not mean that Google is using its own collected click data for crawler priority?

John

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Mar 1 2007, 11:06 PM
It's fun to look at some of these patents where you know that a good part of them has been implemented. Not everything they discuss makes it in, and some stuff gets added as the processes in a patent are developed.

I saw the three developers from Switzerland. Nice international effort there.

The access logs that they are talking about are from the website where the sitemap is being generated. From earlier on in the patent application:

QUOTE
[0073] The sitemap generator 106 generates sitemaps by accessing one or more sources of document information. In some embodiments, the sources of document information include the file system 102, access logs, pre-made URL lists, and content management systems. The sitemap generator 106 may gather document information by simply accessing the website file system 102 and collecting information about any document found in the file system 102. For instance, the document information may be obtained from a directory structure that identifies all of the files in the file system, or in a defined portion of the file system.

[0074] The sitemap generator 106 may also gather document information by accessing the access logs (not shown) of the website. The access logs record accesses of documents by external computers. An access log may include the URLs of the accessed documents, identifiers of the computers accessing the documents, and the dates and times of the accesses. The sitemap generator 106 may also gather document information by accessing pre-made URL lists (not shown). The pre-made URL lists list URLs of documents that the website operator wishes to be crawled by web crawlers. The URL lists may be made by the website operator using the same format as that used for sitemaps, as described below.
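
For anyone who wants to picture the file system part of [0073], here is a minimal Python sketch of a generator walking a site's directory tree and collecting one record per document. This is only my illustration, not Google's generator: the function name `collect_document_info`, the record fields, and the base URL in the usage note are all assumptions.

```python
import os
from datetime import datetime, timezone
from urllib.parse import quote


def collect_document_info(root_dir, base_url):
    """Walk root_dir and return one record per file found.

    Hypothetical helper: each record holds the public URL and the
    file's last-modified date (UTC), sorted by URL.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Turn the on-disk path into a URL path relative to the site root.
            rel_path = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
            mtime = os.path.getmtime(full_path)
            lastmod = datetime.fromtimestamp(mtime, tz=timezone.utc).date()
            records.append({
                "url": base_url.rstrip("/") + "/" + quote(rel_path),
                "lastmod": lastmod.isoformat(),
            })
    return sorted(records, key=lambda record: record["url"])
```

Calling `collect_document_info("/var/www/site", "http://www.example.com/")` (both arguments made up) would return a list of URL and last-modified pairs that could then be written out in sitemap format.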


I'm not sure if using document popularity to help determine crawler priority makes sense. It's possible that the more popular pages are the ones that may be indexed by search engines already.
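
To make the [0102] idea concrete, here is a rough Python sketch of how a site-side generator could turn its own access log into popularity hints. It is a guess at the general approach, not what the patent's generator actually does. I'm assuming a Common Log Format style log, treating any status below 400 as "non-error", and the names `popularity_from_log` and `crawl_order` are made up for this example.

```python
import re
from collections import Counter

# Assumed log shape: ... "GET /path HTTP/1.1" 200 1234
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')


def popularity_from_log(lines):
    """Count accesses per URL path, skipping error responses."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # line is not in the assumed format
        if int(match.group("status")) >= 400:
            continue  # assumption: 4xx and 5xx count as error URLs
        hits[match.group("path")] += 1
    return hits


def crawl_order(hits):
    """List paths from most to least accessed; ties break alphabetically."""
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _count in ranked]
```

Feeding the log lines through `popularity_from_log` and then `crawl_order` gives a most-accessed-first list, which is the kind of "additional hint" the quoted paragraph describes.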

QUOTE
To help you rank, hurt rank or both!!


That's what I was thinking, too. :)