Cre8asiteforums Internet Marketing
and Conversion Web Design


Patent Application On Google Mobile Sitemaps


3 replies to this topic

#1 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 01 March 2007 - 12:24 PM

Google published a patent application this morning that provides the most detailed look I've seen into how Google Sitemaps work.

It was published in the context of Google's Mobile Sitemaps, but the majority of the "Detailed Description" of the patent application describes how their Sitemaps work in general, and how they look at sitemap information.

The document is:

Mobile sitemaps

The most interesting line in the document to me was this one:

In some embodiments, information from the sitemaps may be incorporated into the computation of the page importance score.



#2 incrediblehelp

incrediblehelp

    Ready To Fly Member

  • Members
  • 24 posts

Posted 01 March 2007 - 06:44 PM

To help you rank, hurt rank or both!!

#3 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 01 March 2007 - 07:01 PM

Great find, Bill. I've been a Sitemaps-Freak from the start. It's neat to see that some of my guesses were right :). Some of the interesting things are still a bit unknown (like when the priority attribute is actually used).

It's neat to see some local names on the patent :).

Can you tell me if I'm getting this section right (I can't see the images in my browser, grrr):

[0102] The list may also include document popularity information derived from the access logs. The document popularity information may be determined based on the numbers of accesses each non-error URL has. The document popularity information serves as an additional hint of which documents are to be given a higher priority during crawling (e.g., scheduled to be crawled first, or more likely to be crawled than lower priority documents), based on which documents are in high demand (i.e., are accessed more often).

Are they talking about the sitemap file as the "list" (i.e., webmaster-specified information), or is this an internal list (information extracted from Google's logs)? If the latter, wouldn't that mean that Google is using its own collected click data for crawler priority?

John

#4 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 01 March 2007 - 11:06 PM

It's fun to look at some of these patents where you know that a good part of them have been implemented. Not everything that they discuss always makes it in, and some stuff gets added as the processes in a patent are developed.

I saw the three developers from Switzerland. Nice international effort there.

The access logs that they are talking about are from the website where the sitemap is being generated. From earlier on in the patent application:

[0073] The sitemap generator 106 generates sitemaps by accessing one or more sources of document information. In some embodiments, the sources of document information include the file system 102, access logs, pre-made URL lists, and content management systems. The sitemap generator 106 may gather document information by simply accessing the website file system 102 and collecting information about any document found in the file system 102. For instance, the document information may be obtained from a directory structure that identifies all of the files in the file system, or in a defined portion of the file system.

[0074] The sitemap generator 106 may also gather document information by accessing the access logs (not shown) of the website. The access logs record accesses of documents by external computers. An access log may include the URLs of the accessed documents, identifiers of the computers accessing the documents, and the dates and times of the accesses. The sitemap generator 106 may also gather document information by accessing pre-made URL lists (not shown). The pre-made URL lists list URLs of documents that the website operator wishes to be crawled by web crawlers. The URL lists may be made by the website operator using the same format as that used for sitemaps, as described below.
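
To make the file-system case in [0073] concrete, here is a rough sketch of what such a generator might do: walk a directory, collect each file with its modification date, and write the result as sitemap XML. This is only an illustration, not the patent's implementation. The function names `collect_documents` and `build_sitemap` are made up for this example, the output uses the public sitemaps.org 0.9 schema rather than anything specified in the application, and URLs are not percent-encoded.

```python
import datetime
import os
import xml.etree.ElementTree as ET

# Namespace of the public sitemaps.org protocol (an assumption for this sketch).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def collect_documents(root_dir, base_url):
    """Walk root_dir and return sorted (url, last-modified date) pairs."""
    docs = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Path relative to the site root, with forward slashes for the URL.
            rel = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
            modified = datetime.date.fromtimestamp(os.path.getmtime(full_path))
            docs.append((base_url.rstrip("/") + "/" + rel, modified.isoformat()))
    return sorted(docs)


def build_sitemap(docs):
    """Serialize (url, lastmod) pairs as a sitemap XML string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in docs:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap(collect_documents("/var/www/site", "http://example.com"))` (both arguments are placeholders) would return one `<url>` entry per file found under the directory.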


I'm not sure if using document popularity to help determine crawler priority makes sense. It's possible that the more popular pages are the ones that may be indexed by search engines already.
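
For anyone who wants to see the popularity hint from [0102] in concrete terms, here is a minimal sketch: count accesses per non-error URL in a site's own access log, then order URLs by that count. The log format (one "URL STATUS" pair per line) is a simplification invented for this example, as are the function names; real server logs carry more fields, and the patent application does not say how the counts are actually turned into crawl priority.

```python
from collections import Counter


def popularity_from_log(lines):
    """Count accesses per URL, skipping error responses (status >= 400).

    Each line is assumed to look like "/path 200" (hypothetical format).
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines
        url, status = parts
        if not status.isdigit() or int(status) >= 400:
            continue  # only non-error URLs contribute to popularity
        counts[url] += 1
    return counts


def crawl_order(counts):
    """Return URLs with the most-accessed first; ties are sorted by URL."""
    return [url for url, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


log = ["/a 200", "/b 200", "/a 200", "/c 404", "/b 304", "/a 200"]
print(crawl_order(popularity_from_log(log)))  # prints ['/a', '/b']
```

Note that `/c` drops out entirely because its only access was an error, which matches the "non-error URL" wording in the paragraph.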

incrediblehelp wrote: "To help you rank, hurt rank or both!!"


That's what I was thinking, too. :)


