> Google Site Maps

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jun 2 2005, 10:16 PM
The Google blog has some information on a new beta service from Google that might help get pages indexed by Google:

http://googleblog.blogspot.com/2005/06/web...r-friendly.html

Here's the page they point to there:

https://www.google.com/webmasters/sitemaps/login

Danny Sullivan questioned Shiva Shivakumar, technical lead on the project, on the subject:

http://blog.searchenginewatch.com/blog/050602-195224

(via SEO Chat)

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post Jun 2 2005, 10:28 PM
WOW!!! That is pretty cool. We'll see what effect this has. Should be interesting!

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jun 2 2005, 10:49 PM
It's going to make things interesting.

I suspect that I'm going to be trying this out sometime in the near future. :)

If it makes it easier for sites that have a hard time being indexed to appear in Google, it's probably a good thing to try.

Moderator Alumni

Group: Hall Of Fame
Joined: 11-February 04
Posts: 5,892
From: Los Angeles, CA
post Jun 2 2005, 11:19 PM
Looks fantastic, Bill. Thanks for sharing that.

I've got a fairly large number of dynamic pages, about 130,000 in all. Google has managed to index only about 40,000 of them and I can't seem to figure out why it gets stuck.

Looks pretty difficult, however, for the average mortal (like myself) to implement, but I'd be really interested to hear what others have to say after they've installed it. Looks promising, though.

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post Jun 3 2005, 01:26 AM
Garick, you got it a bit wrong.

There is nothing to install. It is a protocol for listing the pages you want search engines to index. One has to either build the list oneself (annoying) or programmatically generate the list of pages to index, similar to a sitemap.

In simple terms, it is a protocol that says "Hey, here are all our pages we want indexed". You can also say other things like "here are all our pages updated since {Insert day here}", "here are all our most valuable pages" etc etc.

If you have a dynamic site, this will be utterly brilliant. Now, you can develop an overall sitemap, with EVERY SINGLE page on your site, as well as a "has changed" sitemap, updated on a frequent basis, i.e. a daily sitemap, weekly sitemap, monthly sitemap etc.

That is fantastic because, if it works the way I think it eventually will, the first pages G grabs will be the updated / new pages, followed by the rest of the site. For massive sites, news sites etc, that is a godsend, as you can let the engines know what content to grab first, making sure timely content is indexed quickly and accurately.

If you sell products that are released, like, say, music, books, computer parts etc, you can (hopefully and eventually) tell the engines the moment these products are live on your site, hopefully for quick indexing.

How cool will that be?

Some thoughts of what this will (again, HOPEFULLY) lead to:
1. Lower bandwidth costs all round.
2. More complete indexing.
3. More timeliness, both because pages are found sooner and because, with fewer pages crawled, Google has more time to find new pages.
4. Better relations. This is the first real initiative to make webmasters and SEs partners, rather than treating webmasters as some evil third party. That, IMHO, is fantastic, and a great, positive step in the right direction.

That is my 2 second run down. If you can't tell, me == excited++++ about this, and I can't wait to see the way this all develops :)
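
For anyone wondering what the "overall sitemap plus has-changed sitemap" split could look like in practice, here is a rough Python sketch. The page list, the example.com URLs and the `changed_since` helper are all made up for illustration; a real site would pull its URLs and modification dates from its own database or CMS, and nothing here says how Google will actually treat the resulting lists.

```python
from datetime import date

# Hypothetical page records: (URL, date last modified).
PAGES = [
    ("http://www.example.com/", date(2005, 6, 1)),
    ("http://www.example.com/news/today.html", date(2005, 6, 3)),
    ("http://www.example.com/about.html", date(2004, 11, 20)),
]


def changed_since(pages, cutoff):
    """Return only the pages modified on or after the cutoff date."""
    return [(url, modified) for url, modified in pages if modified >= cutoff]


# The overall sitemap lists everything; the others list only recent changes.
full_list = PAGES
daily_list = changed_since(PAGES, date(2005, 6, 3))
weekly_list = changed_since(PAGES, date(2005, 5, 28))

print(len(full_list), len(weekly_list), len(daily_list))
```

Each list would then be written out as its own sitemap file, so the small "changed" files can be regenerated often while the full list is rebuilt less frequently.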

Moderator/Blog Editor

Group: Site Admin
Joined: 18-January 05
Posts: 5,375
From: Olympia WA, USA
post Jun 3 2005, 01:44 AM
Hokay...

I'll bet that someone is about to ask this, and someone else is about to write instructions for the dense and determined. :D

I see something about a Google python thingy being best. My server's cgi bin will run python. Other than how to spell python, I know nothing. How for-dummies wizard-like is Google's favored site map builder?

Sounds exciting.
They already have it on Google Guidelines For Webmasters.
QUOTE
When your site is ready:

    * Have other relevant sites link to yours.
    * Submit it to Google at http://www.google.com/addurl.html.
    * Submit a sitemap as part of our Google Sitemaps (Beta) project. Google Sitemaps uses your sitemap to learn about the structure of your site and to increase our coverage of your webpages.


Elizabeth

Star Member

Group: 1000 Post Club
Joined: 9-January 05
Posts: 1,532
From: Perth, Western Australia
post Jun 3 2005, 02:22 AM
Elizabeth,

Dense and Determined. I like your attitude.

Respree,

To generate a dynamic list, you can:
(a) Get XENU to crawl your site.
(b) Copy the pages you can submit to a search engine into an XML file.
(c) In the XML file, run a macro with an editor to format the data into an XML format.
(d) Whack it on your front page.

With 130,000 pages, that may be too hard.
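
The "macro with an editor" step could also be done with a few lines of script. This is only a sketch: it assumes you already have a plain list of URLs (for example, exported from a crawler), the `urls_to_sitemap` name is invented here, and the namespace is the one shown in Google's 0.84 examples. The `escape` call converts characters such as `&` that are not allowed raw inside XML text.

```python
from xml.sax.saxutils import escape

# Namespace taken from Google's Sitemap 0.84 examples.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def urls_to_sitemap(urls):
    """Wrap a plain list of URLs in <urlset>/<url>/<loc> elements."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="%s">' % SITEMAP_NS]
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines left over from the export
        lines.append("   <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical crawl output.
crawled = [
    "http://www.example.com/",
    "http://www.example.com/page.php?id=1&lang=en",
]
sitemap_xml = urls_to_sitemap(crawled)
print(sitemap_xml)
```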

Here's a new market for Ruud. Develop an automated program that trawls your site and makes the XML file for you.

40,000 out of 130,000, eh? Is that a normal ratio of trawled pages for a site of that size?

Big news if you can get a copy of the remaining 90,000 pages trawled, or at the very least, known about. I did not realise you were such an internet mogul....so that's why the latest photo of yourself is down at the Yacht Club.

Moderator/Blog Editor

Group: Site Admin
Joined: 18-January 05
Posts: 5,375
From: Olympia WA, USA
post Jun 3 2005, 03:04 AM
The Site Map Generator looks like it's supposed to do some of the work for you, like a feed??? How to use it will require:
a) a blunt object (lots of cut and paste??)
b) instructions for the dense and determined 8)
c) paying someone to install it (methinks thousands of programming brethren are thinking happy scripty thoughts)

There's more than one way to skin a site map. (Groan! Couldn't resist.)


Elizabeth

Star Member

Group: Members
Joined: 15-April 03
Posts: 662
From: Cumbria, England (Land of scary avatars)
post Jun 3 2005, 06:22 AM
I think the sitemap generator does the translations of the URLs into the XML, but you still have to provide it with a list of URLs to include in the Google sitemap.

As Travis mentioned, there may be room for a more automated system (even a remote one) to crawl the site and update this file. Only problem could be that it still doesn't get behind things like search boxes, but even something that can crawl and give priorities based on directories would be useful.

Cre8asite Remote Google Sitemap Creator - only $19.99/month? ;)


It is, however, very nice of them to provide the system under a Creative Commons licence. I originally thought it was a bit daft to help MSN, but then again, if they can get it more widely accepted it might help Google and make them look nice.

Also nice to see they've kept the format easy to parse which could be handy for all those small search engines with a single programmer.

Trev

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post Jun 3 2005, 06:30 AM
But all that is pointless Travis. Google already do a good job crawling. UNLESS it is automated, so that you have multiple sitemaps, don't bother.

That is my $0.02 anyway!!!

Star Member

Group: Members
Joined: 15-April 03
Posts: 662
From: Cumbria, England (Land of scary avatars)
post Jun 3 2005, 06:45 AM
The only real advantages I could foresee of having to do it manually would be that perhaps Google will take more notice of this kind of sitemap, and the fact that you can assign a priority to pages, which you can't with a normal Googlebot crawl.

Not sure how you'd tell it what has been updated when if using Xenu or similar.


Thinking about priorities - do you think if you had a directory site and gave your top level categories a priority 1, second level a 8 and ones below that a 5 that it would encourage Googlebot and such to head deeper into your directory and then start spidering where the actual content is probably located?
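
One way to read the idea above is "weight pages by how deep they sit in the directory". A rough sketch of that follows, with the caveat that the Sitemap protocol defines priority as a value between 0.0 and 1.0, so the numbers would need to be on that scale. The depth-to-priority table, the `priority_for` and `url_entry` helpers and the example.com URLs are all invented for illustration, and whether any crawler acts on priority at all is an open question.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import escape

# Illustrative values only; the protocol allows 0.0 to 1.0.
PRIORITY_BY_DEPTH = {0: 0.5, 1: 0.5, 2: 0.8}
DEEPER_PRIORITY = 1.0  # anything below the second level


def depth(url):
    """Count the non-empty path segments of a URL."""
    path = urlparse(url).path
    return len([part for part in path.split("/") if part])


def priority_for(url):
    """Look up a priority from the depth table, defaulting for deep pages."""
    return PRIORITY_BY_DEPTH.get(depth(url), DEEPER_PRIORITY)


def url_entry(url):
    """Render one <url> element with its <priority>."""
    return "<url><loc>%s</loc><priority>%.1f</priority></url>" % (
        escape(url), priority_for(url))


print(url_entry("http://www.example.com/arts/"))
print(url_entry("http://www.example.com/arts/music/jazz/site.html"))
```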


Trev

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post Jun 3 2005, 07:39 PM
Firstly, all the links:

What is Google Sitemaps?
Sitemap Protocol
Sitemap Frequently Asked Questions
Danny Sullivan interview: "Google Sitemaps" Web Page Feed Program
Google Sitemaps - Login

IMHO, forget the priority stuff. That will most likely be ignored. The real gold is in the ability to set modified dates.

From https://www.google.com/webmasters/sitemaps/...sitemap_lastmod
QUOTE
lastmod 

Definition:
Optional. The time the URL was last modified. You should specify the timestamp using ISO 8601; for example, 2004-09-22T14:12:14+00:00. You can omit the time portion of the ISO 8601 format; for example, 2004-09-22 is also valid. This information allows crawlers to avoid recrawling documents that haven't changed.

Constraints
Value must be in ISO 8601 format.

Example
<lastmod>2005-02-21</lastmod>
or
<lastmod>2005-02-21T18:00:15+00:00</lastmod> 

Subtag of
url 

Content Format

Text

So, you can tell the SE when a page was last modified. Further, under https://www.google.com/webmasters/sitemaps/...ileRequirements,

QUOTE
The following example shows a Sitemap index in XML format. The Sitemap index lists two Sitemaps:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
   <sitemap>
      <loc>http://www.mysite.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.mysite.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>

So, you can have multiple sitemaps, all with different lastmod dates. That means, as far as I can tell, that you can have several sitemaps pointing out different information: a base sitemap (up to 50,000 URLs, so you may need several) and then new URLs in new sitemaps.

With different lastmod dates, a site can effectively help the engines find pages that are fresh and new, both by having a refreshed sitemap.xml and through the lastmod values on the individual URLs.
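
To make the "several sitemaps under one index" idea concrete, here is a hedged Python sketch that splits a URL list into chunks of 50,000 and writes an index in the same shape as Google's example above. The `chunk` and `build_index` helpers, the mysite.com file names and the lastmod date are assumptions for illustration only.

```python
from xml.sax.saxutils import escape

# Namespace and 50,000 URL limit as quoted in this thread.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"
MAX_URLS_PER_SITEMAP = 50000


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive pieces of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(entries):
    """Build a sitemap index from (sitemap_url, lastmod) pairs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="%s">' % SITEMAP_NS]
    for loc, lastmod in entries:
        lines.append("   <sitemap>")
        lines.append("      <loc>%s</loc>" % escape(loc))
        lines.append("      <lastmod>%s</lastmod>" % escape(lastmod))
        lines.append("   </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# Hypothetical site with 130,000 pages, so three sitemap files are needed.
all_urls = ["http://www.mysite.com/page%d.html" % n for n in range(130000)]
parts = chunk(all_urls)
entries = [("http://www.mysite.com/sitemap%d.xml.gz" % (i + 1), "2005-06-03")
           for i in range(len(parts))]
index_xml = build_index(entries)
print(index_xml)
```

In a real setup each entry's lastmod would be the date that particular sitemap file was last regenerated, so a small "new pages" sitemap can carry a fresher date than the base ones.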

That has always been the goal of SEO: to get all pages indexed and re-indexed as soon as they change. This new initiative has the potential to improve this area dramatically, hopefully to the point at which new pages get indexed within days of going live, even deep content that is hard to find the old-fashioned way.

To take advantage of this, one needs either:
1. A CMS that creates these sitemaps on the fly.
2. A programme to do this for you.
3. To add to your sitemap every time you add a new page, remembering to change the relevant lastmod dates.

I can imagine an Apache Mod or programme is possible for static sites (just using the date a file was updated), but for dynamic CMSes, a module is really the only option.
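
For the static-site case, the "just use the date a file was updated" approach might look something like the Python sketch below. It is an illustration under assumptions, not a finished tool: the function names are invented, it only picks up .html and .htm files, and it assumes file paths under the document root map directly onto URLs.

```python
import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Namespace taken from Google's Sitemap 0.84 examples.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def lastmod_for(path):
    """Format a file's modification time as an ISO 8601 UTC timestamp."""
    modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return modified.strftime("%Y-%m-%dT%H:%M:%S+00:00")


def sitemap_for_directory(doc_root, base_url):
    """Build sitemap XML for every .html/.htm file under doc_root."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="%s">' % SITEMAP_NS]
    for folder, _subdirs, files in os.walk(doc_root):
        for name in sorted(files):
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(folder, name)
            # Turn the file path into a URL path relative to the site root.
            relative = os.path.relpath(path, doc_root).replace(os.sep, "/")
            url = base_url.rstrip("/") + "/" + relative
            lines.append("   <url>")
            lines.append("      <loc>%s</loc>" % escape(url))
            lines.append("      <lastmod>%s</lastmod>" % lastmod_for(path))
            lines.append("   </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Calling `sitemap_for_directory("public_html", "http://www.example.com/")` (both arguments hypothetical) would return the XML as a string, which could then be saved as sitemap.xml from a scheduled job.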

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jun 3 2005, 10:50 PM
Excellent post, projectphp.

Thanks for the links and the quotes.

I do think that this has the potential to help a lot of people get their sites indexed, and it does make it easier for the search engines to have people do some of this work.

Your point about automated programs helping people create and update these XML-based site maps is a good one. There's at least one out there now. Here are a couple of links to some helpful articles on the subject (including some PHP to create a feed for WordPress):

Google Sitemaps with Wordpress

Breaking Down Google Sitemaps XML

Untested

Group: Members
Joined: 13-May 05
Posts: 6
From: Hong Kong
post Jun 5 2005, 08:34 PM
Promises much but currently delivers little. When I saw this thread I created a sitemap for my site as per the directions. It is now registered at Google Sitemaps with status Submitted: 1 days (sic) ago; Downloaded: 8 hours ago; Status: OK. So far Google has updated the cache of my index page, but it hasn't updated anything else, as far as I can see, and some pages it still hasn't crawled.

Anyone getting better results?

Technical Administrator

Group: Technical Administrators
Joined: 3-February 03
Posts: 3,926
From: Sydney Australia
post Jun 5 2005, 10:15 PM
Hehehe. The service is beta. There are no guarantees and it isn't in full swing yet. Give it some time ;)

What you have to remember as well is that this is not a way to get everything you want. It is a way to communicate with SEs (and specifically Google at the moment), to hopefully enable better indexing. Does it currently work? Probably not, and for most sites it isn't worth it yet. It is a really good step, but it isn't your gateway to instant indexing :)

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jun 5 2005, 10:53 PM
Great that you were able to get the file created so quickly, abalone.

It may take some time to see it have an impact. Chances are that they will wait until enough sites are using these site maps before altering the way their crawlers collect information.

Member

Group: Members
Joined: 18-October 04
Posts: 47
From: Australia
post Jun 29 2005, 07:52 AM
Mate!
This is some sweet goodies by Google...
Sorry all, I was just really keen when I found it and just had to find a forum to post my excitedness.

Looks like I can finally get all the pages on my lyrics website indexed by Google. Google currently only indexes the artists lists and not the lyrics lists. This little problem has been annoying me for a while now and it may soon be solved! Joy to the world! hehe

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Jun 30 2005, 06:39 AM
That's excellent news, Korske.

Please let us know how successful you are at getting the lyrics pages indexed.

Solid Contributor

Group: Members
Joined: 18-October 04
Posts: 53
From: Norfolk, UK
post Jul 3 2005, 01:48 PM
For people like me who don't really know what they are doing.... and struggle along with help from others!

It does say you can just use a normal text file (as long as it's UTF-8 encoded), but this way doesn't let you use all the extra features such as the last updated date, etc...

But it would be a quick way of getting something made, until you can sort out a proper XML sitemap.
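
A small sketch of the text-file route, for anyone who wants to script it: one URL per line, saved as UTF-8. The `write_text_sitemap` name and the file name in the usage note are made up for this example.

```python
def write_text_sitemap(urls, filename):
    """Write one URL per line to a UTF-8 encoded text file."""
    with open(filename, "w", encoding="utf-8", newline="\n") as handle:
        for url in urls:
            url = url.strip()
            if url:  # ignore empty entries
                handle.write(url + "\n")
```

For example, `write_text_sitemap(["http://www.example.com/", "http://www.example.com/about.html"], "sitemap.txt")` would produce a two-line file. As noted above, there is nowhere in this format to put lastmod or priority.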

Member

Group: Members
Joined: 18-October 04
Posts: 47
From: Australia
post Jul 8 2005, 08:31 AM
Ok... I've had a few issues...
I've used a program called End Sheet to crawl the site and turn it into an XML thingo, however it has an issue parsing past the second parameter. It keeps saying that the "=" is a problem
<loc>http://www.*****.net/index.php?page=artists&do=no</loc>
Just won't go past there.
Maybe this text file way will work. Could someone help me out? Or am I just in the dark on this one?
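
One possible cause, offered as a guess rather than a confirmed diagnosis: a raw `&` is not allowed inside XML text, so a parser reading `&do=no` expects an entity name and trips when it reaches the `=`. Writing the `&` as `&amp;` avoids that. The Python sketch below shows the difference using a stand-in example.net URL (the real domain is masked above) and a made-up `parses` helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Stand-in URL with the same query string shape as the one in the post.
raw_url = "http://www.example.net/index.php?page=artists&do=no"


def parses(xml_text):
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


unescaped = "<loc>%s</loc>" % raw_url
escaped = "<loc>%s</loc>" % escape(raw_url)

print(parses(unescaped))  # False: the raw "&" breaks the parser
print(parses(escaped))    # True
print(escaped)
```

The last line prints the `<loc>` element with `&amp;do=no` in place of `&do=no`, which is the form to put in the sitemap file.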

Thanks