
Cre8asiteforums Internet Marketing and Conversion Web Design



How do they do that?


15 replies to this topic

#1 bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2002 posts

Posted 04 August 2006 - 02:59 PM

I've been meaning to ask this for some time now.

I look at my log files to check which pages the three main spiders request. I understand when they do a full scan, or when you can see them drilling down from the root or some other page.

Most often, though, it's a page here and a page there, seemingly at random.

Does anyone have an idea of how they decide what pages they want to see?

I know that Google does not see what you click after a search (I did a packet sniff).
Maybe they take all the URLs from the first results page?
Maybe they get them from the Googlebar (if installed) after a search?
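
Incidentally, pulling those spider hits out of a log file is easy to script. Here is a minimal sketch in Python, assuming an Apache combined-format access log; the spider user-agent markers, file name, and top-10 report are choices made for illustration:

```python
import re
from collections import Counter

# Matches the request and user-agent fields of an Apache "combined" log line.
LOG_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+".*"(?P<agent>[^"]*)"\s*$')

# User-agent substrings for the three main spiders (as of 2006).
SPIDERS = {"Googlebot": "Googlebot", "Yahoo! Slurp": "Slurp", "msnbot": "msnbot"}

def spider_hits(logfile):
    """Count which pages each spider requested."""
    hits = {name: Counter() for name in SPIDERS}
    with open(logfile) as f:
        for line in f:
            m = LOG_RE.search(line)
            if not m:
                continue
            for name, marker in SPIDERS.items():
                if marker in m.group("agent"):
                    hits[name][m.group("path")] += 1
    return hits

if __name__ == "__main__":
    for spider, pages in spider_hits("access.log").items():
        print(spider)
        for path, count in pages.most_common(10):
            print(f"  {count:4d}  {path}")
```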

#2 JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 04 August 2006 - 03:34 PM

Yes, that's an interesting topic. ;)

I haven't really thought it all out, so here are just some random thoughts, in no particular order:

- Google knows a bit about the clicks from the SERPs. If you're using Google Sitemaps, you'll see a small part of it: your rankings for specific keywords, plus which keywords brought you traffic. How do they get that? One source is certainly personalized search (if you have it activated); another is probably the Googlebar (I haven't tested it, though). They could also combine AdSense information with search queries (user XYZ searched for this, and a bit later he had AdSense from site ABC showing, hmmm). Also, don't forget Google Analytics :huh:

- I have no idea what the Googlebar tracks and sends out. If it fetches the PR for every URL I open, they'll have at least that.

- Known information about page change frequencies is certainly used -- combined with page "value" it can help determine the crawl frequency

- Known page change information could also be used to help determine "hub" pages on a site (news, sitemap, homepage, etc)

- Page change frequency can only be used reliably if the page can be split into content / navigation / static sections; for that, it helps to have HTML code that is valid on a block level :(.

- Google Sitemaps allows the webmaster to specify a Last Change Date, Change Frequency and "Priority" (whatever that is really for; I haven't figured it out, and they say it isn't being used yet) - see the sketch below
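
Those three fields correspond to the <lastmod>, <changefreq> and <priority> tags of the sitemap file itself. A minimal sketch of writing such a file in Python; the URLs, dates, and values are invented for illustration (the 0.84 schema namespace is the one Google documents at the time of writing):

```python
from datetime import date

# Hypothetical pages: (URL, last change date, change frequency, priority).
PAGES = [
    ("http://www.example.com/",           date(2006, 8, 15), "daily",  "1.0"),
    ("http://www.example.com/news.html",  date(2006, 8, 14), "hourly", "0.8"),
    ("http://www.example.com/about.html", date(2006, 1, 3),  "yearly", "0.3"),
]

def write_sitemap(path="sitemap.xml"):
    """Emit a sitemap file following the 0.84 sitemap protocol."""
    with open(path, "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">\n')
        for url, lastmod, freq, prio in PAGES:
            f.write("  <url>\n")
            f.write(f"    <loc>{url}</loc>\n")
            f.write(f"    <lastmod>{lastmod.isoformat()}</lastmod>\n")
            f.write(f"    <changefreq>{freq}</changefreq>\n")
            f.write(f"    <priority>{prio}</priority>\n")
            f.write("  </url>\n")
        f.write("</urlset>\n")

write_sitemap()
```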


In the end, the main element I have seen for crawl frequency is page "value": a page with good value is crawled more frequently than a page with little value (value being roughly proportional to the value of the links pointing to the site, with penalties and possible bonuses coming into play). Even a static high-value page is crawled frequently; that doesn't make much sense to me, but there must be reasoning behind it. Perhaps the frequency would be even higher if the content were to change frequently?
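
Nobody outside Google knows the real scheduling formula, but the heuristic sketched here - a recrawl interval driven mostly by page value and tightened by the observed change rate - could look something like this toy model, where every number and weight is invented:

```python
def recrawl_interval_days(value, observed_change_days,
                          min_days=0.5, max_days=60.0):
    """Toy crawl scheduler. `value` is a made-up 0..1 score, roughly
    the value of inbound links minus penalties plus bonuses."""
    # Base interval shrinks as value grows: value 1.0 -> min, 0.0 -> max.
    base = max_days - value * (max_days - min_days)
    # Don't schedule much slower than the page is observed to change.
    interval = min(base, 2.0 * observed_change_days)
    return max(min_days, interval)

# A strong homepage that changes daily vs. a weak, static deep page:
print(recrawl_interval_days(value=0.9, observed_change_days=1))    # -> 2.0
print(recrawl_interval_days(value=0.1, observed_change_days=365))  # -> ~54
```

Note that in this toy model a static high-value page (value=0.9, observed_change_days=365) still gets an interval of about 6 days, matching the observation above that high-value pages stay on a fast schedule even when they rarely change.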

Hope I gave you some ideas! I'm looking forward to the other ideas here.

John

#3 bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2002 posts

Posted 05 August 2006 - 12:47 AM

Interesting about the Adsense thing.

I use neither AdSense nor Google Sitemaps.

In my case that leaves only the Googlebar. I guess the frequency of hits reported by the Googlebar could give them something to go on. I imagine time spent on a page could be estimated when a user goes to another page. Sort of like: did the user like that page, or just move on quickly?
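
If the toolbar phones home on every pageview, that estimate is simple: the dwell time on a page is roughly the gap until the same user's next pageview. A toy illustration (the event format is invented):

```python
# Hypothetical toolbar pageview events for one user: (timestamp_sec, url).
events = [
    (0,   "http://www.example.com/landing.html"),
    (8,   "http://www.example.com/next.html"),
    (305, "http://www.example.com/third.html"),
]

# Dwell time on a page ~= gap until the same user's next pageview.
for (t0, url), (t1, _next_url) in zip(events, events[1:]):
    print(f"{url}: ~{t1 - t0}s on page")   # ~8s (quick bounce?), then ~297s
# The last page's dwell time is unknown: there is no "next" event.
```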

Is there an estimate of how many people use the Googlebar?

#4 SEOigloo

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2100 posts

Posted 05 August 2006 - 12:56 AM

Hi bobbb -
Just out of curiosity, why don't you use Google Sitemaps? Is it just that you've not gotten into that, or do you have concerns about it?

#5 bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2002 posts

Posted 05 August 2006 - 10:13 AM

"Is it just that you've not gotten into that"
Correct.

The site I am referring to has 50 static pages and about 10 dynamic ones. G, Y, and M have seen them all. I do have a sitemap, but not the type meant for spiders.

I've looked into it and downloaded a program. I could do it by hand too... but I lacked the motivation because I did not see the benefit.

#6 Wit

    Sonic Boom Member

  • 1000 Post Club
  • 1599 posts

Posted 05 August 2006 - 01:07 PM

Some of us just don't want to get into G Sitemaps. There's something about them that's a bit fishy. :lol:

Regular sitemaps (you know - made for humans *and* bots) are quite alright though.....

Do you have inbound links to some of your inner pages (i.e. not the homepage)?

#7 bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2002 posts

Posted 05 August 2006 - 04:06 PM

Some of us just don't want to get into G Sitemaps. There's something about them that's a bit fishy.

Oh! I did not know. How fishy? I'd like to know more.

Regular sitemaps (you know - made for humans *and* bots) are quite alright though.....

That's what I got. One for English and one for French.

Do you have inbound links to some of your inner pages

Not many

#8 bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2002 posts

Posted 10 August 2006 - 10:37 AM

Some of us just don't want to get into G Sitemaps. There's something about them that's a bit fishy



Can anyone shed some light on that statement? I was starting to get interested in sitemaps again.

Now, if people here are saying that, there must be some reason.

Or is G bashing taboo?

#9 Guest_joedolson_*

  • Guests

Posted 10 August 2006 - 11:11 AM

No, Google bashing is fine. :)

Personally, I have no problems with using Google Sitemaps. However, I think some people are concerned that using Google Sitemaps gives Google too much information about your site. Others have expressed concern that using Google Sitemaps may delay or harm the ranking of your site.

I don't think any of these complaints hold much water. The problem is that people think of Google Sitemaps as a tool intended to affect your ranking or indexing, which is really not the point of Sitemaps. Sitemaps give you access to a wide variety of information that Google keeps about your site: by using one and signing up with Google's webmaster tools, you can see a panorama of what Google knows about your site.

You don't give Google much that they didn't already know by creating a Sitemap. You're a) giving page URLs (they already have them), b) providing your update dates (they can figure that out), c) providing your idea of which pages are important (OK, that's new), d) saying how frequently you're likely to change each page (I guess that's new, although they could figure it out eventually), and e) verifying that you have server access to the site.

In return, you get Google's view of your site: what pages they couldn't access, where they found missing files, where they were blocked by robots.txt, what top keywords were used to search for and find your site, which pages on your site have the highest PageRank, etc. It's my opinion that you're getting more than you're giving. It's also my opinion that Sitemaps will have no effect on your ranking at all; that's just not the point. They may possibly make the indexing of your site more thorough, but most likely they will have little effect even there.

I'd be very curious to hear exactly what's 'fishy' about Google Sitemaps, myself. Although I won't say that I 100% trust Google's motives all the time, I see little possibility for an abuse of trust with Sitemaps. You're providing little more information with them than a verification that you are able to access the site. Frankly, you haven't even demonstrated ownership of the site or even authorized access by verifying - you've merely demonstrated that you can place a file on the server.

#10 JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 15 August 2006 - 02:02 PM

Hi Joe
You obviously haven't spent much time in the Google Groups for Sitemaps :D :D.

The "something fishy" IMHO about Google Sitemaps is that you are connecting yourself to a website and letting - directing - Google to crawl, parse and index it more at a higher level. Imagine that website is "spam" (or what have you): you're telling Google you're connected to it (+ connected to 50-100-1000 others) and you're telling Google to take a better look. I doubt you would want to do that, but then again, I doubt Google Sitemaps was made to track "spam" better :D.

However, there is one item I have found that is problematic about Google Sitemaps; I haven't been able to confirm it, so take it with a pound of salt. It seems that Google "resets" the crawlers and parsers for your site when you add it to Google Sitemaps (perhaps it's just looking for the verification meta-tag?). If you have a clean, strong, well-supported site then that is usually no problem: you're up and running in no time. However, if the site is barely supported and has various HTML errors in the code (broken on a block level, etc.), then the crawlers need more time to get up to speed and the parser needs several "passes" to adjust to your site (get the right parser, optimal settings, etc.). Many people have complained about something like that: dropping in ranking, dropping out of the index, being indexed badly (bad snippet, bad keyword matching, etc.). Removing the site from Google Sitemaps seems to make it rebound fairly quickly; keeping it in probably just takes time for everything to adjust properly.

So what does that mean? Before adding and using Google Sitemaps, I would make sure that the site is technically correct and perfectly clean (also with regard to a possible automated spam penalty - text links, affiliate links, etc.).

However, I disagree with the "you don't give Google more than it already has" part :). How's this for neat: change a very low-level page on your site, set the change date in your sitemap, and have the crawler fetch THAT page within 2-10 hours. That's not just neat, that's amazing. That's what Google Sitemaps is for -- to spot updates in your site without having to crawl it all the time. It saves you bandwidth and gives you a very fresh listing in the search results. :naughty: :naughty:
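
As a sketch of that workflow, continuing the invented sitemap from earlier in the thread: bump <lastmod> for the changed page, then ping Google's sitemap resubmission URL so the file gets re-read. The ping endpoint shown is the one documented for Google Sitemaps at the time; treat the exact details as an assumption:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from datetime import date

NS = "http://www.google.com/schemas/sitemap/0.84"   # 2006-era namespace
PING = "http://www.google.com/webmasters/sitemaps/ping?sitemap="

def mark_changed(sitemap_path, page_url):
    """Set <lastmod> to today for one <url> entry (assumes the entry
    already has a <lastmod> tag), then rewrite the sitemap file."""
    ET.register_namespace("", NS)
    tree = ET.parse(sitemap_path)
    for entry in tree.getroot().findall(f"{{{NS}}}url"):
        if entry.findtext(f"{{{NS}}}loc") == page_url:
            entry.find(f"{{{NS}}}lastmod").text = date.today().isoformat()
    tree.write(sitemap_path, xml_declaration=True, encoding="UTF-8")

def ping_google(sitemap_url):
    """Tell Google the sitemap changed so it gets re-read soon."""
    urllib.request.urlopen(PING + urllib.parse.quote(sitemap_url, safe=""))

mark_changed("sitemap.xml", "http://www.example.com/news.html")
ping_google("http://www.example.com/sitemap.xml")
```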

I imagine "change frequency" will work similarly, though Google already has a good feeling for your page change frequncy anyway, so I'm not sure how they use it. "Priority" is something I personally feel was an "accident": once it is used (it currently isn't) it will just be misused, no matter which "priority" they use. I would ignore those settings and concentrate on keeping the sitemap file current (last change date).

John

#11 bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2002 posts

Posted 15 August 2006 - 05:03 PM

I have spent no time in the Google Groups for Sitemaps and don't understand this part:

you are connecting yourself to a website and letting - directing - Google to crawl, parse and index it more at a higher level.


Almost sounds like you are giving them FTP access.

#12 JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 15 August 2006 - 05:38 PM

Almost sounds like you are giving them FTP access.

Not quite, but I admit my wording was misleading :D

What I meant was:

- You're giving Google a direct connection between your Google login and the websites in question. If you have other websites under the same Sitemaps account, then Google will have a direct connection (with more or less proof through verification) that all those sites belong to the same webmaster (or are controlled by the same webmaster). If you have 100 sites in there and 1 is reasonably "questionable", they might assume the others are as well (perhaps also for bans, penalties, etc.). However, if all your sites are legitimate then IMHO there is no problem here; and anyway, Google will probably be able to spot associations between sites fairly easily already.

- By submitting a Google Sitemap you're giving Google more information about your website as it changes. Usually, that's what you want :). But if, say, you're "hiding something" (whatever trick you might be using to make Google think your site is something that it isn't), then Google might recognize your attempt more easily / faster.

Of course I don't want to assume that anyone is doing any of this :D -- but if you are, then you might want to stick to the tricks you know and go from there. :D

And of course I understand the possible privacy problems from the "known" associations between webmaster and website. But then again, if you want that private information, you need to provide a channel to pass it along.

John

#13 bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 2002 posts

Posted 15 August 2006 - 06:04 PM

OK now I understand.

I started this thread with this comment:

I've been meaning to ask this for some time now.


And I've been meaning to ask this for some time now. Everything in this forum (these forums) is white hat and non-spammy.

I'd be willing to bet a six-pack of Guinness that there are more black hats here than are willing to admit it.

Maybe that should be another subject??

Edited by bobbb, 15 August 2006 - 06:08 PM.


#14 Guest_joedolson_*

  • Guests

Posted 15 August 2006 - 06:25 PM

You obviously haven't spent much time in the Google Groups for Sitemaps


Yeah, you're right... I really don't spend much time (any time) in Google Groups at all.

I do see what you're getting at - and, I guess, as bobbb points out, from a black hat perspective there are definite concerns. My own vantage point, being, of course, pure and innocent ( :) ), leaves me with a bit of a skewed view of it.

I certainly don't see any value in Google Sitemaps for a black hat marketer. Absolutely not. Given your comments about the resetting of parsers and crawlers for a newly added site, I can see some problems for webmasters with low-quality code as well. However, that's no greater a barrier than Google probably sets for low-quality code already. It's just a good thing to be aware of if you want to take advantage of Sitemaps' benefits.

Personally, I haven't seen any difference in behavior since adding Sitemaps - but then, I haven't been watching that closely, either.

You're giving Google a direct connection between your Google login and the websites in question. If you have other websites under the same Sitemaps account, then Google will have a direct connection (with more or less proof through verification) that all those sites belong to the same webmaster (or are controlled by the same webmaster).


That's an interesting point - and that's what I'm just not sure of. You are indicating that you have a connection to these sites, I guess. However, I have to say that I'm not sure that having Sitemaps access is necessarily equivalent to having control over a site.

Nonetheless, I can see where you are handing one critical piece of information to Google - a unique identity which associates all the sites you're responsible for. (Probably.)

#15 JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 16 August 2006 - 02:31 AM

I'd be willing to bet a six-pack of Guinness that there are more black hats here than are willing to admit it.

:ph34r: :ph34r: :ph34r: :ph34r:

What's in a black hat? Checking your rankings could be seen as black hat, and I'm pretty certain that 1-2 (or more) here do that from time to time :D.

Realistically, I think a good SEO should have a solid grasp of all the techniques out there, be able to apply them, and know what kind of results they can bring. To me that includes, at least in part, "black hat" techniques. When you work on a website and see that competitors are using black hat techniques, you need to be able to determine what kind of advantage (if any) they get from them. Do they really have an advantage? What would you need to do to top them with "clean", sustainable (I think that is very important) techniques?

In most areas, I think people give black hat too much weight. Those techniques are often not necessary, and with a good, clean website you can often top them. Time spent on black hat is time not spent elsewhere. What good is a black hat website that ranks 1-2 places higher but can't convert visitors into customers?

The other item which is important to me is "sustainability": a site that knowingly uses black hat techniques has to be aware that its tricks will be found sooner or later - if not automatically, then at least by a competitor denouncing them. That will get the site a penalty, a timed ban, or at the very least force it to remove the black hat tricks ASAP. If you're running a business and someone tells you that you need to change your website NOW, you're going to have a problem if you do not have a full-time webmaster. And that's only assuming you notice: if you don't, you might continue to run your shop with close to no search engine traffic, left wondering why you're not selling as much as you used to. A clean, white hat site could accidentally run into the same problem, but it's much less likely than if you're going against the rules on purpose.

John

Edited by softplus, 16 August 2006 - 02:31 AM.


#16 pleeker

    Ready To Fly Member

  • Members
  • 17 posts

Posted 22 August 2006 - 04:16 PM

Jon Glick (formerly of Yahoo, now with Become.com) touched on this briefly during one of his presentations at SES San Jose. I wrote up a brief recap of his presentation on my blog, which includes this item:

Search engines keep a history of your site and track how often pages change. Repeated “meaningful changes” can increase the frequency your site is crawled.

I spoke with Jon later in the day and asked him to define a "meaningful change" -- not something he could do easily, but his main point was that it would need to be a change or update to the content that somehow alters the overall message of the page. So, things like a random testimonial or random photo won't help. But adding a new news blurb / article would.

He also explained that it's more than just authority that factors in -- e.g., a page listing the order of finish for the 2001 Boston Marathon might be authoritative for that subject, but the content isn't going to change often, and so it won't be crawled frequently.
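
One way to picture that "meaningful change" test: fingerprint only the page's main content and ignore the parts that churn on every load (rotating testimonials, random photos). A toy sketch in Python; extracting the main content block is the hard part in practice and is simply assumed here:

```python
import hashlib

def content_fingerprint(main_text):
    """Hash the page's main content only; navigation, rotating
    testimonials, random photos, etc. are assumed stripped already."""
    normalized = " ".join(main_text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

before = content_fingerprint("Order of finish, 2001 Boston Marathon: ...")
after_photo_swap = content_fingerprint("Order of finish, 2001 Boston Marathon: ...")
after_new_article = content_fingerprint("NEW: 2006 qualifying times announced. ...")

print(before == after_photo_swap)   # True: no meaningful change to recrawl for
print(before == after_new_article)  # False: the page's message changed
```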

HTH!


