
Cre8asiteforums Internet Marketing
and Conversion Web Design



Google Does 180 And Says Don't Use Pretty URLs


55 replies to this topic

#1 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 09:45 AM

Unbelievably, Google is recommending that we DO NOT rewrite ugly, long, dynamic URLs. They blab on about how we might do it wrong and make things harder for Google to index.

WHAT? Have they gone mad?

1. The blog post in which they recommend this has a ... wait for it ... pretty URL itself.

2. While I can understand that flubbing a rewrite can make things worse, since when does that mean we shouldn't do something? Google: "You're going to screw things up, so rather than doing that, just trust that we will make everything all right." Me: I don't think so!

I'm speechless. Simply speechless.

Google has jumped the shark.

Google engineers need to put down the wacky cigarettes and think about what they just wrote.

Mass confusion ensues.

#2 phaithful

phaithful

    Light Speed Member

  • Members
  • 800 posts

Posted 23 September 2008 - 10:04 AM

I was as surprised as you were.

Especially when I read:

if you're using URL rewriting ... to produce static-looking URLs from a dynamic site, you could be doing harm rather than good.


and the post also suggests that you convert your dynamic pages into flat HTML files, then point the URLs to the flat HTML copies: :scratchhead:

If you want to serve a static equivalent of your site, you might want to consider transforming the underlying content by serving a replacement which is truly static.


I think that's just strange. I'd probably screw up the conversion process more than I would writing a simple regular expression to handle the URL rewrite.
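For the record, the kind of one-liner I mean is something like this (hypothetical paths; .htaccess with mod_rewrite):

RewriteEngine On
# e.g. /products/canon-eos/123 -> the real dynamic script
RewriteRule ^products/([a-z0-9-]+)/([0-9]+)/?$ product.php?slug=$1&id=$2 [L]

One pattern, one substitution. Hard to see how I'd botch that worse than I'd botch regenerating the whole site as flat HTML files.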

#3 saschaeh

saschaeh

    Time Traveler Member

  • 1000 Post Club
  • 1026 posts

Posted 23 September 2008 - 10:09 AM

1. The blog post in which they recommend this has a ... wait for it ... pretty URL itself.

lol

Does this mean keywords in your URL are totally worthless for indexing?

#4 phaithful

phaithful

    Light Speed Member

  • Members
  • 800 posts

Posted 23 September 2008 - 10:12 AM

Well, you probably could still have the keywords in the parameters and they would still get indexed accordingly (e.g. abc.php?p=Canon+EOS).

But I've actually always used "pretty URLs" because it's much more user friendly when sending / sharing URLs.

#5 Guest_joedolson_*

Guest_joedolson_*
  • Guests

Posted 23 September 2008 - 10:39 AM

I think that what they're really trying to do is debunk the myth that Google can't handle parameterized URLs. Of course, instead, they seem to come off as saying "just don't bother, 'cuz you'll probably screw it up and we won't."

It's certainly not going to change anything for me - I'm not about to start using parameters when I can avoid it, if I can instead use pretty urls. Frankly, the fact that the URLs are user-friendly carries a lot more weight for me than any vague uneasiness Google has that I might not do it right!

That said, it is a practical warning that if you don't do pretty URLs right, you can cause yourself a world of hurt. You can make duplicate content problems worse (and harder to detect), you can end up with a site built entirely on the wrong HTTP status codes, and you might end up completely losing some pages or serving badly faulty 404 errors.

But it's so much more worthwhile to show people what the potential problems are and how to avoid them than it is to tell them to just not bother.
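To make that concrete, here's a minimal sketch (hypothetical URLs) of how the duplicate-content trap is usually closed: if both the dynamic and the pretty form of a page resolve, one has to 301 to the other.

RewriteEngine On

# Match only the client's original request line, not our own internal
# rewrite below - otherwise this pair of rules would loop forever.
RewriteCond %{THE_REQUEST} \s/article\.php\?id=([0-9]+)\s
RewriteRule ^article\.php$ /articles/%1? [R=301,L]

# Internally map the pretty URL back onto the real script.
RewriteRule ^articles/([0-9]+)/?$ article.php?id=$1 [L]

Leave out the 301 half and you have two URLs serving identical content - exactly the kind of pitfall Google could have usefully explained instead.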

I found this passage interesting:

* www.example.com/article/bin/answer.foo/en/3

Although we are able to process this URL correctly, we would still discourage you from using this rewrite as it is hard to maintain and needs to be updated as soon as a new parameter is added to the original dynamic URL. Failure to do this would again result in a static looking URL which is hiding parameters. So the best solution is often to keep your dynamic URLs as they are. Or, if you remove irrelevant parameters, bear in mind to leave the URL dynamic as the above example of a rewritten URL shows:


(Emphasis added.)

They seem to be assuming that you've put poor planning into your URLs, obviously! They're just assuming that you're only rewriting individual static URLs, rather than implementing a global rewriting scheme which handles the diversity of URLs your system generates.

It's all well and good to encourage people to avoid poor programming practice, but NOT by telling them to just not bother trying.

Edited by joedolson, 23 September 2008 - 10:40 AM.


#6 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 10:49 AM

Exactly. Here's a repeat of what I wrote over on Sphinn. I figure I can repeat it, because Google is so freaking smart that they'll figure out what's dupe and what's not. (oy)

Google, in its ever-powerful, know-everything wisdom, tells us little webmaster minions not to do all that "hard stuff" because "we'll probably do it wrong, and make matters worse". Instead, we should just "trust them" to get it all figured out. Oh, I'm sorry, I forgot that we are all idiots. Perhaps we should just all go back to the days before dynamic sites were possible, so we don't get any of that "hard stuff" wrong.

This is all so wrong on so many levels, it's just unbelievable.

I can't even speak without spitting. Oh, and wait, Google would now rather we give them duplicate content (cuz, ya know, they never get that wrong either as they recently informed us).

Oh, and wait again - That post itself had a rewritten url.

Oh, and wait wait wait - throw usability out the window too while you're at it.

I can't even believe this. They really need to rethink what they've said.

Sure, ok, Google, if you want to tell us that getting the rewrite wrong can be dangerous - fine. Tell us that. That makes sense. But the rest of it...NOT.



#7 cre8pc

cre8pc

    Dream Catcher Forums Founder

  • Admin - Top Level
  • 13506 posts

Posted 23 September 2008 - 11:05 AM

This nearly made me faint:

If you transform your dynamic URL to make it look static you should be aware that we might not be able to interpret the information correctly in all cases. If you want to serve a static equivalent of your site, you might want to consider transforming the underlying content by serving a replacement which is truly static. One example would be to generate files for all the paths and make them accessible somewhere on your site.


Can you imagine what we'd go through here?

We purposely scramble inside post URLs so no juice is passed, to save our own butts. Forum URLs are always long, and god forbid we should have "accessible" versions somewhere for Google.

What really bothers me about the whole affair is that the focus is on keeping Google servers happy, with no regard or nod to the fact that there are other search engines to consider.

#8 iamlost

iamlost

    The Wind Master

  • Site Administrators
  • 4600 posts

Posted 23 September 2008 - 11:14 AM

Ah, more parse the Google data stream...

While static URLs might have a slight advantage in terms of clickthrough rates because users can easily read the urls, the decision to use database-driven websites does not imply a significant disadvantage in terms of indexing and ranking. Providing search engines with dynamic URLs should be favored over hiding parameters to make them look static.

Note the careful wording:
static URLs might have a slight advantage in terms of clickthrough rates
database-driven websites does not imply a significant disadvantage in terms of indexing and ranking

So there is likely a slight CTR advantage and some undefined advantage in indexing/ranking in using static URLs. No reason at all to continue serving static, static-looking URLs, none at all... :)

Providing search engines with dynamic URLs should be favored over hiding parameters to make them look static.
This one reads to me as (1) G cannot tell the difference when URLs are rewritten correctly, d'oh!, and (2) not being able to tell dynamic from static bothers them... :)

if you try to make your urls look static and in the process hide parameters which offer the Googlebot valuable information.
Ah, yes, the truth it is out...
you should give us the possibility to analyze your URL structure
And why would we want to do that?
Hiding your parameters keeps us from analyzing your URLs properly and we won't be able to recognize the parameters as such, which could cause a loss of valuable information.
And why would that be a loss, to whom, of exactly what?

Yes, poor URL rewrites can be a problem. Lots of things done poorly can be problematic. But there are very valid reasons that URL rewrites are done and sometimes sharing with SE's can be counterproductive. Plus of course, there are other SEs than the almighty Google. Not everything on the web is done for the benefit of G.

This smells like the nofollow FUD.

<added>
Go, Kim, go!!! :) :cheerleader:

#9 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 23 September 2008 - 11:35 AM

I've repeatedly argued against "pretty URLs" for several years, and against pseudo-static URLs for most of a decade, with some pretty significant exceptions acknowledged.

Here are the facts, at least as I understand them.

Ten or twelve years ago (dates get fuzzy when you start getting old) no search engine would touch a dynamic URL. We had to go to a great deal of trouble to get anything that was database driven spidered at all, and I still have sites consisting of static HTML pages . . . that are dynamically created.

Google was the first search engine, as far as I know, to begin spidering URLs with a question mark in them. And it didn't do it very well. Transposed parameters would screw with it, and anything more complicated than two or three simple parameters would be ignored.

That's changed a LOT in the last few years, and I suspect Google's most recent advice is based on those changes. Google is much better at recognizing transposed, or even unnecessary, parameters these days, and it's not difficult to find pages in the SERPs with up to eight parameters indexed.

Here's what Google didn't say, however.

Handling transposed or unnecessary parameters comes with a cost; the page has to be spidered multiple times, with tests to determine what is transposed or unnecessary, ultimately deciding what returns unique content and what doesn't. Those tests cost you bandwidth and Google time, both of which are finite quantities not to be squandered. I believe Google decides when those tests should be run -- as well as how many parameters it will accept -- based on its perceived importance of the page. Yep, the dreaded Page Rank qualifier.

Does that sound like I'm advocating rewriting problematic URLs?

I'm not, because in my opinion a rewrite at the server level is almost always a band-aid on an open wound at the application level.

Making an URL "pretty" should be done by the application, not the web server. If you have to pass too many parameters, or overly-complicated parameters, then the application is poorly written. It's really as simple as that. Without getting into a lot of programming philosophy (structured design stuff), we need to insist that our applications create human-readable, spider-understandable URLs that work. Letting the programmers off the hook with server rewrites is a very poor solution, in my opinion.
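To sketch what I mean (a hypothetical helper in PHP, since that's what most of the applications in question run on): give the application a single URL-building function and make every template use it, so each page has exactly one simple, canonical URL and there is nothing left for the server to rewrite.

<?php
// Hypothetical application-level URL builder: every template calls this
// instead of hand-assembling query strings, so URLs stay consistent and
// carry only the parameters that actually matter.
function article_url(int $id): string
{
    return '/article.php?id=' . $id;
}

echo '<a href="' . htmlspecialchars(article_url(42)) . '">Shetland ponies</a>';

When URL construction lives in one function, fixing the URLs means fixing one function - not layering regular expressions over a broken scheme.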

I also think that turning a dynamic URL into a seemingly static URL is, first, unnecessary, and second, potentially dangerous.

It's unnecessary, in my opinion, because (1) search engines don't need it, and (2) there's no data to suggest human visitors appreciate static over dynamic (simplicity of each being equal!).

It's potentially dangerous because most spiders (and especially Google) will request static pages at a substantially faster rate than they will request dynamic pages. They know that web servers like Apache are optimized to spew out HTML very quickly. They also know that dynamic pages, typically created by an interpreted script language like PHP or Perl, put a much heavier load on the server's resources. They don't want to crash your web site (or get you kicked off a shared server), but that's exactly what can happen when you essentially tell them to spider dynamic pages just as quickly as they spider static HTML pages.

However, here's an exception to that rule, and another tidbit Google didn't mention.

If you have a whole lot of dynamic pages, being spidered at relatively slower speeds, it's entirely possible Google won't be able to index everything. To a large extent -- and the most important extent -- that's again going to depend on Page Rank. If Google deems your pages important enough it WILL find the time to get them into the index. Still, if you have a lot of dynamic pages and not a lot of PR, you can potentially get more of those pages into the index by telling Google to go faster; i.e., by fooling the spider into thinking the pages are static. If the spider's page requests start failing, however, you've instead told Google to go away for a while, which usually won't result in more pages being indexed. It's a gamble.

Personally, I like Google's new advice to web masters. Clearly, though, I would have stated it a little differently.

1. Simplify your URLs at the application level, not at the server level. From a usability perspective, it does the same thing and has the added benefit of ensuring your application is actually efficient, not just pretending to be.

2. Don't pretend your dynamic pages are static pages. Every single problem that a dynamic URL "might" create with a search engine can be better solved by increasing the importance of your pages.

#10 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 23 September 2008 - 12:09 PM

I believe the issue that Google is referring to, perhaps more than URLs with question marks, is URLs that emanate from common patterns found in PHP and other framework applications. Let me be specific, as I am using one at the moment and it has been bothering me:

Typically you would have a dynamic URL as follows (à la Cre8asite):

- website.com/forums/index.php?act=post&do=reply_post&

Framework applications will redirect as follows:

- website/forums/act/replypost/12/66939

i.e. the main controller (the index file) is hidden via your .htaccess; forums will be a PHP class, act will be a function within the class, and the balance are parameters for this function!

My worry - and perhaps Google's - is that the Googlebot algorithms might mistake forums, act, etc. for folders and try to spider them! When they try, they get redirected, hence more bandwidth and time for the Googlebots. (If they did not try, they wouldn't know whether the website is static or not!) My guess is that this is the reason for their recommendation ... money! With a badly written framework application one can also end up with endless loops of redirection.
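To illustrate, here is roughly what such a front controller does - a minimal sketch with hypothetical names; real frameworks add routing tables, sanitization, and so on:

<?php
// index.php - .htaccess rewrites every request here, and the URL path is
// chopped into class / method / parameters, exactly as described above.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$segments = array_values(array_filter(explode('/', $path)));

$class  = array_shift($segments) ?: 'home';   // e.g. "forums"
$method = array_shift($segments) ?: 'index';  // e.g. "act"
$params = $segments;                          // e.g. ["replypost", "12", "66939"]

if (class_exists($class) && method_exists($class, $method)) {
    call_user_func_array([new $class(), $method], $params);
} else {
    http_response_code(404); // answer guesses with a plain 404, not another redirect
}

To a bot that assumes paths are folders, every one of those segments looks like a directory it could truncate and probe - yet none of them exist as such.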

My recommendation: if it ain't broken, don't fix it. For example, all my Drupal and Wordpress sites get spidered very well with clean URLs. CakePHP and CodeIgniter (MVC) framework sites also appear to be working well in the SERPs. I will be uploading an MVC (CodeIgniter) application next month and I will know if their advice is good or not!

As a sideline, John warned a few days ago to watch what you put into a URL, as Googlebot can inadvertently delete info from your database!

Maybe in the end one should stick to standards, and the RFC standard for URLs is as per cre8asiteforums!


Yannis

#11 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 12:14 PM

Or maybe Google shouldn't be GUESSING at a site structure based upon what it sees in a URL. This is likely why so many webmasters scratch their heads wondering why Googlebot is trying to spider pages that don't exist.

Google...don't guess. If you see a link to a page, follow it (assuming it doesn't have the ridiculous nofollow tag slapped on it).

If no one has told you that the page exists, don't assume it does just because you see what you believe is a folder structure and think you should investigate further.

Don't guess, Google, and you probably will eliminate A HUGE CHUNK OF RESOURCES that you are currently using in your guess-work.

#12 Guest_joedolson_*

Guest_joedolson_*
  • Guests

Posted 23 September 2008 - 12:17 PM

One point which I think is worth making is that Google is NOT using the term "pretty URLs" - they're using the term "static URLs" -- and there is a significant difference.

One of their examples is this:

www.example.com/article/bin/answer.foo/language/en/answer/3/sid/98971298178906/query/URL

This is not, by any standard I understand, a "pretty" URL. It does give the appearance of being static, but NOT pretty or search-engine friendly.

If the examples used were more like www.example.com/articles/horse-breeds/shetland-ponies/, that would be a very different situation. The example URLs are not actually user-friendly: would you be able to navigate up the directory tree in the above example? No. Those are not directories; they are merely parameters which have been rendered as if they were a directory structure --- this is a very different issue.

And this is what Ron's talking about, I think, when he says that the problem should be solved at the application level: the application should be able to produce data which not only LOOKS like it's organized into directories, but BEHAVES as if it is. Take Wordpress URLs: if you define them to produce /category/postname, and navigate up a directory, you're where you expected: at the category page. This is effective URL design, and fits the idea of a "pretty" URL.

Static != Pretty: static merely means that the information is presented as if it's part of a static page. Pretty means that the information contained within the URL is useful to the user and can allow them to better understand the structure of the document.

#13 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 12:27 PM

Which still illustrates my point about guessing based on url structure. Sounds to me like Google has gotten ITSELF into the mess it's in (in terms of resources used) if it's using url structure to guess at what to index. If it didn't do that, then why would it care what the url structure looked like? They don't need to navigate up or down the url tree. They shouldn't be basing their indexing upon what they think might be within that tree (imo). They should base their indexing upon links (within the site or in a sitemap). Remember links? If I link to it, it exists. If I don't, um, why assume it might just because you think the url structure maybe, might, kinda, sorta, indicate that there maybe, might, kinda, sorta be something out there.

Added: session ids are a whole 'nuther ball of wax, btw. If session ids are the problem (as they always have been), then ask webmasters to not include session ids in the url to avoid having a few of those floating around if maybe kinda sorta someone links to a few of them. Discuss session ids. Not everything else. If one thing is a problem, don't make it all a problem.

Edited by dazzlindonna, 23 September 2008 - 12:29 PM.


#14 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 23 September 2008 - 12:38 PM

They shouldn't be basing their indexing upon what they think might be within that tree (imo). They should base their indexing upon links (within the site or in a sitemap). Remember links? If I link to it, it exists. If I don't, um, why assume it might just because you think the url structure maybe, might, kinda, sorta, indicate that there maybe, might, kinda, sorta be something out there.


I absolutely agree with that statement!

@Joe

Joe, you are right about Wordpress: being a rather simple CMS, they have managed to structure the 'pretty' URLs very well! But if you check in the admin sections, where more background work is happening, they stick to:

http://site.com/wp-a...ion=edit&post=6

Designs with MVC (Model View Controller) architecture do not easily allow implementing a Wordpress-type solution. The above URL at best could become:

http://site.com/inde...min/post/edit/6

The .php could then (according to the standard RFC specification) indicate that what follows is just data, and could signal Google to use it at its discretion!

However, Dazzlindonna's solution is so simple! Don't spider directories! Spider links!


Yannis

Edited by yannis, 23 September 2008 - 12:40 PM.


#15 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 23 September 2008 - 01:37 PM

Or maybe Google shouldn't be GUESSING at a site structure based upon what it sees in a URL.

Have you seen anything in your logs to suggest they are, Donna? I haven't.

... the application should be able to produce data which not only LOOKS like it's organized into directories, but BEHAVES as if it is.


Why, Joe? They're not directories, after all, so there's no reason I can see why they should either look like directories or behave like directories. They're parameters. Search engines understand parameters. And when they're just as simple as directories, I think human visitors can understand them, too. Indeed, I think it stands to reason that CTR isn't going to be better for /seo/, which is ambiguous at the least, than for category=seo, which to me seems much more self-explanatory. At the very least, I've seen no studies to indicate otherwise.

If anything, I would ask why CATEGORY needs to be passed at all. Can't the database be designed in such a way that the post ID is all that's needed? Wouldn't it be more efficient and ultimately more user-friendly to list not just the one category passed into the program but all the categories where that post ID resides?
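For what it's worth, here is a hedged sketch of that design (hypothetical schema and names): the URL carries nothing but the primary key, and the categories are looked up rather than passed.

<?php
// Hypothetical lookup for /post.php?id=42 - the ID alone identifies the post.
$pdo = new PDO('mysql:host=localhost;dbname=blog', 'user', 'pass');
$id  = (int) ($_GET['id'] ?? 0);

$post = $pdo->prepare('SELECT title, body FROM posts WHERE id = ?');
$post->execute([$id]);

// List every category the post belongs to, instead of trusting a single
// category parameter carried in the URL.
$cats = $pdo->prepare(
    'SELECT c.name FROM categories c
       JOIN post_categories pc ON pc.category_id = c.id
      WHERE pc.post_id = ?'
);
$cats->execute([$id]);

A unique identifier that carries no information can never go stale - which is the whole point of using one.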

#16 cre8pc

cre8pc

    Dream Catcher Forums Founder

  • Admin - Top Level
  • 13506 posts

Posted 23 September 2008 - 01:45 PM

Matt Cutts responded over at Sphinn:

in my opinion what this post says is "We do a solid job on sites with dynamic parameters, and lots of people make mistakes when they try to rewrite their urls to look static, so you might want to try the dynamic parameter route because that can work quite well."
In essence, it's Google saying "We'll come to webmasters and the natural way to write dynamic parameters rather than asking you to rewrite everything as static if you don't want to." So we're trying to come closer to webmasters, not wanting webmasters to necessarily move toward us. If you already have a site and it's doing well the way that it currently is--great. In that case, you probably don't need to change anything. But if you're starting a new site, it's worth considering staying with dynamic parameters instead of doing large amounts of rewrites (which some webmasters do in unusual ways that don't always work well in search engines). That's my take, at least.
It's not like either choice would get you penalized in Google; all of this is just advice to give more information to webmasters when they're making their choice of site architecture.



#17 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 01:55 PM

Have you seen anything in your logs to suggest they are, Donna? I haven't.


Yes, and there've been many posts throughout the years from many others to suggest the same.

I love this bit of Matt's reply, "That's my take, at least".

If Matt isn't COMPLETELY CERTAIN what they are saying, doesn't that mean it's confusing?

Look, I have no doubt that the bloggers in question meant well, but they screwed it all up in a big way. I believe they should retract the post and start over - this time being clear and doing it properly.

1. If you want to warn us of the dangers of bad url rewriting, do so.
2. If you want to remind us that session ids in urls can confuse you and cause problems, do so.
3. If you want to let us know that you've gotten better at handling dynamic urls, do so.

But for pete's sake, Google, don't make the situation more confusing by making us sound stupid, or making yourself sound as though you always get it right - because you oh so DON'T.

And please, don't make it sound as if we should pander to your needs, whilst ignoring our own needs, as well as that of our users. No matter how much money you have, Google, or how much share of the market you have, our web sites are still ours. Don't dictate how we should structure them. Don't even hint at that type of dictatorship - even if that's not what you "meant".

#18 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 23 September 2008 - 02:16 PM

Yes, and there've been many posts throughout the years from many others to suggest the same.


Can you be more specific, Donna? I've seen a lot of people complain that a page is being spidered that isn't linked somewhere (and I might believe 1 out of 100 of them), but I've never explicitly seen anyone document an instance of Google trying to follow a directory tree. I'd love to see a link proving otherwise.

#19 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 02:26 PM

I seriously doubt there can be "proof" of that, just like we pretty much can't "prove" anything Google does or doesn't do. Even if it appears that Google is following a directory tree, it doesn't constitute proof. However, if it walks like a duck, and quacks like a duck...

I am not claiming that I know that they do this for a fact. It does appear as though they might, however. And if they are, they should stop. Those fake "guessed at" urls come from somewhere (and not links), so where are they coming from? How is Google guessing at them? My guess is url structure. I could be wrong. But I do "believe" that there are lots of pages spidered that aren't linked anywhere, because I've seen too much of it with my own eyes over the years. As Ripley would say, "believe it...or not". :)

#20 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 23 September 2008 - 02:42 PM

Assume for a moment that nothing has changed with the way we are crawling and indexing the web. Well, actually, you don't need to assume that; rest assured that nothing has significantly changed from last week to this week with regards to our crawling and indexing.

So why would we put out a post that would place the SEO community into red-alert mode?

Because we think it's important to point out that something which has been generally trusted to be valid is just not true. It's not true that ALL URLs should be rewritten and that ANY static-looking URL is better than a dynamic one.

Keep in mind that we have a goal: to organize the world's information and make it universally accessible and useful. If we leave myths like "all URLs must appear to be static" untouched, we'll just make it harder on us (because indexing them properly is a pain) and in return, harder for us to send YOU visitors who want to look at your site, to have a chance to be dazzled by your content ("you" = any random webmaster - nobody in this room :) ). It's not like we are telling you what you need to do from now on, it's more of us telling you (the community) that something which many have assumed to be true is just not true.

It does us no good to fight webmasters who believe in a myth like this -- we'd rather open up and tell you where some of the common problems are (and trust me, this is a giant problem) so that we can work together on making the web a place where users can find your content faster and easier.

Let me catch up a bit and then perhaps go into some more detail later on - it's been a long day :)

John

PS Ron, perhaps Donna is thinking about our blog post on Crawling through HTML forms

#21 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 03:51 PM

I give. There's never anything to be gained by trying to make Google see its own arrogance, which it believes is helpful. (No offense to you, John, just how I feel. You can't help it. You've drunk the koolaid for too long. <grin>).

Y'all do what ya want. Me...I'm gonna stick with friendly urls despite what Google says.

Peace y'all.

#22 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 23 September 2008 - 03:52 PM

I seriously doubt there can be "proof" of that, just like we pretty much can't "prove" anything Google does or doesn't do. Even if it appears that Google is following a directory tree, it doesn't constitute proof.

Yea, I know what you mean, Donna; even Einstein's stuff is still a theory. :)

Nonetheless, even if the negative remains elusive, it shouldn't be all that hard to document the positive. Given a directory tree of /one/two/three/four/, there should be raw server logs of Googlebot also requesting /one/two/three/, /one/two/, and /one/, probably in short order and probably in that order. The spider has to leave tracks and someone somewhere has to have found them.
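Anyone who wants to look for those tracks can do so in a few lines. A rough sketch, assuming a standard Apache combined-format access log:

<?php
// List every unique path Googlebot requested. If the bot were walking up
// a tree, /one/two/three/four/ would be followed by its parent paths.
$paths = [];
foreach (file('/var/log/apache2/access.log') as $line) {
    if (stripos($line, 'Googlebot') === false) {
        continue;
    }
    // The logged request line looks like: "GET /one/two/three/ HTTP/1.1"
    if (preg_match('#"(?:GET|HEAD) (\S+)#', $line, $m)) {
        $paths[] = $m[1];
    }
}
sort($paths);
print_r(array_values(array_unique($paths)));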

FWIW, programmers were faking directory structures a long time before there even was a Google. We did it back in the mid-Nineties because Alta Vista and the rest didn't spider dynamic pages at all. I suspect Google probably knows that?

Of course, it's really something of a side issue here anyway. True or false, it doesn't change anything in the dynamic versus static URL debate.

If we leave myths like "all URLs must appear to be static" untouched, we'll just make it harder on us (because indexing them properly is a pain) ...

John, are you saying that indexing static URLs properly is a pain? Or are you talking about disguised dynamic URLs?

Just to argue the other side of the fence for a minute, I suspect anyone who is going to do a poor job of rewriting their URLs is probably going to do an equally poor job of managing the parameters in a dynamic URL. In either case, I have to guess that indexing them properly is always going to be a pain. :)

#23 bwelford

bwelford

    Peacekeeper Administrator

  • Site Administrators
  • 9006 posts

Posted 23 September 2008 - 04:11 PM

I sometimes think these computers go so darned fast that we humans have a problem keeping up with them. Unless you deliberately look for something, you may not spot a pattern in a vast array of data. Most of the data is never seen by a human eye.

I still think Donna's suggestion that Google should only spider URLs it knows exist and should not attempt to infer a directory structure is a super recommendation. It would be nice to know what Google thinks about that suggestion.

#24 SEOigloo

SEOigloo

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2100 posts

Posted 23 September 2008 - 04:31 PM

Hi John,
I've been trying to follow all of this, but not being a savvy programmer, I am trying to understand how to apply this to my daily work.

May I ask you a question, please?

When we built our own blog, we didn't know about the Wordpress 'friendly URL' plugin, so our URLs have remained as-is. We've always had a fine time being rapidly indexed by Google and so have never wanted to go to the bother to install the plugin.

However, since then, all blogs we have built for clients have utilized the WP plugin.

Based upon what Google is saying, have we made a mistake?

Is this:

www.solaswebdesign.net/wordpress/?p=320

preferable to this:

www.solaswebdesign.net/wordpress/Google-Is-Really-Cool

?

I feel very confused and would appreciate anything you can tell me.

Thanks, John, and it's very good of you to help us all understand this situation.
Miriam

Edited by SEOigloo, 23 September 2008 - 04:33 PM.


#25 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 23 September 2008 - 04:44 PM

Hi Ron
The problem that I often see is that it's not the programmer who sets up URL rewriting but rather some webmaster who heard that it is important. If you look at many of the common CMS out there, they use dynamic URLs by default and need to be "fixed" through mods and whatnot to appear to have static URLs. When the "search engine friendliness" in URLs is added by someone later on in the workflow (and by someone new), chances are that something will get implemented sub-optimally. So in the end, it's easier for us to crawl a site that uses dynamic URLs with many weird parameters (which we can learn to ignore) than it is to crawl a site that uses static-looking URLs which are inconsistent and/or contain irrelevant elements that appear to be important (but aren't).

Barry, the only way to know that a URL exists is to crawl it :). A link alone can't be a reason to believe that a URL exists -- and similarly, a URL without a link to it may still exist. But that's getting too much into philosophical discussions :)

Donna, I only drink espresso & water :).

John

#26 bwelford

bwelford

    Peacekeeper Administrator

  • Site Administrators
  • 9006 posts

Posted 23 September 2008 - 05:28 PM

Barry, the only way to know that a URL exists is to crawl it :). A link alone can't be a reason to believe that a URL exists -- and similarly, a URL without a link to it may still exist. But that's getting too much into philosophical discussions :)

I'm not sure that we have differing views here, John. I think Donna was suggesting, and I was supporting, the notion that your spiders should only follow links that have been found somewhere. Whether they lead to a URL, well the spider should certainly check that.

OTOH I don't believe that there should be a mechanism to infer what other URLs might exist on a domain that has already been found. The spiders should only follow links found on already indexed URLs. Perhaps that's the way it works already. I just wondered. :)

#27 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 23 September 2008 - 05:38 PM

Hi Barry
Other than the form crawling I mentioned earlier, I'm not aware of Googlebot trying to crawl URLs that it "made up" or guessed at. All of the "Googlebot is crawling random URLs" reports that I've followed up on had real links behind them (again, they could be from various sources like Sitemap files, RSS feeds, JavaScript or Flash as well as actual web pages). I've checked a lot of these reports because I don't like any loose ends either :) ...

Hi Miriam!
If your new URLs do not add irrelevant elements into the URL or change the 1:1 relationship between URL and content, then they should be fine. Generally speaking, WordPress does a fine job of getting it right.

Hope it helps!
John

#28 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 06:42 PM

So, if G isn't using the url structure to traverse for crawling purposes, then why does it care what's in the url? Why does it care if there's something "irrelevant" in it? As long as there's one url for one page ('1:1 relationship between URL and content") , what does it matter what the url is?

adlfkslfj/asdfljslf/asdlfjslfj/aldfjsfl - do you care that those are nonsensical? Why? If not, then why are we having this discussion?

Not having more than one url per page makes sense. Session ids cause problems because they can be accidentally used in links, and therefore can cause gbot to see a gazillion different urls for the same page. Got that.

Don't use session ids in urls. Got it.

Don't have more than one url per page. Got it.

Same advice we've seen for years.

But...Irrelevant stuff in a url? What? Huh? Sorry, but that just makes no sense to me. Rewriting a url - even if that url contains "irrelevant" stuff - so what? If that url is the only url pointed at the page - shouldn't that suffice?

Something is off. Or maybe I am just crazy. That's always a strong possibility. :)

#29 bwelford

bwelford

    Peacekeeper Administrator

  • Site Administrators
  • 9006 posts

Posted 23 September 2008 - 07:02 PM

Since I believe what you believe, Donna, we're clearly not crazy. :)

It's when you're the only one holding an opinion that you may have self-doubt.

#30 projectphp

projectphp

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3935 posts

Posted 23 September 2008 - 07:06 PM

First off, let me say that I agree with Google's article, for the most part, because:
1. People make mistakes, and anything that increases complexity increases the chances of a mistake. I've read more than a few articles/posts where someone got their .htaccess wrong and caused random problems.
2. Any advice given to enough people will be wrong for at least a few. How many websites, out of all the dynamic websites, are run by competent, SEO-aware devs? Less than 50%, I'd hazard a guess, which makes the advice correct for most sites. If Google had added a YMMV to the bottom, would that make it more understandable?
3. Too many people worry about this far too much. The advantage is trivial, and too many poor developers get it wrong or, worse IMHO, forget about it after a relaunch, and cause the issue all over again.

Seems to me that, with few benefits and LOTS of potential downside, most businesses shouldn't bother.

They're not directories, after all, so there's no reason I can see why they should either look like directories or behave like directories.

The only part of what you wrote that I can find fault with, Ron, and only because robots.txt makes it easy to disallow directories, and a pseudo-directory structure makes it easy to disallow whole chunks of a site, where code sometimes won't. Of course, consistency is the key, and I really am talking about the MVC, CakePHP-style URLs in which a URL is www.example.com/CONTROLLER/METHOD/, and making it only work that way makes it easy to disallow sections.
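e.g. something like this hypothetical robots.txt, which only works because the pseudo-directory structure is consistent:

User-agent: *
Disallow: /admin/
Disallow: /checkout/

Try expressing that cleanly when the same sections hide behind index.php?controller=admin&... and friends.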

There are other ways to skin a cat, but I just really think that URL structure is a great way to work, as it enforces a structure that is easy to work with.

Otherwise, I think SE-friendly URLs are the most over-hyped SEO element, outside meta keyword editing, in the CMS space.

#31 EGOL

EGOL

    Professor

  • Hall Of Fame
  • 5405 posts

Posted 23 September 2008 - 07:25 PM

Just tossing one out.....

If Google knows that your posts are

egolsdomain.com/1.php
egolsdomain.com/2.php
egolsdomain.com/3.php
egolsdomain.com/4.php
...............................
egolsdomain.com/85625.php

Maybe they want to crawl them in sequence rather than by linkage. I know lots of blogs that lose track of a LOT of pages. heh

#32 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 23 September 2008 - 07:55 PM

Other than the form crawling I mentioned earlier, I'm not aware of Googlebot trying to crawl URLs that it "made up" or guessed at.


Thank you, John, for helping to dispel yet another myth. :)

So, if G isn't using the url structure to traverse for crawling purposes, then why does it care what's in the url?

I would hazard a guess, Donna, that Google doesn't greatly care. If they did, they could simply stop crawling URLs they don't like. I think people who buy and sell links have a pretty good idea of what can happen when Google really cares about something? :)

Maybe it's just because I've been singing the same song for a long time, but I read the article as nothing more than a friendly warning. " ... the decision to use database-driven websites does not imply a significant disadvantage in terms of indexing and ranking. Providing search engines with dynamic URLs should be favored over hiding parameters to make them look static."

In other words, it ain't helping and there's several ways it just might hurt. Done right, they're useless. Done wrong, they're downright dangerous.

The only part of what you wrote that I can find fault with, Ron, and only because robots.txt makes it easy to disallow directories, and a pseudo-directory structure makes it easy to disallow whole chunks of a site, where code sometimes won't.


That's a valid benefit, Michael. Personally, though, I don't think it outweighs the potential disadvantages. Especially when there are so many other ways to skin that cat.

Generally, however, I think we're on the same page. So-called SE friendly URLs are, indeed, a waste of time and resources.

#33 Respree

Respree

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 5901 posts

Posted 23 September 2008 - 08:48 PM

Maybe it's just because I've been singing the same song for a long time...


Is it this song, by chance?

<edit>I thought it'd be this Garrick: http://www.youtube.c...h?v=eBGIQ7ZuuiU - Michael</edit>

Edited by projectphp, 24 September 2008 - 12:21 AM.


#34 iamlost

iamlost

    The Wind Master

  • Site Administrators
  • 4600 posts

Posted 23 September 2008 - 09:41 PM

There are several arguments somewhat at cross purposes in this thread. I look at it quite simply - what works best for the converting visitor? As I have yet to find a bot that fails where a site is optimised for humans, I don't much consider them at all, unless to block 'em.

Two reasons for rewriting dynamic URLs (SEs aside):
* security: exposed query strings are an invitation to crackers for a front door attack. Further, as with static pages, the often exposed file extensions provide underlying application information. As do 'powered by whatever' et al.

* usability: Long and complex (indeed usually incomprehensible) URLs provide no navigational clues, no resource indicators, and no sense of function. Users have been known to 'guess' URLs to move about a site.

Planning how to rewrite your URLs is critical, as you don't want to be continually changing links. But that criticality is shared by pretty much every other webdev component.

The very best incentive to 'doing it right', be it business model, site architecture, content, or handling dynamic URLs, is to fail. Fail to convert traffic, fail to get traffic, fail to get indexed by SEs. As I think was already mentioned: by the time you know enough to screw up URL rewrites you are already so deep in it the stars shine bright at noon.


Or maybe Google shouldn't be GUESSING at a site structure based upon what it sees in a URL.

Have you seen anything in your logs to suggest they are, Donna? I haven't.

I have.
And not just Google.
What I have seen are SE bots backing up the URL tree.

#35 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Site Administrators
  • 3795 posts

Posted 23 September 2008 - 09:53 PM

Now, Ron, it can't be both ways. They can't "not care" and yet "care". They obviously care since they keep mentioning over and over that it causes problems to have "irrelevant" stuff in urls. Irrelevant? To what?

Ok, time to get serious I guess. Let's count 'em up so I can offer the "proof". Here's all the times they mention how much they care, cut and pasted:

From the original blogspot post:

1. let us handle the problem of detecting and avoiding problematic parameters.
2. please remove unnecessary parameters while maintaining a dynamic-looking URL.
3. We can crawl dynamic URLs and interpret the different parameters
4. hide parameters which offer the Googlebot valuable information
5. You may be able to remove some parameters which aren't essential for Googlebot
6. If you are not able to figure out which parameters to remove, we'd advise you to serve us all the parameters in your dynamic URL and our system will figure out which ones do not matter.
7. Hiding your parameters keeps us from analyzing your URLs properly and we won't be able to recognize the parameters as such, which could cause a loss of valuable information.
8. Does that mean I should avoid rewriting dynamic URLs at all?
That's our recommendation, unless your rewrites are limited to removing unnecessary parameters, or you are very diligent in removing all parameters that could cause problem
9. Feel free to serve us your standard dynamic URL and we will automatically find the parameters which are unnecessary.
10. Google will determine which parameters can be removed
11. you could remove unnecessary parameters for your users
12. Be careful that you only remove parameters which do not matter
13. Not all of these parameters offer additional information
14. probably would not cause any problems as all irrelevant parameters are removed.
15. here's an example of a rewrite where all irrelevant parameters have been removed:
16. needs to be updated as soon as a new parameter is added to the original dynamic URL
17. result in a static looking URL which is hiding parameters.
18. if you remove irrelevant parameters

those were in the original post...more to follow from comments from john

19. @businessgeeks: I would only recommend using WordPress permalinks if you are certain that they do not introduce any irrelevant elements into the URL
20. @Abbas Heidari: If you can make sure that you are not introducing any irrelevant elements into the URL, then this may be ok
21. It has always been easier for search engines to take a URL apart based on parameters in the URL than based on a fixed path and file name
22. If you can minimize all irrelevant elements within a URL for us
23. @MBZ & DazzlinDonna: If you can make absolutely sure that your URL does not include any irrelevant elements then you may be rewriting URLs in a way that is fine.
24. a properly rewritten URL, that does not contain any irrelevant elements

and then finally, here in this thread

25. URLs which are inconsistent and/or contain irrelevant elements that appear to be important (but aren't).
26. If your new URLs do not add irrelevant elements into the URL

26 times (with the word "irrelevant" used 10 times). Now, I don't know about you, but I'd say that Google cares about the structure of the URL, based on the number of times they've brought up the subject just in that one post, the comments, and this thread. They are obviously using the information in the URL to make judgements. They say so. (See items 3-7 above.) So what the URL says matters to them. They care. But why? adfsf/adfsdf/adsdf/lklj means what? I can write that as a URL. It can point to a page. It means nothing, and yet Google is going to attempt to interpret it as meaning something? Why? What's the point? If they aren't using the URL structure to navigate, then there's no reason that I can see that they would need to interpret it as anything other than a pointer to a page.

They care. But I don't see why.


What I have seen are SE bots backing up the URL tree.


Oh good, iamlost, you can join EGOL and me in the crazy farm. :)

Edited by dazzlindonna, 23 September 2008 - 09:53 PM.


#36 iamlost

iamlost

    The Wind Master

  • Site Administrators
  • 4600 posts

Posted 23 September 2008 - 10:09 PM

Oh good, iamlost, you can join EGOL and me in the crazy farm.

At least I'll be in good company. :kicking: :) :disco2:

And I get first dibs on the rotten eggs when the Google Streetview Camera Car comes along...
:pieinface:

#37 mvandemar

mvandemar

    Ready To Fly Member

  • Members
  • 27 posts

Posted 23 September 2008 - 10:26 PM

John, I have a question... when you said this:

It's not true that ALL URLs should be rewritten and that ANY static-looking URL is better than a dynamic one.


You made a serious point of emphasizing the "all" and "any" in there... that really seems to only be worth emphasizing if the statements "Many URLs should be rewritten" and "Often static-looking URLs are better than dynamic ones" are both true (and thus you are taking care that people not go overboard). However, this would be the opposite of what the post suggested.

You also said this:

When the "search engine friendliness" in URLs is added by someone later on in the workflow (and by someone new), chances are that something will get implemented sub-optimally.


While you didn't put a percentage to those "chances", the phrase "chances are" usually implies "usually", or most of the time (which is the opposite of my inference from your other quote). This would mean that you are saying that more often than not URL rewrites cause problems. Would you really be comfortable coming right out and saying that most rewritten URLs should not be rewritten, and will cause problems if they are? I know that it is almost taboo for Googlers to pin things down like this, but can you perhaps clarify which of the above is true?

Also, do you guys seriously mean to say that this:

/blog/?p=20999

is more information to Google than this?:

/blog/some-descriptive-keywords-about-my-post/

Thanks. :)

-Michael

#38 Ron Carnell

Ron Carnell

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 2062 posts

Posted 23 September 2008 - 11:36 PM

... exposed query strings are an invitation to crackers for a front door attack.

Agreed, iamlost, and as with Michael's post, that's certainly a valid point. But, again, it doesn't outweigh the disadvantages of doing so. Especially since a seemingly static version of a dynamic URL in no way hides the parameters. They simply become position dependent instead of using variable names. I don't see how that makes them any harder to hack?

Long and complex (indeed usually incomprehensible) URLs provide no navigational clues, no resource indicators, and no sense of function.

Again, iamlost, I completely agree. That's true whether the URL is static or dynamic, though. The lesson is to simplify both. Exactly what I've been preaching.

What I have seen are SE bots backing up the URL tree.


You should document that, iamlost. Especially since John has directly claimed otherwise.

(FWIW, I've seen hints that Yahoo! was more concerned with directory structure than any search engine should be. I'm not, however, going to let their possible mistakes dictate the way I plan my site architecture. If they didn't try to back up a URL tree, you can bet a user will. I plan accordingly. As I'm sure you do, too.)

Now, Ron, it can't be both ways. They can't "not care" and yet "care". They obviously care since they keep mentioning over and over that it causes problems to have "irrelevant" stuff in urls. Irrelevant? To what?

But, Donna, I've been saying exactly the same things. For years. Just in this thread, for example, I've already questioned whether a CATEGORY parameter in a WP blog is really relevant or necessary. Indeed, I still maintain that most people are rewriting URLs because the underlying application is broken. They're trying to fix an open wound with a band-aid. If you really want to talk about normalized database design, primary keys, and why unique identifiers should never carry information, I'm more than game to do so. I'm just not sure it's (excuse the overuse) relevant in this thread.

I can assure you, however, that I don't "care" what you do in the sense that I think you mean it. The advice I've been giving people isn't self-serving, and when push comes to shove, it's not going to be any skin off my nose what you or anyone else does. And I have a sneaky feeling that Google isn't going to stop crawling the web if no one pays any attention to that blog post. Personally, I think they were just trying to help webmaster spend their time more productively. I know that's all I've ever tried to do.

#39 fisicx

fisicx

    Sonic Boom Member

  • Hall Of Fame
  • 1884 posts

Posted 24 September 2008 - 05:05 AM

If I can make my rather meagre contribution to the debate: I agree with both John and Ron.

Content is king, and as long as a link exists that leads to that content, it matters not one jot whether it's static, dynamic or rewritten.

The SE will follow the link, index the target page and rank it according to the content not the URL.

That's all I believe Google is trying to say: don't bother to do a rewrite 'cos it makes no difference to us.

#40 Jem

Jem

    Mach 1 Member

  • Members
  • 366 posts

Posted 24 September 2008 - 06:18 AM

Oh good, iamlost, you can join EGOL and me in the crazy farm. :)

I think I'll pop in to your crazy farm for a cuppa - because I have noticed what appears to be the same thing (Googlebot guessing URLs/directory structure)

In my webmaster tools, I regularly have a whole bunch of 404s listed for pages that 1) have never existed, 2) will never exist, and 3) have never been linked to. These are normally 404s because the URLs have had totally unrelated pages/query strings appended to the URL - the query string structure only having ever applied to pages in the root directory (or not at all). On top of this, I have lists of 404s where Google has tried to check a page that it has previously indexed in one directory, and now it's checking for the same file in another directory :)

This happened back when I was using NO "clean" / "pretty" / "static" URLs, it happened when I had my own rewrite code for my custom blog system, and it still happens now I'm using WordPress and its built in make-permalinks-nice code. I have a htaccess file full of 301 redirects pushing the googlebot in the right direction based on my guesses on where I think it's trying to go.
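(The redirects themselves are nothing fancy - just my guesswork expressed as mod_alias rules, with made-up paths here:

Redirect 301 /2007/some-post/feed/junk-page /2007/some-post/
Redirect 301 /categories/blah/old-query-leftover /categories/blah/

...one line per phantom URL Google keeps asking for.)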

I have no idea what causes it, and I have tried testing various tweaks to the rewriting code to see if that helps, but it doesn't seem to (and like I said, this happened back when I used pure query string madness too).

However - I will say that if you have rewrites in place, the URL structure should make sense anyway. In my (not so) humble opinion, it defeats the point if you have blog.com/categories/blah that takes you to the blah category, but blog.com/categories gives you a 404. That just seems like incredibly bad design practice to me.


Anyway, with regards to this Google blog post... I'm not paying too much attention. I will continue to do what I do because I prefer the nice looking URLs over query strings, and as a user I find them 1) easier to remember and 2) easier to dig down the URL structure (as per my point above about a URL structure that makes sense.)

Edited by Jem, 24 September 2008 - 06:19 AM.



