Google Does a 180 and Says Don't Use Pretty URLs
#1
Posted 23 September 2008 - 09:45 AM
WHAT? Have they gone mad?
1. The blog post in which they recommend this has a ... wait for it ... pretty URL itself.
2. While I can understand that flubbing a rewrite can make things worse, since when does that mean we shouldn't do something? Google: "You're going to screw things up, so rather than doing that, just trust that we will make everything all right." Me: I don't think so!
I'm speechless. Simply speechless.
Google has jumped the shark.
Google engineers need to put down the wacky cigarettes and think about what they just wrote.
Mass confusion ensues.
#2
Posted 23 September 2008 - 10:04 AM
Especially when I read:
if you're using URL rewriting ... to produce static-looking URLs from a dynamic site, you could be doing harm rather than good.
and the post also suggests that you convert your dynamic pages into flat HTML files, then point the URLs to the flat HTML copies: :scratchhead:
If you want to serve a static equivalent of your site, you might want to consider transforming the underlying content by serving a replacement which is truly static.
I think that's just strange. I'd probably screw up the conversion process more than I would writing a simple regular expression to handle the URL rewrite.
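Google's "truly static replacement" idea amounts to pre-rendering every dynamic path into a file that a plain web server can hand out. A minimal sketch in Python, assuming a hypothetical `render_page()` that stands in for the dynamic application and a list of paths you already know about:

```python
import tempfile
from pathlib import Path


def render_page(path: str) -> str:
    """Hypothetical stand-in for the dynamic application (e.g. a PHP script)."""
    return f"<html><body><h1>{path}</h1></body></html>"


def export_static(paths: list[str], out_dir: Path) -> list[Path]:
    """Write one index.html per URL path so a plain web server can serve it."""
    written = []
    for path in paths:
        # "/articles/horse-breeds/" -> out_dir/articles/horse-breeds/index.html
        # "/"                       -> out_dir/index.html
        target = out_dir / path.strip("/") / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_page(path), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for f in export_static(["/", "/articles/horse-breeds/"], Path(tmp)):
            print(f.relative_to(tmp))
```

Every time the underlying data changes, the export has to be re-run, which is the extra moving part the posters here are wary of.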
#5
Posted 23 September 2008 - 10:39 AM
It's certainly not going to change anything for me - I'm not about to start using parameters when I can avoid it, if I can instead use pretty urls. Frankly, the fact that the URLs are user-friendly carries a lot more weight for me than any vague uneasiness Google has that I might not do it right!
That said, it is a practical warning that if you don't do pretty URLs right, you can cause yourself a world of hurt. You can make duplicate content problems worse (and harder to detect), you can end up with a site entirely built with the wrong HTTP codes, you might end up completely losing some pages or having badly faulty 404 errors.
But it's so much more worthwhile to show people what the potential problems are and how to avoid them than it is to tell them to just not bother.
I found this passage interesting:
* www.example.com/article/bin/answer.foo/en/3
Although we are able to process this URL correctly, we would still discourage you from using this rewrite as it is hard to maintain and needs to be updated as soon as a new parameter is added to the original dynamic URL. Failure to do this would again result in a static looking URL which is hiding parameters. So the best solution is often to keep your dynamic URLs as they are. Or, if you remove irrelevant parameters, bear in mind to leave the URL dynamic as the above example of a rewritten URL shows:
(Emphasis added.)
They seem to be assuming, obviously, that you've planned your URLs poorly! They're assuming that you're only rewriting individual URLs to look static, rather than implementing a global rewriting scheme which handles the diversity of URLs your system generates.
It's all well and good to encourage people to avoid poor programming practice, but NOT by telling them to just not bother trying.
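The "global rewriting scheme" described above can be as simple as one reversible mapping between query parameters and /key/value path segments, so that a newly added parameter needs no new rewrite rule (the maintenance problem Google raises). A minimal sketch; `params_to_path` and `path_to_params` are hypothetical names, and Google would still count the result as a static-looking URL that hides parameters:

```python
from urllib.parse import quote, unquote


def params_to_path(base: str, params: dict[str, str]) -> str:
    """Render every query parameter as a /key/value pair after a fixed base path."""
    segments: list[str] = []
    for key, value in params.items():
        # safe="" also escapes "/" so a value can never be mistaken for a separator
        segments += [quote(key, safe=""), quote(value, safe="")]
    return base.rstrip("/") + "/" + "/".join(segments)


def path_to_params(base: str, path: str) -> dict[str, str]:
    """Inverse of params_to_path: read the /key/value pairs back into a dict."""
    prefix = base.rstrip("/")
    if not path.startswith(prefix + "/"):
        raise ValueError(f"{path!r} is not under {base!r}")
    rest = path[len(prefix):].strip("/")
    parts = [unquote(p) for p in rest.split("/")] if rest else []
    if len(parts) % 2:
        raise ValueError(f"odd number of key/value segments in {path!r}")
    return dict(zip(parts[0::2], parts[1::2]))
```

Because the keys travel in the URL, the same two functions cover every parameter the application will ever add. Some servers refuse or decode an encoded slash (%2F) in paths by default, so test that case on your own setup.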
Edited by joedolson, 23 September 2008 - 10:40 AM.
#6
Posted 23 September 2008 - 10:49 AM
Google, in its ever-powerful, know-everything wisdom, tells us little webmaster minions not to do all that "hard stuff" because "we'll probably do it wrong, and make matters worse". Instead, we should just "trust them" to get it all figured out. Oh, I'm sorry, I forgot that we are all idiots. Perhaps we should just all go back to the days before dynamic sites were possible, so we don't get any of that "hard stuff" wrong.
This is all so wrong on so many levels, it's just unbelievable.
I can't even speak without spitting. Oh, and wait, Google would now rather we give them duplicate content (cuz, ya know, they never get that wrong either as they recently informed us).
Oh, and wait again - That post itself had a rewritten url.
Oh, and wait wait wait - throw usability out the window too while you're at it.
I can't even believe this. They really need to rethink what they've said.
Sure, ok, Google, if you want to tell us that getting the rewrite wrong can be dangerous - fine. Tell us that. That makes sense. But the rest of it...NOT.
#7
Posted 23 September 2008 - 11:05 AM
If you transform your dynamic URL to make it look static you should be aware that we might not be able to interpret the information correctly in all cases. If you want to serve a static equivalent of your site, you might want to consider transforming the underlying content by serving a replacement which is truly static. One example would be to generate files for all the paths and make them accessible somewhere on your site.
Can you imagine what we'd go through here?
We purposely scramble inside post URLs so no juice is passed, to save our own butt. Forum URLs are always long, and god forbid we should have "accessible" versions somewhere for Google.
What really bothers me about the whole affair is that the focus is on keeping Google servers happy, with no regard or nod to the fact that there are other search engines to consider.
#8
Posted 23 September 2008 - 11:14 AM
Note the careful wording: While static URLs might have a slight advantage in terms of clickthrough rates because users can easily read the urls, the decision to use database-driven websites does not imply a significant disadvantage in terms of indexing and ranking. Providing search engines with dynamic URLs should be favored over hiding parameters to make them look static.
static URLs might have a slight advantage in terms of clickthrough rates
database-driven websites does not imply a significant disadvantage in terms of indexing and ranking
So there is likely a slight CTR advantage, and some undefined advantage in indexing/ranking, in using static URLs. No reason at all to continue serving static or static-looking URLs, none at all...
Providing search engines with dynamic URLs should be favored over hiding parameters to make them look static.
This one reads to me as (1) G cannot tell the difference when URLs are rewritten correctly, d'oh!, and (2) not being able to tell dynamic from static bothers them...
if you try to make your urls look static and in the process hide parameters which offer the Googlebot valuable information.
Ah, yes, the truth is out...
you should give us the possibility to analyze your URL structure
And why would we want to do that?
Hiding your parameters keeps us from analyzing your URLs properly and we won't be able to recognize the parameters as such, which could cause a loss of valuable information.
And why would that be a loss, to whom, and of exactly what?
Yes, poor URL rewrites can be a problem. Lots of things done poorly can be problematic. But there are very valid reasons that URL rewrites are done, and sometimes sharing with SEs can be counterproductive. Plus, of course, there are other SEs than the almighty Google. Not everything on the web is done for the benefit of G.
This smells like the nofollow FUD.
<added>
Go, Kim, go!!!
#9
Posted 23 September 2008 - 11:35 AM
Here are the facts, at least as I understand them.
Ten or twelve years ago (dates get fuzzy when you start getting old) no search engine would touch a dynamic URL. We had to go to a great deal of trouble to get anything that was database driven spidered at all, and I still have sites consisting of static HTML pages . . . that are dynamically created.
Google was the first search engine, as far as I know, to begin spidering URLs with a question mark in them. And it didn't do it very well. Transposed parameters would screw with it, and anything more complicated than two or three simple parameters would be ignored.
That's changed a LOT in the last few years, and I suspect Google's most recent advice is based on those changes. Google is much better at recognizing transposed, or even unnecessary, parameters these days, and it's not difficult to find pages in the SERPs with up to eight parameters indexed.
Here's what Google didn't say, however.
Handling transposed or unnecessary parameters comes with a cost; the page has to be spidered multiple times, with tests to determine what is transposed or unnecessary, ultimately deciding what returns unique content and what doesn't. Those tests cost you bandwidth and Google time, both of which are finite quantities not to be squandered. I believe Google decides when those tests should be run (as well as how many parameters it will accept) based on its perceived importance of the page. Yep, the dreaded Page Rank qualifier.
Does that sound like I'm advocating rewriting problematic URLs?
I'm not, because in my opinion a rewrite at the server level is almost always a band-aid on an open wound at the application level.
Making a URL "pretty" should be done by the application, not the web server. If you have to pass too many parameters, or overly complicated parameters, then the application is poorly written. It's really as simple as that. Without getting into a lot of programming philosophy (structured design stuff), we need to insist that our applications create human-readable, spider-understandable URLs that work. Letting the programmers off the hook with server rewrites is a very poor solution, in my opinion.
I also think that turning a dynamic URL into a seemingly static URL is, first, unnecessary, and second, potentially dangerous.
It's unnecessary, in my opinion, because (1) search engines don't need it, and (2) there's no data to suggest human visitors appreciate static over dynamic (simplicity of each being equal!).
It's potentially dangerous because most spiders (and especially Google) will request static pages at a substantially faster rate than they will request dynamic pages. They know that web servers like Apache are optimized to spew out HTML very quickly. They also know that dynamic pages, typically created by an interpreted script language like PHP or Perl, put a much heavier load on the server's resources. They don't want to crash your web site (or get you kicked off a shared server), but that's exactly what can happen when you essentially tell them to spider dynamic pages just as quickly as they spider static HTML pages.
However, there's an exception to that rule, and another tidbit Google didn't mention.
If you have a whole lot of dynamic pages, being spidered at relatively slower speeds, it's entirely possible Google won't be able to index everything. To a large extent -- and the most important extent -- that's again going to depend on Page Rank. If Google deems your pages important enough it WILL find the time to get them into the index. Still, if you have a lot of dynamic pages and not a lot of PR, you can potentially get more of those pages into the index by telling Google to go faster; i.e., by fooling the spider into thinking the pages are static. If the spider's page requests start failing, however, you've instead told Google to go away for a while, which usually won't result in more pages being indexed. It's a gamble.
Personally, I like Google's new advice to web masters. Clearly, though, I would have stated it a little differently.
1. Simplify your URLs at the application level, not at the server level. From a usability perspective, it does the same thing and has the added benefit of ensuring your application is actually efficient, not just pretending to be.
2. Don't pretend your dynamic pages are static pages. Every single problem that a dynamic URL "might" create with a search engine can be better solved by increasing the importance of your pages.
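The transposed-parameter tests Ron describes are only needed because the same content is reachable under several parameter orders. If every link the application emits lists its parameters in one fixed order, there is only one URL to find. A minimal sketch of that normalization; it assumes the application ignores parameter order, and `canonical_query_order` is a hypothetical helper, not anything Google publishes:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_query_order(url: str) -> str:
    """Return the URL with its query parameters sorted by name (then by value)."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Applied where links are generated, rather than in a server rewrite, this is the kind of application-level fix Ron argues for.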
#10
Posted 23 September 2008 - 12:09 PM
Typically you would have a dynamic URL as follows (à la Cre8asite):
- website.com/forums/index.php?act=post&do=reply_post&
Framework applications will redirect as follows:
- website/forums/act/replypost/12/66939
i.e., the main controller (the index file) is hidden via your .htaccess; forums will be a PHP class, act will be a function within the class, and the remaining segments are parameters for this function!
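The dispatch step described above (first path segment picks the class, the second picks the method, the rest become arguments) can be sketched like this. It assumes the .htaccess rule has already sent every request to the front controller; the `home`/`index` defaults are hypothetical, and real frameworks such as CodeIgniter layer extra routing rules on top:

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    controller: str                                # e.g. a PHP class such as "forums"
    method: str = "index"                          # function within that class
    args: list[str] = field(default_factory=list)  # remaining segments


def route(path: str) -> Route:
    """Map /controller/method/arg1/arg2/... onto a Route."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return Route(controller="home")  # hypothetical default controller
    method = segments[1] if len(segments) > 1 else "index"
    return Route(controller=segments[0], method=method, args=segments[2:])
```

None of these segments is a directory on disk, so a crawler that guessed at "parent folders" would only get whatever the router does with a path like /forums/act/.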
My worry (and perhaps Google's) is that Googlebot's algorithms might be confused into treating forums, act, etc. as folders and try to spider them! When they try, they get redirected, which costs Googlebot more bandwidth and time. (If they don't try, they wouldn't know whether the website is static or not!) My guess is that this is the reason for their recommendation ... money! With a badly written framework application, one can also end up with endless redirect loops.
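Redirect loops like these are easy to check for before a crawler finds them. A minimal sketch, assuming your redirect rules can be listed as a hypothetical mapping from source path to target path:

```python
def resolve_redirects(url: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow redirect rules from url; raise on a loop or an overly long chain."""
    chain = [url]
    while url in redirects:
        url = redirects[url]
        if url in chain:
            raise RuntimeError("redirect loop: " + " -> ".join(chain + [url]))
        chain.append(url)
        if len(chain) - 1 > max_hops:
            raise RuntimeError(f"more than {max_hops} redirects starting at {chain[0]}")
    return url
```

Running it over every URL in a sitemap flags both loops and long chains, each of which wastes crawler requests.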
My recommendation: if it ain't broke, don't fix it. For example, all my Drupal and WordPress sites get spidered very well with clean URLs. CakePHP and CodeIgniter (MVC) framework sites also appear to be doing well in the SERPs. I will be uploading an MVC (CodeIgniter) application next month, and then I will know whether their advice is good or not!
As an aside, John warned a few days ago to watch what you put into a URL, as Googlebot can inadvertently delete info from your database!
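The rule behind that warning is HTTP's own: GET and HEAD are "safe" methods (RFC 7231, section 4.2.1), so following a plain link should never change data. A minimal sketch of a hypothetical handler that only deletes on POST:

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}  # RFC 7231, section 4.2.1


def handle(method: str, action: str, post_id: int, posts: dict[int, str]) -> int:
    """Return an HTTP status code; only non-safe methods may modify `posts`."""
    if action == "delete":
        if method.upper() in SAFE_METHODS:
            # 405 Method Not Allowed: a crawler following a link can't delete anything.
            # A real server should also send an "Allow: POST" header.
            return 405
        posts.pop(post_id, None)
        return 204  # No Content: deleted
    return 200 if post_id in posts else 404
```

In practice that means a delete button sits in a form with method="post", and plain links are used only for viewing.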
Maybe in the end one should stick to standards, and the RFC standard for URLs is as per the Cre8asite forums!
Yannis
#11
Posted 23 September 2008 - 12:14 PM
Google...don't guess. If you see a link to a page, follow it (assuming it doesn't have the ridiculous nofollow tag slapped on it).
If no one has told you that a page exists, don't assume it does just because you see what you believe is a folder structure worth investigating further.
Don't guess, Google, and you probably will eliminate A HUGE CHUNK OF RESOURCES that you are currently using in your guess-work.
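The "spider links, not directories" rule is simple to express: a crawler queues only URLs that actually appear in href attributes, never the parent paths it could infer from them. A minimal sketch using Python's standard html.parser; it also skips rel="nofollow" links, as in the aside above, and it is not a description of how Googlebot actually works:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags, skipping rel="nofollow"."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        rel = (attr_map.get("rel") or "").lower().split()
        if href and "nofollow" not in rel:
            self.links.append(urljoin(self.base_url, href))


def linked_urls(base_url: str, html: str) -> list[str]:
    """Return only the URLs the page links to; nothing is guessed from paths."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```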
#12
Posted 23 September 2008 - 12:17 PM
One of their examples is this:
www.example.com/article/bin/answer.foo/language/en/answer/3/sid/98971298178906/query/URL
This is not, by any standard I understand, a "pretty" URL. It does give the appearance of being static, but NOT pretty or search-engine friendly.
If the examples used were more like www.example.com/articles/horse-breeds/shetland-ponies/, that would be a very different situation. The example URLs are not actually user-friendly: would you be able to navigate up the directory tree in the above example? No. Those are not directories; they are merely parameters which have been rendered as if they were a directory structure, and that is a very different issue.
And this is what Ron's talking about, I think, when he says that the problem should be solved at the application level: the application should be able to produce data which not only LOOKS like it's organized into directories, but BEHAVES as if it is. Take WordPress URLs: if you define them to produce /category/postname and navigate up a directory, you're where you expected: at the category page. This is effective URL design, and fits the idea of a "pretty" URL.
Static != Pretty: static merely means that the information is presented as if it's part of a static page. Pretty means that the information contained within the URL is useful to the user and can allow them to better understand the structure of the document.
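The test for a pretty URL proposed above (trim the last segment and you should land on a real page) can be checked mechanically. A minimal sketch, assuming the site's real pages are available as a hypothetical set of paths:

```python
def ancestors(path: str) -> list[str]:
    """'/articles/horse-breeds/shetland-ponies/' -> ['/articles/horse-breeds/', '/articles/', '/']"""
    segments = [s for s in path.split("/") if s]
    return ["/" + "".join(s + "/" for s in segments[:i]) for i in range(len(segments) - 1, -1, -1)]


def broken_ancestors(path: str, pages: set[str]) -> list[str]:
    """Parent paths a visitor could reach by trimming the URL that aren't real pages."""
    return [a for a in ancestors(path) if a not in pages]
```

A parameter-style path such as /article/bin/answer.foo/language/en/ fails four of its five ancestors, which is the point being made: static-looking is not the same as pretty.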
#13
Posted 23 September 2008 - 12:27 PM
Added: session ids are a whole 'nuther ball of wax, btw. If session ids are the problem (as they always have been), then ask webmasters to not include session ids in the url to avoid having a few of those floating around if maybe kinda sorta someone links to a few of them. Discuss session ids. Not everything else. If one thing is a problem, don't make it all a problem.
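The narrower fix, keeping session IDs out of URLs, can also be enforced on the site's side before links are written out. A minimal sketch; the parameter names in `SESSION_PARAMS` are assumptions, so substitute whatever your forum or CMS actually appends:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAMS = {"sid", "phpsessid"}  # hypothetical list, compared case-insensitively


def strip_session_ids(url: str, names: set[str] = SESSION_PARAMS) -> str:
    """Drop session-ID query parameters, keeping all others in their original order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in names]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```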
Edited by dazzlindonna, 23 September 2008 - 12:29 PM.
#14
Posted 23 September 2008 - 12:38 PM
They shouldn't be basing their indexing upon what they think might be within that tree (imo). They should base their indexing upon links (within the site or in a sitemap). Remember links? If I link to it, it exists. If I don't, um, why assume it might just because you think the url structure maybe, might, kinda, sorta, indicate that there maybe, might, kinda, sorta be something out there.
I absolutely agree with that statement!
@Joe
Joe, you are right about WordPress: being a rather simple CMS, it has managed to structure its 'pretty' URLs very well! But if you check the admin sections, where more background work is happening, they stick to:
http://site.com/wp-a...ion=edit&post=6
Designs with an MVC (Model View Controller) architecture do not easily allow a WordPress-type solution. The above URL at best could become:
http://site.com/inde...min/post/edit/6
The .php could then (according to the RFC specification) indicate that what follows is just data, and could signal Google to use it at its discretion!
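The convention being relied on here is CGI's (RFC 3875), which splits a request path into SCRIPT_NAME (the script that runs) and PATH_INFO (the extra path after it, handed to the script as data). A minimal sketch of that split; it assumes the script is simply the first segment ending in ".php", whereas a real server decides by checking the filesystem. Whether Google treats PATH_INFO differently is the suggestion above, not something this shows:

```python
def split_script_path(path: str, script_suffix: str = ".php") -> tuple[str, str]:
    """Split a request path into (SCRIPT_NAME, PATH_INFO), CGI-style."""
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(script_suffix):
            script_name = "/".join(segments[: i + 1])
            extra = segments[i + 1:]
            path_info = "/" + "/".join(extra) if extra else ""
            return script_name, path_info
    return path, ""  # no script segment: treat the whole path as the resource
```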
However, Dazzlindonna's solution is so simple! Don't spider directories! Spider links!
Yannis
Edited by yannis, 23 September 2008 - 12:40 PM.
#15
Posted 23 September 2008 - 01:37 PM
Or maybe Google shouldn't be GUESSING at a site structure based upon what it sees in a URL.
Have you seen anything in your logs to suggest they are, Donna? I haven't.
... the application should be able to produce data which not only LOOKS like it's organized into directories, but BEHAVES as if it is.
Why, Joe? They're not directories, after all, so there's no reason I can see why they should either look like directories or behave like directories. They're parameters. Search engines understand parameters. And when they're just as simple as directories, I think human visitors can understand them, too. Indeed, I think it stands to reason that CTR isn't going to be better for /seo/, which is ambiguous at the least, than for category=seo, which to me seems much more self-explanatory. At the very least, I've seen no studies to indicate otherwise.
If anything, I would ask why CATEGORY needs to be passed at all. Can't the database be designed in such a way that the post ID is all that's needed? Wouldn't it be more efficient and ultimately more user-friendly to list not just the one category passed into the program but all the categories where that post ID resides?
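Passing only the post ID and looking the categories up is a standard many-to-many join table. A minimal sketch with Python's built-in sqlite3; the table and column names, and the sample rows, are hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE post_categories (
    post_id     INTEGER NOT NULL REFERENCES posts(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY (post_id, category_id)
);
"""


def demo_db() -> sqlite3.Connection:
    """In-memory database with one post filed under two categories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO posts VALUES (6, 'Pretty URLs')")
    conn.executemany("INSERT INTO categories VALUES (?, ?)", [(1, "SEO"), (2, "Usability")])
    conn.executemany("INSERT INTO post_categories VALUES (?, ?)", [(6, 1), (6, 2)])
    return conn


def categories_for_post(conn: sqlite3.Connection, post_id: int) -> list[str]:
    """Every category a post belongs to, given only its ID (e.g. from ?id=6)."""
    rows = conn.execute(
        """SELECT c.name FROM categories AS c
           JOIN post_categories AS pc ON pc.category_id = c.id
           WHERE pc.post_id = ?
           ORDER BY c.name""",
        (post_id,),
    )
    return [name for (name,) in rows]
```

With this layout the URL carries one parameter, and the page can list every category the post lives in rather than just the one that happened to be passed in.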
#16
Posted 23 September 2008 - 01:45 PM
in my opinion what this post says is "We do a solid job on sites with dynamic parameters, and lots of people make mistakes when they try to rewrite their urls to look static, so you might want to try the dynamic parameter route because that can work quite well."
In essence, it's Google saying "We'll come to webmasters and the natural way to write dynamic parameters rather than asking you to rewrite everything as static if you don't want to." So we're trying to come closer to webmasters, not wanting webmasters to necessarily move toward us. If you already have a site and it's doing well the way that it currently is--great. In that case, you probably don't need to change anything. But if you're starting a new site, it's worth considering staying with dynamic parameters instead of doing large amounts of rewrites (which some webmasters do in unusual ways that don't always work well in search engines). That's my take, at least.
It's not like either choice would get you penalized in Google; all of this is just advice to give more information to webmasters when they're making their choice of site architecture.
#17
Posted 23 September 2008 - 01:55 PM
Have you seen anything in your logs to suggest they are, Donna? I haven't.
Yes, and there've been many posts throughout the years from many others to suggest the same.
I love this bit of Matt's reply, "That's my take, at least".
If Matt isn't COMPLETELY CERTAIN what they are saying, doesn't that mean it's confusing?
Look, I have no doubt that the bloggers in question meant well, but they screwed it all up in a big way. I believe they should retract the post and start over - this time being clear and doing it properly.
1. If you want to warn us of the dangers of bad url rewriting, do so.
2. If you want to remind us that session ids in urls can confuse you and cause problems, do so.
3. If you want to let us know that you've gotten better at handling dynamic urls, do so.
But for pete's sake, Google, don't make the situation more confusing by making us sound stupid, or making yourself sound as though you always get it right - because you oh so DON'T.
And please, don't make it sound as if we should pander to your needs, whilst ignoring our own needs, as well as that of our users. No matter how much money you have, Google, or how much share of the market you have, our web sites are still ours. Don't dictate how we should structure them. Don't even hint at that type of dictatorship - even if that's not what you "meant".
#18
Posted 23 September 2008 - 02:16 PM
Yes, and there've been many posts throughout the years from many others to suggest the same.
Can you be more specific, Donna? I've seen a lot of people complain that a page is being spidered that isn't linked anywhere (and I might believe 1 out of 100 of them), but I've never seen anyone explicitly document an instance of Google trying to follow a directory tree. I'd love to see a link proving otherwise.
#19
Posted 23 September 2008 - 02:26 PM
I am not claiming that I know that they do this for a fact. It does appear as though they might, however. And if they are, they should stop. Those fake "guessed at" urls come from somewhere (and not links), so where are they coming from? How is Google guessing at them? My guess is url structure. I could be wrong. But I do "believe" that there are lots of pages spidered that aren't linked anywhere, because I've seen too much of it with my own eyes over the years. As Ripley would say, "believe it...or not".
#20
Posted 23 September 2008 - 02:42 PM
So why would we put out a post that we knew would place the SEO community into red-alert mode?
Because we think it's important to point out that something which has been generally trusted to be valid is just not true. It's not true that ALL URLs should be rewritten and that ANY static-looking URL is better than a dynamic one.
Keep in mind that we have a goal: to organize the world's information and make it universally accessible and useful. If we leave myths like "all URLs must appear to be static" untouched, we'll just make it harder on us (because indexing them properly is a pain) and, in turn, harder for us to send YOU visitors who want to look at your site and have a chance to be dazzled by your content ("you" = any random webmaster, nobody in this room).
It does us no good to fight webmasters who believe in a myth like this; we'd rather open up and tell you where some of the common problems are (and trust me, this is a giant problem) so that we can work together on making the web a place where users can find your content faster and easier.
Let me catch up a bit and then perhaps go into some more detail later on - it's been a long day
John
PS Ron, perhaps Donna is thinking about our blog post on Crawling through HTML forms