
Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998


Major Search Engines Agree On New Canonical Tag


36 replies to this topic

#1 iamlost

iamlost

    The Wind Master

  • Admin - Top Level
  • 3979 posts

Posted 12 February 2009 - 07:42 PM

The SEs have agreed on a new canonical tag to assist sites with certain duplicate content issues.

Google, Yahoo & Microsoft Unite On “Canonical Tag” To Reduce Duplicate Content Clutter, Vanessa Fox, searchengineland.

The web is full of duplicate content. Search engines try to index and display the original or “canonical” version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they’ll lose ranking.

Today, Google, Yahoo and Microsoft (links are to their separate announcements) have united to offer a way to reduce duplicate content clutter and make things easier for everyone.

...

Specify the canonical version using a tag in the head section of the page as follows:

<link rel="canonical" href="http://www.example.c...swedish-fish"/>

That’s it!

* You can only use the tag on pages within a single site (subdomains and subfolders are fine).
* You can use relative or absolute links, but the search engines recommend absolute links.
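
A minimal sketch of the absolute-link recommendation above, in Python. The helper name `canonical_link` and the example.com URLs are illustrative assumptions, not part of the announcement; the sketch only shows how a relative href can be resolved against the page's own URL before the element is written into the head.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link(page_url: str, canonical_href: str) -> str:
    """Return a <link rel="canonical"> element with an absolute href.

    canonical_link is a hypothetical helper written for this example.
    """
    # urljoin resolves a relative href against the page's own URL;
    # an href that is already absolute passes through unchanged.
    absolute = urljoin(page_url, canonical_href)
    # Escape &, <, > and quotes so the URL is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(absolute, quote=True)}" />'


# Example with made-up URLs: a sorted product listing pointing at its plain form.
print(canonical_link("http://www.example.com/shop/item?id=7&sort=asc", "/shop/item?id=7"))
```

Emitting the absolute form from one helper means every template produces the same URL regardless of which variant of the page was requested.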

This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.

* Links to all URLs will be consolidated to the one specified as canonical.
* Search engines will consider this URL a “strong hint” as to the one to crawl and index.


Canonical URL links, Joost de Valk, yoast.
Joost includes links to Canonical plug-ins for WordPress, Magento, and Drupal.

Live Coverage of Ask The Search Engines at SMX West, Barry Schwartz/Keri Morgret, seroundtable

#2 joedolson

joedolson

    Eyes Like Hawk Moderator

  • Technical Administrators
  • 2869 posts
  • Twitter:http://twitter.com/joedolson
  • Facebook:http://facebook.com/joedolson

Posted 12 February 2009 - 08:17 PM

This is very cool. I hope it doesn't encourage sloppiness in ordinary canonicalization, since that still has value outside of search engine optimization (like good logic, for example), but this will certainly make fixing canonicalization problems a lot easier!

#3 cre8pc

cre8pc

    Dream Catcher Forums Founder

  • Admin - Top Level
  • 13014 posts
  • Twitter:https://twitter.com/kim_cre8pc
  • Facebook:https://www.facebook.com/cre8pc

Posted 12 February 2009 - 09:44 PM

So. Anyone know what would prevent spammers from using it for their splogs and other instances of ripped off original content?

#4 joedolson

Posted 12 February 2009 - 10:33 PM

I'm not sure what you mean - how would spammers cause a problem that way? The only relevant situation I can imagine is not particularly a problem...

If a spam blog stole content from Cre8tive Flow, for example, and used OUR canonical URL element, they'd have this on their spam blog:

<link rel="canonical" href="http://blog.cre8asite.net/archives/389" />

As a result, Google would trip by their site and say: Ooh, look - this site (http://spam.thisissp.../archives/40876) should really be indexed at http://blog.cre8asite.net/archives/389. Well, guess I'll de-index this page and grab the other one!

Even if this situation did end up in the 1% edge case where the search engine didn't use the canonical suggestion, you wouldn't be any worse off than you were before.

#5 cre8pc

Posted 12 February 2009 - 10:42 PM

I was thinking of those who reprint original content and make it look like they wrote it. They don't reference the original in any way or, if they do, don't link. I've seen my stuff in so many different forms I've lost track :)

Thanks for that info.

#6 DonnaFontenot

DonnaFontenot

    Peacekeeper Administrator

  • Admin - Top Level
  • 3286 posts
  • Twitter:http://twitter.com/DonnaFontenot
  • Facebook:http://www.facebook.com/donna.d.fontenot

Posted 12 February 2009 - 10:49 PM

Kim, I'm not sure I understand how this new tag would have any impact on those scraper sites. They already do it. How would adding a tag help them in any way? Not saying it won't...just saying I'm not understanding.

#7 cre8pc

Posted 12 February 2009 - 11:02 PM

If I were a scraper site, I would want to claim to be original wouldn't I? That's what I was thinking they'd do.
But it could be me being paranoid and not trying it out first. :disco2:

Ah!

From the article:

Can this be abused by spammers? They might try, but Matt Cutts of Google told me that the same safeguards that prevent abuse by other methods (such as redirects) are in place here as well, and that Google reserves the right to take action on sites that are using the tag to manipulate search engines and violate search engine guidelines. For instance, this tag will only work with very similar or identical content, so you can’t use it to send all of the link value from the less important pages of your site to the more important ones.


Also, this seems to be more for ecommerce. I was hoping for a way to indicate original content like articles and blog posts. Am I right in that this solution is not for those situations?

#8 iamlost

Posted 12 February 2009 - 11:26 PM

I was hoping for a way to indicate original content like articles and blog posts. Am I right in that this solution is not for those situations?

Right. Not meant for that.

Kim, this does not 'claim' content, rather it associates content within a particular site.

The common examples are ecommerce and analytics test pages with differing parameters but actually the same or almost the same content. This is a method of telling the SEs for example that
[ ./product.php?item=widget&colour=blue&size=12 ] and [ ./product.php?item=widget&colour=red&size=6 ] are not to be indexed as separate URLs; rather, both should be seen as referring to [ ./product.php?item=widget ].
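
The mapping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the allowlist `CONTENT_PARAMS`, the function name `canonical_url`, and the example.com host are all hypothetical, and a real site would have to decide for itself which parameters actually change the content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: the only parameter assumed to select different content.
CONTENT_PARAMS = {"item"}


def canonical_url(url: str, keep=CONTENT_PARAMS) -> str:
    """Drop every query parameter that is not in the allowlist."""
    parts = urlsplit(url)
    # Keep parameters in their original order, filtering out colour, size, etc.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k in keep]
    # Rebuild the URL without the dropped parameters and without any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


blue = canonical_url("http://www.example.com/product.php?item=widget&colour=blue&size=12")
red = canonical_url("http://www.example.com/product.php?item=widget&colour=red&size=6")
print(blue)         # http://www.example.com/product.php?item=widget
print(blue == red)  # True
```

An allowlist is used here rather than a blocklist so that newly introduced tracking or display parameters are dropped by default.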

The SEs retain the right to ignore (and possibly to penalise) improper usage. It only applies within a domain itself, as a pseudo 301 within the site. It does not apply between sites as a 301 can. For example, while one can 301 myolddomain.tld/home.html to mynewdomain.tld/home.html, one cannot add <link rel="canonical" href="http://www.mynewdoma...ain.home.html"> to the head of myolddomain.tld/home.html, as they are different domains.
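
As a rough illustration of the same-site rule as it is described in this thread, here is a Python sketch of a pre-publication check. The function `is_within_site` and the idea of passing the site's own domain in explicitly are assumptions made for the example; this is not how any search engine is known to evaluate the tag.

```python
from urllib.parse import urlsplit


def is_within_site(canonical: str, site_domain: str) -> bool:
    """True if the canonical URL's host is the site domain or one of its subdomains.

    Hypothetical helper: site_domain is supplied by the site owner,
    e.g. "myolddomain.tld".
    """
    host = (urlsplit(canonical).hostname or "").lower()
    site = site_domain.lower()
    # The leading dot prevents "notmyolddomain.tld" from matching "myolddomain.tld".
    return host == site or host.endswith("." + site)


print(is_within_site("http://shop.myolddomain.tld/home.html", "myolddomain.tld"))  # True
print(is_within_site("http://www.mynewdomain.tld/home.html", "myolddomain.tld"))   # False
```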

Once again site owners are being asked to do some heavy lifting that has floored the SEs. :)

#9 send2paul

send2paul

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2868 posts
  • Facebook:https://www.facebook.com/ThatBoyThere

Posted 13 February 2009 - 01:57 AM

I just foresee more confusion. Everyone tagging everything to associate anything with their site.

Are Google really going to be able to police/control this properly when it comes to disputes over who owns what? I think time will tell.....

#10 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 13 February 2009 - 03:17 AM

I am confused! :frustration:

You mean, if Wikipedia does not add the tag quickly enough and I scrape their pages and add the 'canonical' my pages will be given preference?

I find the use of the word 'canonical' also confusing! The general definition of 'canonical' is a rule. In this context, if I understand it properly, it means 'the original version'. On Scholar, pdf results sometimes come out with version 1, version 2, version 3 and the like! In honest situations, it can provide some help to Google and the other engines; in abuse situations I am not too sure!

The only legitimate and useful application of this tag, can be within one's own domain. Is this how the SE's will utilize it?

Perhaps John can add some light? :)

Yannis

PS Like insurance, I will add the tag in the meantime, on new and old content! I will not use it for 'categories', summaries and the like!

Edited by yannis, 13 February 2009 - 03:48 AM.


#11 glyn

glyn

    Sonic Boom Member

  • 1000 Post Club
  • 1850 posts

Posted 13 February 2009 - 03:37 AM

Hi Kim,
I hear what you're saying, but everyone else is considering that a spammer would reference Cre8 as the source right, which we know not to be the case. I don't think a spammer could assume the trust of your site but some of those networks are really big aren't they!

Yes you are right and yes I can think of a great way to take advantage of this tag for grey purposes as you've already considered yourself, but I'm not gonna post it here.

What I think is interesting is that all the SE's came together on this. I hope to migrate 80% of my traffic away from search engines by the end of the year, and I hope others follow suit.

G

#12 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 13 February 2009 - 06:24 AM

This "tag" (link element) is not really meant for cross-domain usage. It's not a way to claim ownership of content. It's just meant for all those situations where you run into duplicate URLs within your own site and it's hard to get them sorted with redirects.

Here are some examples where this could be used:
- Web-shops (multiple URLs depending on how you got to a page)
- Sites that work with Session-IDs within the URL
- Ad-tracking URLs (eg using AdWords + Analytics)
- Affiliate tracking URLs
- News sites with multiple URLs per article
- Forums with multiple URLs per thread/page (eg "&highlight=", etc)

Keep in mind that while it will help to get the right URLs into the index, it will not prevent search engines from crawling the site. So if you have Session-IDs (or anything else that creates infinite URLs) on your site, that may still be a big problem in that we'll end up crawling and crawling and crawling in order to find your content (and even then, we may crawl one "page" with 100's of URLs while missing other "pages" completely). This does not mean that you're now free to keep sloppy URL structures :). Having a well crawlable site is still very important!

John

#13 DonnaFontenot

Posted 13 February 2009 - 08:55 AM

Right, that's the point most are missing. This is only for "within your own domain" URLs. The easiest way to think of it is if you have several ways of getting to one page, and your page might have several urls (like page.php, page.php?view=print, page.php?refer=joe, page.php?refer=jane) but they are all the same page, you can just tell Google that they are all the same page by setting page.php as the main one. Now, if you have links pointing to each of those urls, that juice will get redirected to the main page, which is what you'd want.
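
To make the consolidation idea concrete, here is a small Python sketch. The inbound-link counts are invented for illustration, the function name `consolidate` is hypothetical, and the sketch assumes (as in the page.php example) that every query-string variant is the same page as its bare path.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def consolidate(inbound_links: dict) -> Counter:
    """Sum inbound-link counts per main URL (the URL with its query string removed)."""
    totals = Counter()
    for url, count in inbound_links.items():
        p = urlsplit(url)
        # Assumption: the query string never changes the content, so drop it.
        main = urlunsplit((p.scheme, p.netloc, p.path, "", ""))
        totals[main] += count
    return totals


# Made-up link counts for the four variants mentioned above.
links = {
    "http://example.com/page.php": 3,
    "http://example.com/page.php?view=print": 1,
    "http://example.com/page.php?refer=joe": 2,
    "http://example.com/page.php?refer=jane": 4,
}
print(consolidate(links))  # Counter({'http://example.com/page.php': 10})
```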

#14 glyn

Posted 13 February 2009 - 09:14 AM

The spammers are gonna have a field day.
Class of 2002

#15 yannis

Posted 13 February 2009 - 09:29 AM

The spammers are gonna have a field day.


Glyn, I think John's post made it almost clear that the tag is meant to be within your own domain.

He said:

This "tag" (link element) is not really meant for cross-domain usage...



Yannis

#16 glyn

Posted 13 February 2009 - 09:38 AM

There is no misunderstanding here, trust me.

#17 DrPete

DrPete

    Mach 1 Member

  • Members
  • 327 posts

Posted 13 February 2009 - 10:11 AM

I think a Google engineer confirmed the point many have made here, that this is only an intra-site tag. I can't just go out and start claiming the whole internet as my own. Unfortunately, can't find the quote at the moment.

Honestly, I think this may have a big impact - it's a lot cleaner solution than using a complex network of nofollows or having to tell Google to block half your site. I deal with dupe content for an e-commerce site where I have to use something like a half-dozen different tactics. Page-based canonicalization may replace half or more of those over time.

#18 cre8pc

Posted 13 February 2009 - 10:22 AM

Well. I suppose we can thank the SE's for creating more work for SEO's to do. Raise your fees everyone!

:cheers:

#19 eKstreme

eKstreme

    Hall of Fame

  • 1000 Post Club
  • 3399 posts

Posted 13 February 2009 - 10:48 AM

I'm afraid this will be misused and that it will create more confusion. I liked it better when the main message from the SEs was "fix your site to make it more crawlable and make your URLs cleaner". That line has two good side-effects: more user friendly URLs and a strong incentive to fix years of crappy CMS design. Content that is well structured is content that's easy to reflect with clean URLs.

This tag is basically accepting that a lot of crap CMSs are present, that they're not going to go away, and removes any incentive to fix them. Instead, it will have to be supported by CMSs which are already bloated. Where is the innovation here? It's a band-aid solution and a cop-out for lazy web developers who now have no incentive to think about designing good content structures and good websites.

That's not to say it doesn't have a few very specific uses (like consolidating aff links onto one page, but even that, with a good website/CMS design is solvable).

Color me unimpressed.

Pierre

#20 DrPete

Posted 13 February 2009 - 10:52 AM

I just left an extremely important comment over on SEOmoz about why I feel this is so important. The power to make anything canonical won't just reduce duplicate content, it will change the world as we know it*.

Disclaimer: This comment is not extremely important and rel="canonical" will not change the world as we know it. I'm aggressively avoiding writing a PowerPoint deck and have dedicated the entire morning to spending a lot of time on almost completely pointless comments on various blogs.


