
Leading Community for Usability, Search Engine Marketing,
Social Networking, Site Planning & Web Site Development, Since 1998



Google Says Internal Duplication Is Ok


22 replies to this topic

#1 GaryTheScubaDiver

GaryTheScubaDiver

    Unlurked Energy

  • Members
  • 3 posts

Posted 12 December 2006 - 02:46 AM

In an interview by Rand of SEOmoz with Vanessa Fox of Google she mentioned that there really isn't a penalty for having internal duplicate content issues (pages inside your own site that are copies of other pages on your domain).

Rand asked Matt Cutts, "Would you agree with that statement? Would you advise site owners to attempt to fix internal dup content issues, or is it really OK to let Google sort it out?"

Matt: "Vanessa was right - internal duplicate content isn't too big of an issue, and we're pretty good at sorting out which pages to rank."

Good to know!

GaryTheScubaGuy

#2 lee.n3o

lee.n3o

    Cre8asite Tech News Reporter

  • 1000 Post Club
  • 1556 posts

Posted 12 December 2006 - 03:32 AM

Although she said there was no 'penalty' (which I think we already knew from reading our SEO Myth thread) - what does worry me is that they say leave it to us, we're pretty good at sorting out which pages to rank.... I think that's just asking for trouble!! How do they really know which pages to rank/index? They don't know your business as well as you do... Surely!

Basically... Just try to keep it to a minimum; why bother having lots of duplicate content ...... IMO it's basically like flipping a coin - Just because there's no 'Penalty' doesn't mean go ahead and do it.... there are other repercussions ;-) ....

Edited by lee.n3o, 12 December 2006 - 03:33 AM.


#3 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 12 December 2006 - 03:43 AM

There is no (internal) duplicate content penalty :)

Come on, isn't that a bit old now? (When was that SEO Myths thread? Maybe we need a new version :)) Every site has almost unlimited duplicate content, you just have to get it indexed.

But there's no reason to let Google sort it out if you can do it yourself -- *you* know which pages you want to keep, make sure that the other ones are not linked. You wouldn't let the algorithm choose your design, why let it choose your URLs? :)

Another reason it makes sense to keep a grip on it is the potential influence on "pagerank" (not in a toolbar-pagerank sense, but in terms of "value"): If you have content duplicated over multiple URLs then those URLs could (and usually will) gain their own value - which dilutes the value of the URL you really want to use for that content.

John

#4 phaithful

phaithful

    Light Speed Member

  • Members
  • 800 posts

Posted 12 December 2006 - 03:55 AM

I think this is a case where they can't bash the algorithm... you'll notice that Matt does add:

"but it never hurts to help search engines with dupe content issues if it's easy to help, e.g. in the webmaster console, tell us if you prefer www vs. non-www."


I've worked on quite a few sites, and many times the dupe problems aren't with the actual content... typically they have duplicate titles and meta data. That's where I see most of the "penalties" being applied.

Many sites out there don't have unique titles and / or descriptions for each of their pages. Many times it's just their company name that's the title on every page.

I've seen these sites have 1 or 2 pages in the main index and then all the other pages in the supplemental. A quick change to the titles and the meta data... and usually the pages make their way back into the main index.
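A quick way to spot the duplicate-title problem described above is to group pages by their title text. The following is only a minimal Python sketch: the page URLs and HTML are made up for illustration, and `TitleParser` and `find_duplicate_titles` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicate_titles(pages):
    """Map each title used by more than one URL to the sorted URLs using it."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = TitleParser()
        parser.feed(html)
        by_title[parser.title.strip().lower()].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}


# Hypothetical pages: two of them use only the company name as the title.
pages = {
    "/": "<html><head><title>Acme Widgets</title></head></html>",
    "/blue-widget": "<html><head><title>Acme Widgets</title></head></html>",
    "/red-widget": "<html><head><title>Red Widget | Acme Widgets</title></head></html>",
}
duplicates = find_duplicate_titles(pages)
```

Here `duplicates` lists the two pages that share the bare company name, which are the ones that would need a unique title. The same grouping idea could be applied to meta descriptions.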

Now, often times when we talk about the supplemental index... Google's company line is that it's not a "penalty" to be in there... but really... it is.... let's face it... most people won't ever get to the supplemental index...

As far as "sorting out pages" that are actual dupes of one another... I can agree that Google usually chooses one to rank... but remember... if you've got 1,000 links going to Dupe A and 300 to Dupe B.... Google will sort it out... but more likely those 300 links won't get added automagically to Dupe A without a 301.
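The link arithmetic in that last point can be sketched in a few lines of Python. This is a simplification under a loud assumption: it treats a 301 as passing every inbound link on to its target, which is a model for illustration and not a statement of how Google actually counts links. The function name `consolidate_links` and the URLs are hypothetical.

```python
def consolidate_links(inbound_links, redirects):
    """Sum inbound link counts onto the final target of each 301 chain."""
    totals = {}
    for url, count in inbound_links.items():
        target = url
        seen = set()
        # Follow the redirect chain, guarding against redirect loops.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        totals[target] = totals.get(target, 0) + count
    return totals


# The numbers from the post: 1,000 links to Dupe A and 300 to Dupe B.
inbound_links = {"/dupe-a": 1000, "/dupe-b": 300}

without_redirect = consolidate_links(inbound_links, {})
with_redirect = consolidate_links(inbound_links, {"/dupe-b": "/dupe-a"})
```

Without the redirect the two URLs keep separate totals; with `/dupe-b` redirected to `/dupe-a`, the model credits all 1,300 links to one URL.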

Edited by phaithful, 12 December 2006 - 03:57 AM.


#5 travis

travis

    Sonic Boom Member

  • 1000 Post Club
  • 1532 posts

Posted 12 December 2006 - 04:00 AM

If it's written on SEOmoz, it's gotta be true.

This is exactly the sort of stuff I have been harping on about for the last week with SoftPlus.

Another way of seeing it would be to say "Once we see your duplicate content, we simply rank one version of it, not ten."

There is no penalty, but there is no gain for doing it.

Internal duplication is more sinister for me. External duplication is difficult to administer.

#6 Ruud

Ruud

    Hall of Fame

  • Hall Of Fame
  • 4887 posts

Posted 12 December 2006 - 10:49 AM

"There is no penalty" doesn't mean there is no loss. As SoftPlus eloquently points out, this setup means that Google, and not you or your marketers, decides which page your customers will see for a specific product or service.

I think it pays to compose and set up your own specific page for a certain item and get people onto a conversion path of your choosing.

Good score for SEOmoz to have an interview like this. It's always a pleasure to read the site. Informed opinion, well researched facts. Refreshing food for thought in the world of mere conjecture :)

#7 randfish

randfish

    Hall of Fame

  • Members
  • 937 posts

Posted 12 December 2006 - 11:44 AM

I know this is going to sound weird coming from me, but...

I still think there are BIG problems related to having internal duplicate content. On the sites where we see them, indexing takes longer (probably since Google needs to sort out which version to index), ranking is lower (probably because different links point to different versions, splitting up link juice) and internal search doesn't work as well, either.

Despite what Matt & Vanessa had to say, I'm a big skeptic about just letting the engines sort it out. To me that's not SEO, it's SEL (search engine laziness)... :)

#8 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 12 December 2006 - 12:00 PM

I know what you mean Rand, depending on how the site is set up you can easily have upwards of 10-100 URLs for the same content (eg most forums, grrr!) -- and in order for Google to even recognize that it's duplicate content, it will have to crawl it first. That takes time, it uses up bandwidth, and it reduces the "crawler priority" which is left over for the other content (eg Google feels it wants to crawl 1000 URLs today, should it a. crawl 1000 URLs with unique content or b. crawl 1000 URLs with 10 different "pieces of content"?).

I'm certain that if you have enough "value" for your site to merit crawling the 100x duplicated content and still get it updated regularly, it will have a minimal influence and it certainly won't be a penalty. However, if you don't have that much value (and let's face it, most sites don't) then Google will end up crawling a part of the 100x duplicated content and perhaps miss out on the great stuff you really wanted indexed.
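The 1000-URL example can be made concrete with a toy model. The sketch below assumes a crawler that simply fetches the first `budget` URLs it knows about; real crawl scheduling is certainly more involved, and the site layouts and the name `unique_content_crawled` are invented for illustration.

```python
def unique_content_crawled(url_to_content, budget):
    """Count distinct pieces of content among the first `budget` URLs crawled."""
    crawled = list(url_to_content.items())[:budget]
    return len({content_id for _url, content_id in crawled})


# Site A: 1000 URLs, each with its own content.
clean_site = {f"/page/{i}": i for i in range(1000)}

# Site B: 10 pieces of content, each reachable under 100 different URLs.
duplicated_site = {
    f"/page/{content_id}?view={variant}": content_id
    for content_id in range(10)
    for variant in range(100)
}

clean_result = unique_content_crawled(clean_site, budget=1000)
duplicated_result = unique_content_crawled(duplicated_site, budget=1000)
```

With the same budget of 1000 fetches, the clean site yields 1000 distinct pieces of content and the duplicated site yields only 10.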

Let's face it: Google and most other search engines are very, very good at crawling technically deficient sites. However, they really excel when the site is technically correct. And ... after all, if you know that your site is not up to par (technically) what real reason is there to keep it that way? If you can fix it, if you can improve the usability for users or for bots, why not do it? Why should you not give the users and the bots a single URL for your content and work on keeping that URL valid for the lifetime of your content?

The same can be said for most other technical issues that webmasters face today. Even if a search engine can crawl a jumbled spaghetti puzzle of html code, wouldn't it still make sense to make their job easier and get the code to at least validate on a block level? After all, we want them to distill the essence of our content and direct people to it: why not make their job as easy as possible? To me, it's just common sense :).

John

#9 lee.n3o

lee.n3o

    Cre8asite Tech News Reporter

  • 1000 Post Club
  • 1556 posts

Posted 12 December 2006 - 12:00 PM

"Despite what Matt & Vanessa had to say, I'm a big skeptic about just letting the engines sort it out. To me that's not SEO, it's SEL (search engine laziness)..."


Exactly what I tried to say in the second post - but didn't put it as well!! Very good interview Rand... I thought you had good chemistry between you! Anything you want to tell us ;-)

#10 Halfdeck

Halfdeck

    Gravity Master Member

  • Members
  • 110 posts

Posted 12 December 2006 - 03:49 PM

I think in the end it really just depends on how well your site is faring. If you're raking in money on a daily basis from a domain, forget validation, duplicate urls, META tags, marketing, content quality. Money is coming in so who cares, right? I feel that way with some of my PPC campaigns. Marketing or quality is irrelevant if CTR and conversion is in the bag.

By the same token, when a domain is in the dumps, everything matters. Low Toolbar PR, lack of inbounds, lousy content, meta keywords, HTML to content percentage, keyword density, domain age, DMOZ listing...

Still, if SEO is your profession, I think not being able to write valid code or keep a tight .htaccess is downright unprofessional. Some may say validation is a waste of time, but if you know how to write clean code, then time isn't wasted. If you know how to spell, who needs a spell checker?

Edited by Halfdeck, 12 December 2006 - 03:50 PM.


#11 send2paul

send2paul

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2868 posts
  • Facebook:https://www.facebook.com/ThatBoyThere

Posted 12 December 2006 - 05:28 PM

By the way - GaryTheScubaDiver - (great name!) - hello and welcome to the Cre8asite forums and community :)

Paul

p.s. if you want to, come and say a few words about yourself in the Introduce Yourself forum, and maybe have a chat in After Hours?

Edited by send2paul, 12 December 2006 - 05:28 PM.


#12 Asia

Asia

    New To Community

  • Members
  • 1 posts

Posted 12 December 2006 - 09:38 PM

Over the past year, I've been watching duplicate content issues, and as Google says, they really don't have a problem with it, as long as they know which page to rank. However, duplicate content within a website shouldn't really be happening, unless you are really a**l about duplication.

Canonical issues: www or no www, some SEOs say it's duplicate content.
index.html/php/htm/asp etc.. whatever it ends in, it's the default page assigned by apache or ms - some SEOs say this is also duplication (huh?)
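For readers unsure why these count as duplicates: each variant is a distinct URL that returns the same page. A minimal Python sketch of collapsing the variants onto one form follows. It assumes the site lives on a single host with or without the www prefix; `canonical_url`, `DEFAULT_DOCUMENTS` and the example.com URLs are illustrative names, not anything from a particular server setup.

```python
from urllib.parse import urlsplit, urlunsplit

# Default document names a server might serve for a directory (assumed list).
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "index.php", "index.asp")


def canonical_url(url, prefer_www=True):
    """Collapse www/non-www and default-document variants onto one URL."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    host = netloc.lower()
    if prefer_www and not host.startswith("www."):
        host = "www." + host
    elif not prefer_www and host.startswith("www."):
        host = host[len("www."):]
    # Drop a trailing default document so /shop/index.php becomes /shop/.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    if not path:
        path = "/"
    return urlunsplit((scheme.lower(), host, path, query, ""))


variants = [
    "http://example.com",
    "http://example.com/index.html",
    "http://www.example.com/",
    "http://WWW.Example.com/index.php",
]
canonical_forms = {canonical_url(v) for v in variants}
```

All four variants collapse to a single canonical form, which is the URL a redirect would point the others at.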

If Google or any search engine decided that these were in fact duplicate content, then every webmaster across the world should learn how to mess with apache and win documentation (ouch) that'll be ugly. Many web designers today simply hire on a private server, design websites and throw them up. They don't really understand the means behind it.

There are so many variations to what is duplicate content, but based on my observations over the past year, here are my findings:

Multiple sites with similar content will not be entirely banned from Google.
Only a few of those domains may rank or remain on the index.
Some domains will be banned, there's probably more than just duplication (Don't worry, Matt Cutts will send your webmaster an email with reasons why.) see http://www.mattcutts...s-hacked-sites/

Great site and forums by the way :) I'll make it a regular stop

#13 send2paul

send2paul

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2868 posts
  • Facebook:https://www.facebook.com/ThatBoyThere

Posted 13 December 2006 - 12:24 AM

"Great site and forums by the way :) I'll make it a regular stop"

Thank you Asia - and welcome to Cre8asite.

What I like about debates like this are the useful snippets of info which pop up. For example, from phaithful:

"Many sites out there don't have unique titles and / or descriptions for each of their pages. Many times it's just their company name that's the title on every page.
I've seen these sites have 1 or 2 pages in the main index and then all the other pages in the supplemental. A quick change to the titles and the meta data... and usually the pages make their way back into the main index."

It would be interesting to see if there was anywhere/anybody who has actually tested, (or can show examples of), this - and any of the other points raised in this debate.

Paul

#14 rmccarley

rmccarley

    Light Speed Member

  • Members
  • 642 posts

Posted 13 December 2006 - 02:34 AM

Paul, that happened to 14th Colony last year. And to a site I was working on at the time that I won't disclose. Donna has seen it happen too. Unique descriptions are a must if you choose to use the Meta description tag.

#15 Vanessa Fox

Vanessa Fox

    Unlurked Energy

  • Members
  • 5 posts

Posted 13 December 2006 - 05:52 PM

No worries about a penalty, but certainly, if you can tell us which version of a page you'd prefer us to index (using redirects, blocking with robots.txt, using the preferred domain feature in webmaster tools, etc.), all the better! We'll pick a URL if you don't give us input on which we should index, but we'll gladly take your input.
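Of the options listed, the robots.txt route is easy to check locally before deploying. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt rules, the paths and the example.com host are hypothetical, chosen only to show a duplicate "print" view being blocked while the preferred URL stays crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the duplicate print and archive views.
robots_txt = """\
User-agent: *
Disallow: /print/
Disallow: /blog/archive/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The preferred URL should remain fetchable, the duplicate view should not.
allowed = parser.can_fetch("Googlebot", "http://www.example.com/blog/my-post")
blocked = parser.can_fetch("Googlebot", "http://www.example.com/print/my-post")
```

Note that this only tests which URLs a well-behaved crawler may fetch; it says nothing about how any search engine treats links that point at the blocked URLs.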

#16 Blumey

Blumey

    Unlurked Energy

  • Members
  • 4 posts

Posted 15 December 2006 - 03:54 PM

What about blog posts? The blogging software I use slices the blog posts any number of ways, including unique URLs for each blog post, a day view, a month view, a category view, and then a domain.com/blog view. This means there can be up to five places on my site where the content is featured at any one time, which has often made me nervous from an SEO perspective. How do you remedy situations like this? Or does it really even matter?

#17 JohnMu

JohnMu

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 3518 posts

Posted 15 December 2006 - 04:14 PM

Hi Blumey,
That could be a problem -- mostly because you never know which pages will get filtered out. If you can make that choice, you can set it up so that the bots will accept it like that as well. I used a shingle comparison tool and checked two types of blogs (full post previews vs snippet previews, few comments vs many comments) to see how they fare: (the full thread)

full posts, few comments: http://oy-oy.eu/page...JFTdj20Q#report
full posts, many comments: http://oy-oy.eu/page...5z1AgHHJ#report
snippets: http://oy-oy.eu/page...4CJqImxy#report
(can you recognize the blog posting that had the full post as a snippet?)
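For anyone unfamiliar with shingle comparison, the general idea can be sketched as follows. This is a generic word-shingling plus Jaccard similarity example in Python, written under the assumption that it resembles what such tools do; it is not the code behind the tool linked above, and the sample texts are made up.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items over all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


full_post = "the quick brown fox jumps over the lazy dog"
snippet = "the quick brown fox"
unrelated = "completely different words appear in this sentence"

same = jaccard(shingles(full_post), shingles(full_post))
partial = jaccard(shingles(snippet), shingles(full_post))
none = jaccard(shingles(full_post), shingles(unrelated))
```

A page that shows full posts shares most of its shingles with the individual post pages, while a page that shows short snippets shares far fewer, which is the difference the reports above are meant to show.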

John

#18 rmccarley

rmccarley

    Light Speed Member

  • Members
  • 642 posts

Posted 15 December 2006 - 08:01 PM

Vanessa, it's good to see you here, but I have to wonder how great a job Google can do picking the right "version" when it still can't solve the canonical name problem on its own. Best advice for SEOs: avoid duplicate content. If you can't do that, do your best to pick one page to outrank the others and then make sure it happens.

BTW the word "penalty" may be a misnomer here.

#19 Halfdeck

Halfdeck

    Gravity Master Member

  • Members
  • 110 posts

Posted 16 December 2006 - 04:36 AM

John, I'll have to check out that duplicate content tool when I have more time.

"Vanessa, it's good to see you here, but I have to wonder how great a job Google can do picking the right "version" when it still can't solve the canonical name problem on its own."


Yeah, when your site has links pointing to those pages that get filtered out, then I see it as a problem even if Google discounts those links as equivalent to links pointing to 404s. Say you have 10 links on a page, 1 outbound to wikipedia. Now Google decides one of those links points to a duplicate page and discounts it. So from Google's POV, you really only have 9 links. That means the amount of juice flowing out of that one outbound link is slightly greater than it should be. If your site is popular (it's got thousands of IBLs), it may be nothing to worry about, but if it's a new site with only a few inbounds or if it's a 10,000+ page site with lots of duplicate issues, it could become a problem.
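The 10-links-versus-9 arithmetic can be written out explicitly. The Python sketch below assumes a deliberately simplified model in which a page's value is split evenly across the links that are counted; damping and everything else in a real ranking algorithm is ignored, and `share_per_link` is a hypothetical name.

```python
from fractions import Fraction


def share_per_link(total_links, discounted_links=0):
    """Fraction of a page's value each counted link receives (even split)."""
    counted = total_links - discounted_links
    return Fraction(1, counted)


before = share_per_link(10)        # all 10 links counted
after = share_per_link(10, 1)      # one link discounted as a duplicate
relative_increase = after / before - 1
```

Under this model each remaining link, including the outbound one, goes from 1/10 to 1/9 of the page's value, a relative increase of 1/9 (about 11%).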

Keep in mind, Google's algo is a piece of code, and as with any piece of code, there's always going to be bugs and ways to break it when you do something unexpected by the coders.

#20 feedthebot

feedthebot

    Unlurked Energy

  • Members
  • 7 posts

Posted 16 December 2006 - 08:30 AM

Hello all,

I am working with someone right now that is in the process of removing duplicate titles as in...

"company name - company logo - general category - actual page title here"

and replacing them with...

"actual page title"


If they are cool with it I will post some numbers, but I have definitely seen, over and over again, unique titles and descriptions revive a website.


