Google Says Internal Duplication Is OK
#1
Posted 12 December 2006 - 02:46 AM
Rand asked Matt Cutts, "Would you agree with that statement? Would you advise site owners to attempt to fix internal dup content issues, or is it really OK to let Google sort it out?"
Matt replied, "Vanessa was right - internal duplicate content isn't too big of an issue, and we're pretty good at sorting out which pages to rank."
Good to know!
GaryTheScubaGuy
#2
Posted 12 December 2006 - 03:32 AM
Basically... just try to keep it to a minimum. Why bother having lots of duplicate content? IMO it's basically like flipping a coin. Just because there's no 'penalty' doesn't mean you should go ahead and do it... there are other repercussions ;-)
Edited by lee.n3o, 12 December 2006 - 03:33 AM.
#3
Posted 12 December 2006 - 03:43 AM
Come on, isn't that a bit old now? (When was that SEO Myths thread? Maybe we need a new version.)
But there's no reason to let Google sort it out if you can do it yourself -- *you* know which pages you want to keep, so make sure that the other ones are not linked. You wouldn't let the algorithm choose your design, so why let it choose your URLs?
Another reason it makes sense to keep a grip on it is the potential influence on "pagerank" (not in a toolbar-pagerank sense, but in terms of "value"): if you have content duplicated over multiple URLs, then those URLs could (and usually will) gain their own value, which dilutes the value of the URL you really want to use for that content.
John
#4
Posted 12 December 2006 - 03:55 AM
"...but it never hurts to help search engines with dupe content issues if it's easy to help (e.g. in the webmaster console, tell us if you prefer www vs. non-www)."
I've worked on quite a few sites, and many times the dupe problems aren't with the actual content... typically they have duplicate titles and meta data. That's where I see most of the "penalties" being applied.
Many sites out there don't have unique titles and/or descriptions for each of their pages. Many times it's just their company name that's the title on every page.
I've seen these sites have 1 or 2 pages in the main index and then all the other pages in the supplemental. A quick change to the titles and the meta data... and usually the pages make their way back into the main index.
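A quick way to check a site for the pattern described above is to group its pages by title and list every title that appears on more than one page. Here is a minimal Python sketch; it assumes you already have a URL-to-title mapping from your own crawl or CMS export, and the function name and sample pages are made up for illustration:

```python
from collections import defaultdict


def find_duplicate_titles(pages):
    """Return {normalised title: [urls]} for every title used by 2+ pages.

    `pages` is a hypothetical dict of URL -> <title> text from your own crawl.
    """
    by_title = defaultdict(list)
    for url, title in pages.items():
        # Collapse whitespace and case so "Acme Corp" and "acme  corp" match.
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}


site = {
    "/": "Acme Corp",
    "/widgets": "Acme Corp",
    "/gadgets": "Acme  Corp",
    "/contact": "Contact Acme Corp",
}
print(find_duplicate_titles(site))
# {'acme corp': ['/', '/gadgets', '/widgets']}
```

The same grouping works for meta descriptions; just feed it description text instead of titles.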
Now, oftentimes when we talk about the supplemental index, Google's company line is that it's not a "penalty" to be in there... but really... it is. Let's face it... most people won't ever get to the supplemental index...
As far as "sorting out pages" that are actual dupes of one another... I can agree that Google usually chooses one to rank... but remember... if you've got 1,000 links going to Dupe A and 300 to Dupe B.... Google will sort it out... but more likely those 300 links won't get added automagically to Dupe A without a 301.
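The bookkeeping behind the 1,000-vs-300 example can be sketched in a few lines of Python. This is a toy model, not a description of how Google actually credits links: it simply assumes that links pointing at a URL which 301-redirects are counted for the redirect target, which is the outcome a 301 is meant to achieve. The function name and URLs are hypothetical.

```python
def consolidate_links(link_counts, redirects):
    """Fold inbound-link counts from redirected URLs into their 301 targets.

    link_counts: URL -> number of inbound links (illustrative numbers).
    redirects:   URL -> URL it 301-redirects to (a hypothetical redirect map).
    Toy assumption: links to a 301'd URL are credited to the final target.
    """
    totals = {}
    for url, count in link_counts.items():
        target = url
        seen = set()
        # Follow redirect chains (A -> B -> C), but stop if we hit a loop.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        totals[target] = totals.get(target, 0) + count
    return totals


links = {"/dupe-a": 1000, "/dupe-b": 300}
print(consolidate_links(links, {}))                      # {'/dupe-a': 1000, '/dupe-b': 300}
print(consolidate_links(links, {"/dupe-b": "/dupe-a"}))  # {'/dupe-a': 1300}
```

Without the redirect the two URLs keep separate totals; with it, all 1,300 links point (in this model) at one URL.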
Edited by phaithful, 12 December 2006 - 03:57 AM.
#5
Posted 12 December 2006 - 04:00 AM
This is exactly the sort of stuff I have been harping on about for the last week with SoftPlus.
Another way of seeing it would be to say, "Once we see your duplicate content, we simply rank one version of it, not ten."
There is no penalty, but there is no gain for doing it.
Internal duplication is more sinister for me. External duplication is difficult to administer.
#6
Posted 12 December 2006 - 10:49 AM
I think it pays to compose and set up your own specific page for a certain item and get people onto a conversion path of your choosing.
Good score for SEOmoz to have an interview like this. It's always a pleasure to read the site: informed opinion, well-researched facts. Refreshing food for thought in a world of mere conjecture.
#7
Posted 12 December 2006 - 11:44 AM
I still think there are BIG problems related to having internal duplicate content. On the sites where we see them, indexing takes longer (probably since Google needs to sort out which version to index), ranking is lower (probably because different links point to different versions, splitting up link juice) and internal search doesn't work as well, either.
Despite what Matt & Vanessa had to say, I'm a big skeptic about just letting the engines sort it out. To me that's not SEO, it's SEL (search engine laziness)...
#8
Posted 12 December 2006 - 12:00 PM
I'm certain that if you have enough "value" for your site to merit crawling the 100x duplicated content and still get it updated regularly, it will have a minimal influence and it certainly won't be a penalty. However, if you don't have that much value (and let's face it, most sites don't) then Google will end up crawling a part of the 100x duplicated content and perhaps miss out on the great stuff you really wanted indexed.
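The crawl-budget point can be illustrated with a deliberately crude Python simulation. It assumes a crawler that fetches URLs strictly in discovery order and stops after a fixed budget, and it assumes the worst case, where every session-ID variant of one article is discovered before the next article. Real crawlers prioritise and deduplicate far more cleverly, so treat this as an illustration only; the `sid` parameter and article URLs are hypothetical.

```python
def crawl(frontier, budget, page_of):
    """Fetch at most `budget` URLs from `frontier`, in order (toy model).

    Returns the set of distinct pages, as identified by page_of(url), that were seen.
    """
    return {page_of(url) for url in frontier[:budget]}


def page_of(url):
    # Strip the query string to recover the real page behind a URL variant.
    return url.split("?")[0]


# Hypothetical site: 10 articles, each reachable through 100 session-ID URLs.
messy = [f"/article{p}?sid={s}" for p in range(10) for s in range(100)]
clean = [f"/article{p}" for p in range(10)]

print(len(crawl(messy, 50, page_of)))  # 1: the whole budget went on article0 variants
print(len(crawl(clean, 50, page_of)))  # 10: every article fetched
```

With one URL per article, the same budget reaches all ten pages instead of one.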
Let's face it: Google and most other search engines are very, very good at crawling technically deficient sites. However, they really excel when the site is technically correct. And ... after all, if you know that your site is not up to par (technically) what real reason is there to keep it that way? If you can fix it, if you can improve the usability for users or for bots, why not do it? Why should you not give the users and the bots a single URL for your content and work on keeping that URL valid for the lifetime of your content?
The same can be said for most other technical issues that webmasters face today. Even if a search engine can crawl a jumbled spaghetti puzzle of HTML code, wouldn't it still make sense to make their job easier and get the code to at least validate on a block level? After all, we want them to distill the essence of our content and direct people to it, so why not make their job as easy as possible? To me, it's just common sense.
John
#9
Posted 12 December 2006 - 12:00 PM
"Despite what Matt & Vanessa had to say, I'm a big skeptic about just letting the engines sort it out. To me that's not SEO, it's SEL (search engine laziness)..."
Exactly what I tried to say in the second post, but I didn't put it as well!! Very good interview Rand... I thought you had good chemistry between you! Anything you want to tell us? ;-)
#10
Posted 12 December 2006 - 03:49 PM
By the same token, when a domain is in the dumps, everything matters: low Toolbar PR, lack of inbounds, lousy content, meta keywords, HTML-to-content percentage, keyword density, domain age, DMOZ listing...
Still, if SEO is your profession, I think not being able to write valid code or keep a tight .htaccess is downright unprofessional. Some may say validation is a waste of time, but if you know how to write clean code, then time isn't wasted. If you know how to spell, who needs a spell checker?
Edited by Halfdeck, 12 December 2006 - 03:50 PM.
#11
Posted 12 December 2006 - 05:28 PM
Paul
P.S. If you want to, come and say a few words about yourself in the Introduce Yourself forum, and maybe have a chat in After Hours?
Edited by send2paul, 12 December 2006 - 05:28 PM.
#12
Posted 12 December 2006 - 09:38 PM
Canonical issues: www or no www. Some SEOs say it's duplicate content.
index.html/.php/.htm/.asp etc.: whatever it ends in, it's the default page assigned by Apache or Microsoft, and some SEOs say this is also duplication (huh?)
If Google or any other search engine decided that these really were duplicate content, then every webmaster in the world would have to learn how to mess with Apache and Windows documentation (ouch), and that would be ugly. Many web designers today simply hire space on a private server, design websites and throw them up. They don't really understand what goes on behind it.
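What "messing with Apache" usually amounts to is choosing one canonical form for each URL and 301-redirecting every variant to it. The normalisation rule itself is small; here it is as a Python sketch rather than as server config. The list of default documents and the www preference are assumptions; match them to your own server (e.g. Apache's `DirectoryIndex` setting).

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default documents; check which ones your own server actually serves.
DEFAULT_DOCS = {"index.html", "index.htm", "index.php", "index.asp",
                "default.asp", "default.aspx"}


def canonicalize(url, prefer_www=True):
    """Collapse common duplicate-URL variants into one preferred form.

    Hypothetical policy: lowercase host, pick www or non-www, drop a trailing
    default document, drop the #fragment. This sketch also drops any :port.
    """
    parts = urlsplit(url)              # urlsplit lowercases the scheme
    host = parts.hostname or ""        # .hostname is lowercased, port removed
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if prefer_www else bare
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCS:
        path = head + "/"              # /shop/index.php -> /shop/
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://example.com/",
    "http://www.example.com/index.html",
    "http://WWW.Example.com/index.php",
]
print({canonicalize(u) for u in variants})  # {'http://www.example.com/'}
```

Any request whose URL differs from `canonicalize(url)` would then get a 301 to the canonical form, which is the same job a www/non-www rewrite rule does on the server.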
There are so many variations of what counts as duplicate content, but based on my observations over the past year, here are my findings:
Multiple sites with similar content will not be entirely banned from Google.
Only a few of those domains may rank or remain on the index.
Some domains will be banned, but there's probably more going on than just duplication (don't worry, Matt Cutts will send your webmaster an email with the reasons why; see http://www.mattcutts...s-hacked-sites/).
Great site and forums, by the way.
#13
Posted 13 December 2006 - 12:24 AM
Thank you Asia, and welcome to Cre8asite.
"Great site and forums by the way. I'll make it a regular stop."
What I like about debates like this are the useful snippets of info which pop up. For example, from phaithful:
"Many sites out there don't have unique titles and/or descriptions for each of their pages. Many times it's just their company name that's the title on every page.
I've seen these sites have 1 or 2 pages in the main index and then all the other pages in the supplemental. A quick change to the titles and the meta data... and usually the pages make their way back into the main index."
It would be interesting to see if there was anywhere/anybody who has actually tested (or can show examples of) this, and any of the other points raised in this debate.
Paul
#15
Posted 13 December 2006 - 05:52 PM
#16
Posted 15 December 2006 - 03:54 PM
#17
Posted 15 December 2006 - 04:14 PM
That could be a problem -- mostly because you never know which pages will get filtered out. If you can make that choice, you can set it up so that the bots will accept it like that as well. I used a shingle comparison tool and checked two types of blogs (full post previews vs snippet previews, few comments vs many comments) to see how they fare: (the full thread)
full posts, few comments: http://oy-oy.eu/page...JFTdj20Q#report
full posts, many comments: http://oy-oy.eu/page...5z1AgHHJ#report
snippets: http://oy-oy.eu/page...4CJqImxy#report
(can you recognize the blog posting that had the full post as a snippet?)
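The shingle tool used above isn't named, so the following is only a generic sketch of the underlying idea: w-shingling, where each text is broken into overlapping runs of w words and two texts are compared by the Jaccard overlap of those sets. The choice of w = 4, the simple word tokenizer and the sample texts are all assumptions.

```python
import re


def shingles(text, w=4):
    """Return the set of w-word shingles (overlapping word runs) in `text`.

    Texts shorter than w words become a single shingle.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard similarity of the shingle sets: 1.0 identical, 0.0 nothing shared."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


post = "Google says internal duplicate content is not a penalty but it can still split your links"
archive_snippet = "Google says internal duplicate content is not a penalty"
archive_full = "Older posts. " + post + " Posted in SEO."

print(round(resemblance(post, archive_snippet), 2))  # 0.46
print(round(resemblance(post, archive_full), 2))     # 0.72
```

In this toy example the archive page carrying the full post looks far more like a duplicate of the post than the snippet version does.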
John
#18
Posted 15 December 2006 - 08:01 PM
BTW the word "penalty" may be a misnomer here.
#19
Posted 16 December 2006 - 04:36 AM
Vanessa, it's good to see you here, but I have to wonder how great a job Google can do picking the right "version" when it still can't solve the canonical name problem on its own.
Yeah, when your site has links pointing to pages that get filtered out, I see it as a problem, even if Google discounts those links as equivalent to links pointing to 404s. Say you have 10 links on a page, 1 of them outbound to Wikipedia. Now Google decides one of the other links points to a duplicate page and discounts it. So from Google's POV, you really only have 9 links. That means the amount of juice flowing out through that one outbound link is slightly greater than it should be (1/9 of what the page passes instead of 1/10). If your site is popular (it's got thousands of IBLs), it may be nothing to worry about, but if it's a new site with only a few inbounds, or a 10,000+ page site with lots of duplicate issues, it could become a problem.
Keep in mind, Google's algo is a piece of code, and as with any piece of code, there are always going to be bugs and ways to break it when you do something the coders didn't expect.
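The 10-links arithmetic can be written out in Python under one simplifying assumption: a page's passable value is split evenly across the links that are counted (a bare-bones PageRank-style split with damping and everything else left out). This is not a description of how Google actually treats discounted links; the function and its parameters are illustrative only.

```python
def share_per_link(page_value, outlinks, discounted=0):
    """Value passed through each counted link when `discounted` links are ignored.

    Toy model: the page's passable value is divided evenly among counted links.
    """
    counted = outlinks - discounted
    if counted <= 0:
        raise ValueError("at least one link must be counted")
    return page_value / counted


normal = share_per_link(1.0, 10)                  # all 10 links counted
filtered = share_per_link(1.0, 10, discounted=1)  # one link to a filtered duplicate dropped
print(f"{normal:.3f} -> {filtered:.3f}")          # 0.100 -> 0.111
```

So in this model the Wikipedia link's share rises from 10% to about 11.1% of what the page passes on.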
#20
Posted 16 December 2006 - 08:30 AM
I am working with someone right now that is in the process of removing duplicate titles as in...
"company name - company logo - general category - actual page title here"
and replacing them with...
"actual page title"
If they are cool with it I will post some numbers, but I have definitely seen, over and over again, unique titles and descriptions revive a website.
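The title change described above amounts to dropping the repeated site-wide prefix and keeping only the page-specific part. A minimal Python sketch, assuming the template always joins its parts with " - " and puts the page-specific text last (both assumptions about this hypothetical template):

```python
def page_specific_title(full_title, sep=" - "):
    """Keep only the last `sep`-separated segment of a templated title.

    Titles without the separator are returned unchanged (minus outer whitespace).
    """
    return full_title.rsplit(sep, 1)[-1].strip()


titles = [
    "Acme - Acme Logo - Widgets - Blue widget",
    "Acme - Acme Logo - Widgets - Red widget",
    "About us",
]
print([page_specific_title(t) for t in titles])
# ['Blue widget', 'Red widget', 'About us']
```

If a page-specific title itself contains " - ", only the text after its last separator survives, so review the output before rolling it out.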