Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Feb 11 2005, 02:55 AM
Hi Polarmate,
Good news for you. If it's the blog in your signature, it does appear to still be in Google's index. But it's the individual comment post pages that seem to be disappearing on you. I wonder if that's part of a duplicate filter, too, since the same content is also included on a larger archive page, and you only have one post per month - so the content is almost exactly the same from monthly archive page to individual post page.

I liked your post on Glass instruments, by the way. This was one of the more interesting things that Benjamin Franklin invented.

I did check, and didn't see a noindex, nofollow meta tag inserted in my blogspot blog. I don't think that Google would do that, and I didn't see one. But thanks for the warning. I've decided that I should contact Google to see if they could do something about the blog, but I'm still interested in the concept of duplicate content, and how Google treats it.

There are some interesting threads and articles on the subject of duplicate content. I'm going to point to some now, and see if I can find some more later (it's getting late).

For instance, Shari Thurow's article, Duplicate Content In The Search Engines, covers the situation where people use more than one domain name for the same site - to capture visitors who type a company name into a browser address bar. She recommends using a 301 redirect from the secondary domain name to the primary site.

Another article, from problogger.com, covers what happens when a site uses RSS feeds to aggregate content on a specific topic from more than one blog, and how that can harm traffic to the original blog - see: RSS Abuse, Duplicate Content and Parasite Websites

A Wilson Web article quotes some thoughts from Mike Grehan on content that is shared by different divisions of the same company and reproduced on their web sites. See: Reusing Web Content without Getting Penalized

Mike provides some interesting observations there on legitimate reasons for companies sharing articles, and offers a couple of good suggestions, but ultimately suggests using a robots.txt file for duplicated content to avoid potential penalties.

This similar page checker was interesting: http://www.webconfs.com/similar-page-checker.php

I ran the homepage of my blog against the bloglines RSS feed, and it told me that the bloglines RSS display was 26.132045088567% similar to the blog's homepage.

Google's patent on Detecting duplicate and near-duplicate files makes for some good reading on the subject. Here's a snippet that I found interesting:

QUOTE
In the context of a search engine, the present invention may also be used during a crawling operation to speed up the crawling and to save bandwidth by not crawling near-duplicate Web pages or sites, as determined from documents uncovered in a previous crawl. Further, by reducing the number of Web pages or sites crawled, the present invention can be used to reduce storage requirements of downstream stored data structures. The present invention may also be used after the crawl such that if more than one document are near duplicates, then only one is indexed. The present invention can instead be used later, in response to a query, in which case a user is not annoyed with near-duplicate search results. The present invention may also be used to "fix" broken links. That is, if a document (e.g., a Web page) doesn't exist (at a particular location or URL) anymore, a link to a near-duplicate page can be provided.
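For anyone wondering how a "percent similar" figure between two pages might be produced, here is a minimal sketch. It is only an illustration: I don't know how the webconfs checker computes its number, and this is not the method from Google's patent. It assumes one common approach, which is to break each text into overlapping three-word "shingles" and compare the two sets with Jaccard similarity. The function names `shingles` and `jaccard_similarity` are made up for this example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(text_a, text_b, k=3):
    """Share of distinct shingles that the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 1.0  # two empty texts are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    post_page = "the quick brown fox jumps"
    archive_page = "the quick brown fox sleeps"
    score = jaccard_similarity(post_page, archive_page)
    print(f"{score:.1%} similar")
```

With a blog that posts once a month, the monthly archive page and the individual post page would share nearly all of their shingles under a measure like this, which is the situation described above.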

Moderator Alumni
Group: Hall Of Fame
Joined: 7-November 02
Posts: 6,179
From: New England, USA
Feb 11 2005, 09:26 AM
QUOTE
1. So duplicate content doesn't kick entries out of the Google database. They're still in there somewhere. Is that true?

Nope. Never has. It shows the most relevant and filters out the other one. Relevance varies from term to term, so one may show with one search, and another may show for another.

Bill - I'm gonna suggest that this isn't so much a duplicate content filter (though I guess that's probably a part of it). Rather, the bloglines site is ranking better because of its authority status. In this thread, I discuss the importance of outbound links, but I also talk about what I've recently begun calling the 1-Click-Removed rule.

Basically, the 1CRR states that Google assumes that you are searching for a specific thing, but it can't always determine what that specific thing is from your search term. So, it'll prefer to take you to a page that answers every (or the most possible) potential interpretation of the term.

Bill, in your case, your site has probably 5-10 articles on the front page. To get to article number 14 from your site, you'd need to make two clicks - first click to the Archive listing and second click to the actual article. By going to the bloglines page, you can get to article 14 with a single click (since 20 articles along with some juicy text appear on the page). Bloglines gets another boost for being an authority site (lots of inbound and outbound links), and it likely has "hub" power, too. So, even if their listing was identical to your page, it'd likely rank higher.

So, because the search term isn't specific enough to bring up a specific article, Google has decided that it's best to take you to the bloglines page.

(Okay, I just went to your site to use some hard numbers.) You have six articles on your front page. The bloglines page has twenty. Thus, the bloglines page is roughly 3.3 times more likely to take the person to the article they were really looking for with but a single additional click than if they were led to your blog.

The concept of this has been around for a while - I remember first mentioning seeing it over a year ago, but it's only recently that they really tricked up the weighting of it.

Does that make sense?

G.
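The 3.3 figure above is just 20 divided by 6. As a rough sketch of that arithmetic, assuming (my assumption, not anything Google has stated) that the searcher is equally likely to want any one of the 20 most recent articles, and using a made-up helper name `one_click_coverage`:

```python
def one_click_coverage(articles_listed, candidate_articles):
    """Chance that the wanted article is linked directly from the landing page,
    assuming every candidate article is equally likely to be the wanted one."""
    return min(articles_listed, candidate_articles) / candidate_articles


# Numbers from the post: six articles on the blog's front page, twenty on bloglines.
blog_front_page = one_click_coverage(6, 20)
bloglines_page = one_click_coverage(20, 20)
ratio = bloglines_page / blog_front_page

print(f"blog front page: {blog_front_page:.0%}")
print(f"bloglines page:  {bloglines_page:.0%}")
print(f"ratio: about {ratio:.1f}x")
```

This only models the "one click removed" part of the argument. The authority and hub effects mentioned above are not captured here.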

Star Member
Group: Members
Joined: 5-September 02
Posts: 513
From: Boulder County, Colorado
Feb 16 2005, 10:38 PM
Just catching up with this thread and trying to absorb the ideas presented thus far.
Bill, I was talking about my food blog: http://indianfoodrocks.blogspot.com - all pages were dropped, i.e. an inurl query showed only the URLs, with no cache and no snippet. The blog was on the first page for "Indian Food Rocks" and dropped out of sight. About 5 pages are back in the index now. Checking with the Google API shows that the blog still does not rank; however, a manual query in Google shows that it's back at #9.

I have the 'individual post' pages as well as the monthly Archives. Considering that I am not as active as I would like to be, I probably should not link to the monthly archives nor include the previous posts' list, as I link to my recipes directly too. I should probably also reduce the number of posts that show on the homepage to see if this makes any difference.

I am not convinced it was a dupe filter in my case. If it was, at least one of the pages would have shown up. All pages would not have been dropped. It also seems strange that Google would take this action because I used publishing options provided in Blogger. I think what happened to my blog is not what is happening to yours.

Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
Feb 18 2005, 02:04 AM
QUOTE
I did check, and didn't see a noindex, nofollow meta tag inserted in my blogspot blog. I don't think that Google would do that, and I didn't see one. But thanks for the warning.

They changed comments a few days ago, and instead of using a redirect on all links in comments, they now set the rel attribute on anchor elements to nofollow. Maybe that's what the person who mentioned "nofollow" was talking about?

Some great recipes, polarmate. A friend I used to work with brought Indian food in for lunch every day, and used to share a little sometimes. Good stuff. I'm going to have to try out that Spicy Jeera Chicken.

There is something odd going on with your blog, too. I'm not sure it's completely unrelated. That they are both on blogspot is interesting. If it is a duplicate filter problem, it may be an aspect of the problem that they didn't anticipate, or didn't think was enough of a risk to guard against, given the duplication that happens on blogspot blogs across front page blog posts, archive pages, and individual post comment pages. Since the material is published in three places, and since posting frequency may be fairly low, those pages could share a lot of duplicate content.

I don't know if that is what you are experiencing, but I think I'll probably be digging into the subject in a lot more detail. We are also discussing duplicate content here: http://www.cre8asiteforums.com/viewtopic.p...p=114573#114573

It will be interesting to see what Google does with my blog.
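For anyone who wants to repeat the check for a robots meta tag or for rel="nofollow" on links, here is a minimal sketch using Python's standard `html.parser` module. The class name `RobotsAudit`, the helper `audit`, and the sample markup are made up for illustration; you would feed it the saved source of your own page.

```python
from html.parser import HTMLParser


class RobotsAudit(HTMLParser):
    """Collect robots meta directives and links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.meta_robots = []     # e.g. ["noindex", "nofollow"]
        self.nofollow_links = []  # href values of anchors with rel containing nofollow

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            directives = attr.get("content", "").split(",")
            self.meta_robots += [d.strip().lower() for d in directives if d.strip()]
        elif tag == "a" and "nofollow" in attr.get("rel", "").lower().split():
            self.nofollow_links.append(attr.get("href", ""))


def audit(html):
    """Return (robots meta directives, nofollow link hrefs) found in the HTML."""
    parser = RobotsAudit()
    parser.feed(html)
    parser.close()
    return parser.meta_robots, parser.nofollow_links


if __name__ == "__main__":
    # Hypothetical page source, not taken from any real blog.
    sample = (
        '<html><head><meta name="robots" content="noindex, nofollow"></head>'
        '<body><a href="http://example.com/" rel="nofollow">comment link</a>'
        '<a href="/archive">archive</a></body></html>'
    )
    directives, links = audit(sample)
    print("robots meta:", directives)
    print("nofollow links:", links)
```

An empty first list means no robots meta tag was found in the page. The second list shows which links carry nofollow, which is the change to comment links described above and is a separate thing from a page-level noindex.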