
Cre8asiteforums Internet Marketing
and Conversion Web Design



Google Indexing Limit


23 replies to this topic

#1 Jonny_X

Jonny_X

    Unlurked Energy

  • Members
  • 7 posts

Posted 18 April 2006 - 12:23 PM

Hi all,

I have a website located at http://www.1stchoicecufflinks.com and have had a steady stream of traffic from Google for a while.

We have a PR 5 and used to have about 800 pages indexed. I recently checked and we only seem to have about 100 pages indexed.

We have added about 2000 pages recently and changed the menu.

I have also looked at Google Sitemaps and intend to implement this, however I would really like to understand why Googlebot has dropped the pages. The only theory I can think of is the number of links on a page, as the menu alone is about 100 links, and therefore a category page has about 130 links or more (the menu HTML is placed at the end of the HTML as opposed to the beginning).

If anyone can offer any insight, advice or explanation for this issue, it would be much appreciated.

Thank you

All the Best

John Wright

#2 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 18 April 2006 - 01:03 PM

Hi John,

Over on the Stanford web site, there's a page that lists some of the papers which influenced the creation and functionality of Google:

Working Papers Concerning the Creation of Google

Among the papers listed is one of the first to set out a set of standards for the crawling of web sites, and for the decisions made as to which URL to follow next:

Efficient Crawling Through URL Ordering

Here's the abstract for the paper:

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.


The importance metrics that it describes are things that Google may be doing when it decides which pages to visit next, and which URLs to send to a document indexer. Those are some things that you might want to look at regarding your site.

An example of one of them: Google would prefer to try to index the home page and root-directory-level pages of as many sites as possible, instead of indexing fewer sites more deeply. So, pages on sites with deeper directory structures might not get indexed as readily as pages in the root directory. Here's an example:

http://www.example.c...ry/product.html

Importance metrics like those defined in the paper can be combined, so on a site that has a number of pages with higher pageranks, or more inbound links, those might help offset the weakness of a page like that when it comes to an importance metric based upon location and distance from the root directory.
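
To make the crawl-ordering idea concrete, here's a minimal Python sketch of an importance-ordered frontier. The scoring (inlink count minus directory depth) and the example URLs are hypothetical stand-ins for whatever mix of metrics a real crawler uses, not Google's actual formula:

import heapq
from urllib.parse import urlparse

def importance(url, inlink_count):
    # Toy importance score: more inlinks is better, deeper paths are worse.
    depth = urlparse(url).path.rstrip("/").count("/")
    return inlink_count - depth

def crawl_order(frontier):
    # Yield URLs from the frontier, highest "importance" first.
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    heap = [(-importance(url, inlinks), url) for url, inlinks in frontier.items()]
    heapq.heapify(heap)
    while heap:
        _, url = heapq.heappop(heap)
        yield url

if __name__ == "__main__":
    frontier = {
        "http://www.example.com/": 40,                      # root page, many inlinks
        "http://www.example.com/category/": 6,              # one level down
        "http://www.example.com/category/product.html": 1,  # deep page, few inlinks
    }
    for url in crawl_order(frontier):
        print(url)

Run as-is, it visits the root page first and the deep product page last, which is the "shallower and better-linked pages first" behaviour described in the paper.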


There are other issues involved, too. The importance metrics listed above, and any possible changes or improvements upon them that have happened since those were written rely upon a few other things.

One is that a site has text-based links that a search engine spider can actually follow from one page to another.

Another is the possibility that a site has multiple URLs for the same pages, because of things like session IDs, or the passing of multiple variables through HTTP headers. Spider traps may also cause a spider to choose to leave a site before it indexes too many pages.

By "menu" I'm assuming that you mean the sitemap on the site (as opposed to a Google Sitemap - I wish they had called that something else.) I'm not sure that there's really any harm to having 130 links, as opposed to 100 or so, though Google does warn people not to have more than 100 in their webmasters' guidelines pages.

I'm assuming that the site is using a content management system/e-commerce system. Does it enable you to build more than one index (sitemap) page? If so, you could create an additional one that might just be organized by designer, or by material, with links on your pages to "browse by designer" or "browse by material." That might be one approach to seeing if that is a problem, and it might be solved in a manner that's friendlier to shoppers, too.
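
As a rough sketch of what such an additional index page could look like (the heading, categories, and URLs here are invented for illustration), plain crawlable text links are all it needs:

<!-- Hypothetical "browse by material" index page built from plain text links -->
<h1>Browse Cufflinks by Material</h1>
<ul>
  <li><a href="/materials/silver.html">Silver cufflinks</a></li>
  <li><a href="/materials/gold.html">Gold cufflinks</a></li>
  <li><a href="/materials/enamel.html">Enamel cufflinks</a></li>
</ul>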

#3 Jonny_X

Jonny_X

    Unlurked Energy

  • Members
  • 7 posts

Posted 18 April 2006 - 02:37 PM

Thank you for your reply!

However, this doesn't really answer my question, as this website used to have 700-800 pages indexed.

#4 cpdohman

cpdohman

    Ready To Fly Member

  • Members
  • 23 posts

Posted 18 April 2006 - 03:02 PM

Hi Jonny,

I read a thread at WMW where many are writing about losing many pages in the Google index. Some are seeing their number of indexed pages get chopped and then come back slowly, and attribute it to Google cleaning up. You may want to check a sampling of the data centers. Let us know if you see your pages coming back at all.

Very interesting info about the crawler priority, Bill. Another item to add to the reading list. Thanks.


chris

#5 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 18 April 2006 - 04:03 PM

Your question was why Google might have dropped some pages, so I tried to provide some information about how Google may be looking at those pages.

With the addition of a large number of pages, you've made some significant changes that may impact how the site is crawled and indexed. One thing I noticed was that each product page also has an associated "email to a friend" and "product query" page, which means that you've added a considerable amount of pages that a crawler can search through that are very similar to each other.

I see, in the 181 results listed for your site in Google, that the search engine tried to index a few of those, but is listing them as supplemental pages. It's possible that it may have cut back its efforts to index pages from your site because there are so many of those.

It's tempting to suggest disallowing those types of pages via the robots.txt file, so that the search engine knows to focus upon product pages that have indexable content and don't appear as similar to each other.

Eliminating them from what the crawler can look at might result in more product pages being listed, because the crawlers have more important pages to choose from on your site.
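
As a rough sketch of that robots.txt approach - the paths below are hypothetical placeholders, since the real URL patterns for the "email to a friend" and "product query" pages would need to be taken from the actual site:

User-agent: *
# Hypothetical paths - substitute the site's real URL patterns for these page types
Disallow: /emailfriend/
Disallow: /productquery/

With those patterns disallowed, the crawler's budget for the site isn't spent on near-duplicate pages.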

#6 FP_Guy

FP_Guy

    Mach 1 Member

  • 250 Posts Club
  • 413 posts

Posted 18 April 2006 - 06:18 PM

I've noticed the same thing about the number of pages indexed, and it is happening everywhere. It keeps flipping from the old number to the shortened new one.

If you check again in the near future you may see all of your pages back in the index again, depending on when they are running their tests. I wonder if the new number is going to become the standard.

Another thing I noticed is that relatively new rankings would disappear and come back, just like the indexed pages. The older rankings that have been in the top ten for a few months have been staying there.

Michael

#7 Black_Knight

Black_Knight

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 9339 posts

Posted 19 April 2006 - 04:04 AM

However, this doesn't really answer my question, as this website used to have 700-800 pages indexed.

I think it may answer a lot more than you've managed to catch. Sure, you used to have more pages indexed... but has the web stood still? Are the pages still as 'fresh' now as they used to be?

The importance of crawl ordering is a major thing for any search engine which, like Google, needs to fetch more than 6,000 pages per second, every second of every day, without crashing any dynamic websites or servers, just to keep its index refreshed once every 30 days (capturing new pages requires additional fetching on top of that).

1. Your pages are not new anymore.
2. The number of other pages around the entire web to crawl is increasing.
3. The data regarding the value of your pages may not support continued listing (think of a 'fresh bonus' that may expire and needs to be replaced with the links and traffic data from having been 'given' a listing).

If this is indeed the case, you should find that the pages get picked up again sporadically after they drop out of the index.

If the pages do not get reindexed when Google 'forgets' it had given them a benefit before, then it would be advisable to look for other potential causes.

In all of this, I assume you have done the obvious in considering that some of the pages may have been classed as 'near duplicates' with insufficient unique content. Google's most recent changes have made it a lot less tolerant of 'similar' pages and many sites that are not generally 'content heavy' (lots of unique text per page) have lost a lot of pages from the serps (but not necessarily from the index).

Hope that helps you to think of some other factors that could be at play, and hopefully to find ways of addressing any potential weaknesses that consideration may highlight.

#8 Jonny_X

Jonny_X

    Unlurked Energy

  • Members
  • 7 posts

Posted 19 April 2006 - 05:07 AM

Thank you for all the advice, but I have a number of similarly designed websites, such as:

http://www.washington-lc.co.uk - same problem

http://www.petcentreonline.co.uk - doesn't have the problem, but has a different menu

http://www.blackettsdoors.co.uk - similar problem, 190 indexed

Not only that, but I use nofollow tags for "email a friend" and "product enquiry" - not since the start, but implemented over six months ago after realizing the problem myself.

Finally, does anyone think that if I add a Google Sitemap it will solve the problem?

Regards

John Wright

But after studying some of the results for washington-lc.co.uk, there are still some results from 2004, which makes me think it could have something to do with the clean-up. Even though I started to use the nofollow, it had already indexed them before.

#9 WeRASkitzzo

WeRASkitzzo

    Ready To Fly Member

  • Members
  • 22 posts

Posted 19 April 2006 - 09:47 PM

We've got a thread going on at the Refuge about something that might be the cause of your problem. Now, if you have ALWAYS had problems getting more pages indexed, then it probably isn't your cause, but if you've been having this problem only recently, then maybe... Basically, the gist of that thread is that a lot of sites have been seeing a massive drop in the number of pages indexed. For your sake, I hope this is just a temporary problem and your pages are spidered and indexed soon.

#10 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 19 April 2006 - 10:58 PM

I am seeing a lot of people, in different forums, writing about losing pages in Google's index.

I'm also seeing a few guesses at possible reasons. I think that this goes beyond the number of links that you might have on one page, and while some of the ideas that I wrote about involving spiders and crawling may point to possible solutions, it does seem to be affecting a fair number of people.

Not only that, but I use nofollow tags for "email a friend" and "product enquiry" - not since the start, but implemented over six months ago after realizing the problem myself.


That's not quite the same thing as disallowing pages. The "nofollow" value for the rel attribute of a link was created to tell Google not to pass along link popularity to the pages being linked to, but it doesn't keep the search engine from crawling those pages or attempting to index them. It wasn't intended to be a substitute for a disallow in robots.txt or for meta noindex tags. I'd be hesitant to use it to point to my own pages.
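
To illustrate the difference in markup terms (the URLs here are placeholders): rel="nofollow" is a hint attached to an individual link, while a meta robots noindex tag on the page itself, or a robots.txt disallow like the one sketched earlier, is what actually asks a crawler to keep the page out of the index.

<!-- On a page linking to the form: asks Google not to pass link popularity through this link -->
<a href="/emailfriend/product123.html" rel="nofollow">Email to a friend</a>

<!-- In the head of the "email to a friend" page itself: asks crawlers not to index this page -->
<meta name="robots" content="noindex">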

Finally, does anyone think that if I add a Google Sitemap it will solve the problem?


It might not help, but if you do it right, it probably wouldn't hurt.

#11 Chris Boggs

Chris Boggs

    Unlurked Energy

  • Members
  • 5 posts

Posted 25 April 2006 - 01:53 PM

Finally does anyone think if i add a google sitemap it will solve the problem?



It might not help, but if you do it right, it probably wouldn't hurt.



Well, Bill, I have been lurking for long enough. I would appreciate it if you could elaborate on your response to John's last question. This topic actually came up this afternoon when I came into the office. One of our senior website developers, Phil, asked me, "Hey Chris, do you know anything about Google Sitemaps?" After I told him "not really" (I never say yes to a question like that when posed by a programmer, which I am not), we talked about it. He had just discovered it, and said to me that "it seems like you can actually tell Google which pages you want it to index."

I personally know that this is the communicated goal of the Sitemaps system, but being a non-developer who rarely interacts with the system, I usually focus on research post-submission. The Sitemaps database is to be considered completely separate from the crawl index, according to Google.

This program does not replace our normal methods of crawling the web.

And they go on to say

A Sitemap simply gives Google additional information that we may not otherwise discover.


This brings me to the initial conclusion that was reached by Phil this afternoon: that the Google fresh and deep crawls "probably first check the Sitemaps database for instructions/information."

Makes sense, but is this the case? I know this could veer off into a Sitemaps discussion, but I feel this is topical to the first post. If this is covered in another existing post, kindly link to it for me? :) I would love to be able to find a consensus on this topic...

#12 Aaron Pratt

Aaron Pratt

    Whirl Wind Member

  • Members
  • 88 posts

Posted 25 April 2006 - 11:15 PM

You can blame all those who mindlessly load content into the search engines because of their incorrect belief that this is what gives them value and results. Say I have a site on "ringtones" and load "content" into it daily, along with millions of others trying to own the same ringtone internet real estate. Everyone can't be #1 for ringtones and all the variations of it, so the SERPs become diluted and people start bitching about losing indexed pages.

What I believe is happening is that Google is replacing the old with the new, and for a site to remain fresh it takes serious dedication. This is not good news for those of you with clients... organic search will become SEO-proof in the near future, and I am not sure that is a bad thing after all, really. If you are an SEO, be sure to make your clients understand that this will take their dedication as well; there is no magic ball, and to be #1 you really have to have something to offer.

Sitemaps is a window in, and Google is expanding its features for webmasters and SEOs alike. Agree?

#13 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 26 April 2006 - 01:39 AM

Hi Chris,

It's good to get you out of lurking mode.

I'd be happy to elaborate.

The Google Sitemaps aren't a terribly new idea. If you look back to a search engine from 1994 - ALIWEB (Archie-Like Indexing for the Web) - you'll see a search engine that relied primarily on something like the Google Sitemap.

Using existing Web protocols and a simple index file format, server administrators can have descriptive and up-to-date information about their services incorporated into the ALIWEB database with little effort. As the indices are single files there is little overhead in the collection process and because the files are prepared for the purpose the resulting database is of high quality.



The objectives of both systems appear to be similar, too. Here's how Martijn Koster described ALIWEB's objectives:
  • To reduce the effort required to maintain an index.
  • To reduce the effort required to search the index.
  • To place low requirements on infrastructure.
  • To help towards future systems.
One benefit of using a Google sitemap is that you may get a page in Google's index that wasn't otherwise in there before. But, chances are that the page you get in the index won't rank for much of any value.

Because Google's ranking algorithm is still partially based upon an analysis of anchor text (and possibly some text surrounding that anchor text) pointing towards a page, and pagerank, the fact that Google can't find links to that page means that a page discoverable only through a Google sitemap isn't going to be considered relevant for much.

There may possibly be some links to the page, but a lack of some mix of importance metrics like the ones I mentioned above may mean that Google hasn't bothered to dig deeply enough through a site to index pages. Even if it sees the pages in a Google sitemap, that doesn't mean that it will find them important enough to index.

Given a choice between indexing 5,000,000 sites two directory levels deep or 50,000 sites six directory levels deep, a search engine is going to probably focus upon the larger number of sites, while possibly indexing deeper on sites that it decides are more important either through having pages with higher pageranks, or more inbound or outbound links, or on specific topics, or some other manner, or a mix of those factors.

One potential benefit of the Google sitemap program is that they are providing some information about potential errors that they see. The Google sitemap program didn't start out with this error reporting mechanism, and it was probably a good idea to add it.

In many cases though, the people who know how to resolve those issues are also the ones who know how to recognize them. But having those errors in front of you from Google, which you might be able to bring to organizational decision makers, might be enough "evidence" to get funding and resources to resolve those problems. So one of the tangible benefits of Google sitemaps is as a discovery tool, one that might be useful as a catalyst for change where an organization is hesitant to make changes.

I remember looking at ALIWEB in the mid 90s, and asking myself if I wanted to go through the trouble of creating an index file for them. It really looked like too much effort for too little return.

There are a couple of misleading statements on the Google sitemaps page. Here's one:

A smarter crawl because you can tell us when a page was last modified or how frequently a page changes.



When a spider visits and checks things like the last-modified date, it can tell the last time a page changed, but not how frequently it changes. Likewise, when one of these sitemaps gets visited, Google can tell the last time a page changed, but not how frequently it changes. In both cases it needs to record the change dates and track them over time to even try to gauge frequency, and it may miss "last changed" dates if it doesn't come back often enough. A possible benefit may accrue to Google because they only have one place to check on a site - but then they still need to check to make sure that the sitemap is accurately reflecting changes. So, possibly a little less work for Google, but the sitemap doesn't tell them the frequency of changes on its own.
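
For reference, a Google Sitemap entry at the time was an XML record roughly like the following - the URL and dates are invented, and the namespace shown is the 0.84 schema Google documented back then, so their current pages are worth checking. The lastmod and changefreq values are simply whatever the site owner declares, which is the point being made above:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/category/product.html</loc>
    <lastmod>2006-04-18</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>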

Better crawl coverage and fresher search results to help people find more of your web pages.


Yes, Google may see and index more of your pages because you have a Google sitemap, but as I noted above, the reasons why Google didn't visit your pages without a Google sitemap may also be reasons that can lead to the pages not showing up in response to queries.

Increased coverage in the search engine doesn't necessarily mean increased rankings and increased traffic.

If I had a real gripe about Google sitemaps, it's the name.

A sitemap on your own site, of the type that Google recommends in their webmaster guidelines, is a better option than a Google sitemap if it uses text-based links and is linked to with text-based links. Not only might it help people find pages on a site, but it may also help pages rank better than a Google sitemap would:

Offer a site map to your users with links that point to the important parts of your site.


I wish they had called Google sitemaps something different to avoid confusion between the two types of things. I have seen a number of people get the two concepts confused. My vote would be "Google Index file."

Index files - that's what Martijn Koster called his at ALIWEB.

Will these Google sitemaps really help Google index the web? Are people too lazy to create them? Is the process too complex for the average site owner, and unnecessary for the knowledgeable webmaster? Will they be maintained and updated the way they should be?

Will enough people make changes in response to error reports to make even a small bit of difference? Maybe. That's something.

#14 Chris Boggs

Chris Boggs

    Unlurked Energy

  • Members
  • 5 posts

Posted 01 May 2006 - 11:06 AM

Bill...sorry about the delay, but thanks so much for your very informative post! I read it the day you put it up and was impressed with your knowledge as usual. :applause:

I'll spend more time when I can...

Edited by Chris Boggs, 01 May 2006 - 11:07 AM.


#15 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 01 May 2006 - 08:21 PM

Thanks, Chris.

#16 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 4003 posts

Posted 16 May 2006 - 11:29 PM

Matt Cutts has given an explanation and a remedy for sites having problems with getting indexed.

#17 bragadocchio

bragadocchio

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 15634 posts

Posted 16 May 2006 - 11:54 PM

After looking at the example sites, I could tell the issue in a few minutes. The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling.



I wonder how much impact that might have had on bloggers who link to each other not because they want to boost their pagerank, but because they are building relationships with other bloggers and decentralized networks. How easy is it to tell "reciprocal links" that are part of a linking scheme from "reciprocated links" that happen because people are ignoring the search engines and building sites for their visitors? (I've heard that advice before somewhere.)

Then again, I'd venture to say that most of those bloggers wouldn't be affected too much if Google's reliance on links were to vanish completely. There's still Digg, Technorati, Metafilter, Rojo, Del.icio.us, and others.

#18 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 4003 posts

Posted 17 May 2006 - 12:31 AM

I have heard before that reciprocal links based on a business relationship, rather than on link exchanges, retained their value in Google. Basically, it means that links between sites on the same topic are relevant and are not devalued, so blogs shouldn't be hurt much.

Some blogs may act as hubs and gain authority, so that should perhaps be taken into account too.

Also, reciprocal links between blogs aren't everything; there can be one-way links as well. However, with trackbacks these may be less common than reciprocal links.

#19 peaceful

peaceful

    Ready To Fly Member

  • Members
  • 16 posts

Posted 17 May 2006 - 09:49 PM

Matt Cutts has given an explanation and a remedy for sites having problems with getting indexed.

Some things are better not read! (He made valid points and I can see some of it.) But:

It left me scared to get links.

I can understand that my party site shouldn't have a link to, say, "Tahiti film makers" or "hospital sites",
or rather inbound links from them. But I'm just totally at a loss as to what links I can use.
The only two links I have in Google are two from the same directory site, and yes, they are reciprocal. It's not a party directory.
Should I go to that directory and, since it's a good neighbor, put my site under a bunch of different categories? :rofl: Ha, that would be a great test, but not one I'm willing to try yet!
Actually, I figure I'd better bust my hind end trying to find relevant links, so that when they see that one isn't relevant I have something to fall back on!

When I built my site I wasn't figuring that links were the thing that got a site indexed; heck, I didn't even know a site needed links!

#20 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 4003 posts

Posted 17 May 2006 - 10:52 PM

There is nothing so scary in what Matt says.
Perhaps the fear comes from a misunderstanding or a lack of knowledge about something?

Basically, what he means is: link naturally.
Anything that is not natural linking will be detected, sooner or later, and will be valued less.
Themed reciprocal links are not devalued, as they are relevant.

In other words, link to the sites that you want your visitors to see, and don't worry if some bad neighbourhood links to you (this can't hurt you).

#21 Aaron Pratt

Aaron Pratt

    Whirl Wind Member

  • Members
  • 88 posts

Posted 20 May 2006 - 10:00 PM

Cufflinks - what's up with all the "pet" backlinks? Ever heard of something called site flavor? Are those 301'ed from another project you were doing?

And if you want to let me in on some of those links, I've got a friend who is trying very hard to get her pet baskets found. Those links would be perfect for a pet site, but for a cufflink site?

Nope, sites need the correct PR push to index deep and hold. It's a new game.

#22 yannis

yannis

    Sonic Boom Member

  • 1000 Post Club
  • 1634 posts

Posted 22 May 2006 - 12:30 PM

I have posted a similar post on another forum. I had a new site submitted sometime in April. It got listed in the index very quickly, and within two weeks it had all its pages indexed - about 200. The site got about 10 links from the most popular websites for its theme and went to about #3 in the SERPs for its keywords. At the end of April the site got listed in DMOZ. Unbelievable but true. Trouble started then: I began getting links fast from DMOZ mirror directories. Some of them actually sound like spam... with Sex... Viagra, etc.

My pages started dropping from the index. I am currently down to one page! I have read Matt Cutts' blog a number of times... The guy is incomprehensible at times, but in between I gathered that Google might be viewing me as belonging to a bad neighbourhood now, or that my rate of gaining backlinks may be too high... therefore I am being penalized.

Any views and suggestions will be appreciated. This is a great site and this is my first post!!

#23 send2paul

send2paul

    Honored One Who Served Moderator Alumni

  • Hall Of Fame
  • 2905 posts

Posted 22 May 2006 - 02:32 PM

Hi yannis - and welcome to the Cre8asite forums and community :(

Like many people, I am also having a bit of a hard time with some of my own websites in Google. I have been following this thread with great interest. I don't have anything to offer you (or this thread) by way of explanation as to what is going on, other than to say, "Yes - I know what you mean."

[Off topic: As a new member of the Cre8asite community, perhaps you'd like to jump over to Introduce Yourself and say "hello"? :) ]

#24 A.N.Onym

A.N.Onym

    Honored One Who Served Moderator Alumni

  • Invited Users For Labs
  • 4003 posts

Posted 22 May 2006 - 08:08 PM

Yannis, there was a link to Matt Cutts' blog somewhere above in the thread. It gives an explanation of the issue. Also, if you reread the thread from the start, you may come to understand the whole issue too (hopefully).

Feel free to ask if something is not clear after you visit Matt Cutts' blog and reread the thread, though.
No question is too dumb here.

Edited by A.N.Onym, 22 May 2006 - 08:18 PM.



