Untested
Group: Members
Joined: 14-September 05
Posts: 7
From: Newcastle Upon Tyne
|
Apr 18 2006, 12:23 PM |
|
|
Hi all,
I have a website located at http://www.1stchoicecufflinks.com and have had a steady stream of traffic from Google for a while. We have a PR of 5 and used to have about 800 pages indexed. I recently checked, and we only seem to have about 100 pages indexed. We have added about 2000 pages recently and changed the menu.

I have also looked at Google Sitemaps and intend to implement this, but I would really like to understand why Googlebot has dropped the pages. The only theory I can think of is the number of links on a page: the menu alone is about 100 links, so a category page has about 130 links or more (the menu HTML is placed at the end of the HTML as opposed to the beginning).

If anyone can offer any insight, advice or explanation for this issue, it would be much appreciated.

Thank you
All the Best
John Wright
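For anyone wanting to check a page against the link counts described above, here is a minimal sketch using only the Python standard library. The sample page is made up (100 menu links plus 30 product links, to mirror the numbers in the post), and the threshold of 100 is only the figure from Google's webmaster guidelines that comes up later in this thread, not a known hard limit.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry a non-empty href attribute."""

    def __init__(self):
        super().__init__()
        self.link_count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a" and any(name == "href" and value for name, value in attrs):
            self.link_count += 1


def count_links(html: str) -> int:
    """Return the number of followable links found in an HTML string."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.link_count


# Hypothetical category page: 30 product links, then a 100-link menu at the end.
sample_page = (
    "<html><body>"
    + "".join(f'<a href="/product-{i}.html">Product {i}</a>' for i in range(30))
    + "".join(f'<a href="/menu-{i}.html">Menu {i}</a>' for i in range(100))
    + "<a name='top'>named anchor, not a link</a>"
    + "</body></html>"
)

LINK_GUIDELINE = 100  # assumed threshold, taken from the guideline mentioned in the thread
total = count_links(sample_page)
status = "over" if total > LINK_GUIDELINE else "within"
print(f"{total} links: {status} the guideline of {LINK_GUIDELINE}")
```

Running this against the real HTML of a category page (saved to a file and read in as a string) would show whether the menu really pushes the page past the guideline.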
||
Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
|
Apr 18 2006, 01:03 PM |
|
|
Hi John,
Over on the Stanford web site, there's a page that lists some of the papers which influenced the creation and functionality of Google: Working Papers Concerning the Creation of Google

Among the papers listed is one of the first to set out a set of standards for the crawling of web sites, and for deciding which URL to follow next: Efficient Crawling Through URL Ordering

Here's the abstract for the paper:

QUOTE
In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.

The importance metrics that it describes are things that Google may be doing when it decides which pages to visit next, and which URLs to send to a document indexer. Those are some things that you might want to look at regarding your site.

An example of one of them - Google would prefer to try to index the home page and root directory level pages of as many sites as possible, instead of indexing fewer sites more deeply. So, pages on sites with deeper directory structures might not get indexed as readily as pages in the root directory. Here's an example:

http://www.example.com/directory/subdirect...ry/product.html

Importance metrics, like those defined in the paper, can be combined, so on a site that has a number of pages with higher pageranks, or more inbound links, those might help combat the weakness of a page like that when it comes to an importance metric based upon location and distance from the root directory.

There are other issues involved, too.
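To make the idea of combining importance metrics concrete, here is a rough sketch of a crawl frontier ordered by a combined score. It is an illustration only: the scoring formula, the weight of 2.0 per directory level, and the inbound link counts are all assumptions made up for this example, and nothing here is taken from the paper or from how Google actually orders its crawl.

```python
import heapq
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Number of directory levels between the site root and the page."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # When the path does not end in "/", the last segment is the page itself.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def importance(url: str, inbound_links: int) -> float:
    """Hypothetical combined metric: inbound links help, depth hurts."""
    return inbound_links - 2.0 * directory_depth(url)


def crawl_order(frontier: dict[str, int]) -> list[str]:
    """Return URLs from most to least important using a priority queue."""
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-importance(url, links), url) for url, links in frontier.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Made-up frontier: URL -> number of known inbound links.
frontier = {
    "http://www.example.com/": 10,
    "http://www.example.com/about.html": 3,
    "http://www.example.com/a/b/product.html": 1,
    "http://www.example.com/a/b/popular.html": 8,
}

for url in crawl_order(frontier):
    print(url)
```

In this toy frontier the deep page with eight inbound links is ordered ahead of a root-level page with three, which is the "combat the weakness" effect described above, while the deep page with a single inbound link comes last.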
The importance metrics listed above, and any possible changes or improvements upon them that have happened since those were written, rely upon a few other things. One is that a site has text-based links that a search engine spider can actually follow to other sites. Another might be the possibility that a site might have multiple URLs for the same pages, because of things like session IDs, or the passing of multiple variables through URL query strings. Spider traps may also cause a spider to choose to leave a site before it indexes too many pages.

By "menu" I'm assuming that you mean the sitemap on the site (as opposed to a Google Sitemap - I wish they had called that something else). I'm not sure that there's really any harm in having 130 links, as opposed to 100 or so, though Google does warn people in their webmaster guidelines not to have more than 100 links on a page.

I'm assuming that the site is using a content management system/ecommerce system. Does it enable you to build more than one index (sitemap) page? If so, you could create an additional one that might just be organized by designers, or by materials, with links on your pages to "browse by designer" or "browse by material." That might be one approach to seeing if that is a problem, and it might be solved in a manner that's friendlier to shoppers, too.
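The point about multiple URLs for the same page can be illustrated with a small sketch that collapses session-ID variants into one canonical URL. The parameter names in `SESSION_PARAMS` are assumptions (common defaults, not anything known about this particular site), and this is one possible normalization, not a description of what any search engine does.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; a real site may use different ones.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonicalize(url: str) -> str:
    """Drop session parameters and sort the rest so variants compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Three made-up URLs a spider might see for one product page.
variants = [
    "http://www.example.com/product.html?id=7&PHPSESSID=abc123",
    "http://www.example.com/product.html?PHPSESSID=zzz999&id=7",
    "http://WWW.EXAMPLE.COM/product.html?id=7",
]

unique_pages = {canonicalize(u) for u in variants}
print(unique_pages)
```

A spider that does not normalize in some way like this sees three pages with identical content instead of one, which is the duplication problem described above.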
||
Untested
Group: Members
Joined: 14-September 05
Posts: 7
From: Newcastle Upon Tyne
|
Apr 19 2006, 05:07 AM |
|
|
Thank you for all the advice, but I have a number of similarly designed websites, such as:
http://www.washington-lc.co.uk - same problem
http://www.petcentreonline.co.uk - doesn't have the problem, but has a different menu
http://www.blackettsdoors.co.uk - similar problem, 190 pages indexed

Not only that, but I use nofollow on the "email a friend" and "product enquiry" links. It wasn't there from the start; I implemented it over 6 months ago after realizing the problem myself.

Finally, does anyone think that adding a Google sitemap will solve the problem?

Regards
John Wright

But after studying some of the results for washington-lc.co.uk, there are still some results from 2004, which makes me think it could have something to do with the clean-up. Even though I started to use nofollow, Google had already indexed those pages before.
||
Member
Group: Members
Joined: 21-March 05
Posts: 22
|
Apr 19 2006, 09:47 PM |
|
|
We've got a thread going on at the Refuge about something that might be the cause of your problem. Now, if you have ALWAYS had problems getting more pages indexed, then it probably isn't the cause, but if you've been having this problem only recently, then maybe... Basically, the gist of that thread is that a lot of sites have been seeing a massive drop in the number of pages indexed. For your sake, I hope this is just a temporary problem and your pages are spidered and indexed soon.
|
||
Untested
Group: Members
Joined: 17-February 06
Posts: 5
From: Near Philadelphia, PA
|
Apr 25 2006, 01:53 PM |
|
|
QUOTE
QUOTE
Finally does anyone think if i add a google sitemap it will solve the problem?
It might not help, but if you do it right, it probably wouldn't hurt.

Well Bill, I have been lurking for long enough. I would appreciate it if you could elaborate on your response to John’s last question.

This topic actually came up this afternoon when I came in to the office. One of our senior website developers, Phil, asked me "hey Chris, do you know anything about Google Sitemaps?" After I told him "not really" (I never say yes to a question like that when posed by a programmer, which I am not), we talked about it. He had just discovered it, and said to me that "it seems like you can actually tell Google which pages you want it to index."

I personally know that this is the communicated goal of the Sitemaps system, but being a non-developer who rarely interacts with the system, I usually focus on research post-submission. The Sitemaps database is to be considered completely separate from the crawl index, according to Google.

QUOTE
This program does not replace our normal methods of crawling the web.

And they go on to say

QUOTE
A Sitemap simply gives Google additional information that we may not otherwise discover.

This brings me to the initial conclusion that was reached by Phil this afternoon: that the Google fresh and deep crawls "probably first check the Sitemaps database for instructions/information." Makes sense, but is this the case?

I know this could veer off into a Sitemaps discussion, but I feel this is topical to the first post. If this is covered in another existing post, kindly link to it for me?
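For readers who, like Phil, have just discovered Sitemaps, here is a minimal sketch of what building such a file could look like with the Python standard library. The page list is invented, and the namespace used is that of the shared sitemaps.org 0.9 protocol, which is an assumption relative to this thread since it may differ from the schema Google accepted at the time of these posts. The `<lastmod>` and `<changefreq>` elements are optional, owner-declared hints, which fits the quoted statement that a Sitemap only gives Google additional information.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (assumed here; see note above).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[dict[str, str]]) -> str:
    """Build a Sitemap XML string from dicts with 'loc' and optional hints."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional hints supplied by the site owner.
        for hint in ("lastmod", "changefreq"):
            if hint in page:
                ET.SubElement(url, hint).text = page[hint]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for illustration only.
pages = [
    {"loc": "http://www.example.com/", "lastmod": "2006-04-18", "changefreq": "daily"},
    {"loc": "http://www.example.com/directory/product.html"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The output lists URLs and hints only. Nothing in the file can force a page into the index, so it is a list of suggestions, not a set of crawl instructions.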
||
Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
|
Apr 26 2006, 01:39 AM |
|
|
Hi Chris,
It's good to get you out of lurking mode. I'd be happy to elaborate.

The Google Sitemaps aren't a terribly new idea. If you look back to a search engine from 1994 - ALIWEB (Archie-Like-Indexing-in-the-WEB), you'll see a search engine that relied primarily on something like the Google Sitemap.

QUOTE
Using existing Web protocols and a simple index file format, server administrators can have descriptive and up-to-date information about their services incorporated into the ALIWEB database with little effort. As the indices are single files there is little overhead in the collection process and because the files are prepared for the purpose the resulting database is of high quality.

The objectives of both systems appear to be similar, too. Here's what Martijn Koster wrote those were:
Because Google's ranking algorithm is still partially based upon an analysis of anchor text (and possibly some text surrounding that anchor text) pointing towards a page, and pagerank, the fact that Google can't find links to that page means that a page discoverable only through a Google sitemap isn't going to be considered relevant for much.

There may possibly be some links to the page, but a lack of some mix of importance metrics like the ones I mentioned above may mean that Google hasn't bothered to dig deeply enough through a site to index pages. Even if it sees the pages in a Google sitemap, that doesn't mean that it will find them important enough to index.

Given a choice between indexing 5,000,000 sites two directory levels deep or 50,000 sites six directory levels deep, a search engine is probably going to focus upon the larger number of sites, while possibly indexing deeper on sites that it decides are more important, either through having pages with higher pageranks, or more inbound or outbound links, or on specific topics, or some other manner, or a mix of those factors.

One potential benefit of the Google sitemap program is that they are providing some information about potential errors that they see. The Google sitemap program didn't start out with this error reporting mechanism, and it was probably a good idea to add it. In many cases though, the people who know how to resolve those issues are also the ones who know how to recognize them. But having those errors in front of you from Google, which you might be able to bring to organizational decision makers, might be enough "evidence" to get funding and resources to resolve those problems.

So one of the tangible benefits of Google sitemaps is as a discovery tool that might possibly be useful as a catalyst for change where an organization is hesitant to make changes.

I remember looking at ALIWEB in the mid 90s, and asking myself if I wanted to go through the trouble of creating an index file for them.
It really looked like too much effort for too little return.

There are a couple of misleading statements on the Google sitemaps page. Here's one:

QUOTE
A smarter crawl because you can tell us when a page was last modified or how frequently a page changes.

When a spider visits, and checks things like the last modified date, it can tell the last time a page changed, but not how frequently it changes. When one of these sitemaps gets visited, Google can tell the last time a page changed, but not how frequently it changes. In both instances, it needs to record the change dates and track them to even try to gauge frequency. In both cases, it may miss "last changed dates" if it doesn't come back often enough.

A possible benefit may accrue to Google because they only have one place to check on a site - but then they still need to check to make sure that the sitemap is accurately reflecting changes. So, possibly a little less work for Google, but the sitemap doesn't tell them frequency of changes.

QUOTE
Better crawl coverage and fresher search results to help people find more of your web pages.

Yes, Google may see and index more of your pages because you have a Google sitemap, but as I noted above, the reasons why Google didn't visit your pages without a Google sitemap may also be reasons that can lead to the pages not showing up in response to queries. Increased coverage in the search engine doesn't necessarily mean increased rankings and increased traffic.

If I had a real gripe about Google sitemaps, it's the name. A sitemap on a site, like the type that Google recommends in their Webmaster guidelines, is a better option than a Google sitemap if it uses text-based links and is linked to with text-based links. Not only might it help people find pages on a site, but it also may help pages rank better than the Google sitemap:

QUOTE
Offer a site map to your users with links that point to the important parts of your site.
I wish they had called Google sitemaps something different to avoid confusion between the two types of things. I have seen a number of people get the two concepts confused. My vote would be "Google Index file." Index files - that's what Martijn Koster called his at ALIWEB.

Will these Google sitemaps really help Google index the web? Are people too lazy to create them? Is the process too complex for the average site owner, and unnecessary for the knowledgeable webmaster? Will they be maintained and updated the way they should be? Will enough people make changes in response to error reports to make even a small bit of difference?

Maybe. That's something.
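The point made earlier in this post about having to record change dates and track them to gauge frequency can be sketched in a few lines. This is only an illustration under assumed data: the dates are invented, and averaging the gaps between distinct "last modified" values is one simple estimate, not a description of what any search engine does.

```python
from datetime import datetime, timedelta
from typing import Optional


def estimate_change_interval(observed_lastmod: list[datetime]) -> Optional[timedelta]:
    """Average gap between the distinct last-modified dates seen across visits.

    Returns None when fewer than two distinct dates have been recorded,
    because a single date says nothing about frequency.
    """
    distinct = sorted(set(observed_lastmod))
    if len(distinct) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(distinct, distinct[1:])]
    return sum(gaps, timedelta()) / len(gaps)


# Hypothetical last-modified dates recorded on four separate visits.
# The second visit saw no change, so the same date appears twice.
observations = [
    datetime(2006, 4, 1),
    datetime(2006, 4, 1),
    datetime(2006, 4, 8),
    datetime(2006, 4, 22),
]

print(estimate_change_interval(observations))        # average gap between changes
print(estimate_change_interval(observations[:2]))    # not enough history yet
```

Any change that happens and is overwritten between two visits never shows up in the recorded dates, so the estimate can only overstate the interval. That is the "may miss last changed dates" problem, and it applies whether the dates come from fetching the page or from reading a sitemap file.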
||
Moderator Alumni
Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
|
May 1 2006, 08:21 PM |
|
|
Thanks, Chris.
|
||
Star Member
Group: 1000 Post Club
Joined: 29-December 05
Posts: 3,291
From: Novosibirsk, Russia
|
May 16 2006, 11:29 PM |
|
|
Matt Cutts has given an explanation and a remedy for sites having problems getting indexed.