Just before we took over a recent client's site, their previous developer made a last-ditch effort at getting them some rankings (as promised in the contract) by putting up about two dozen auto-generated doorway pages. Nice, huh?
They got the pages up literally a week before we moved the site to our servers, and of course we left the doorway pages behind. Unfortunately, they must have submitted one of the pages to Yahoo and Google, because the engines got in there and indexed all of them (each page had dozens of links to the others).
So the first order of business was removing the pages; that was done as soon as we took the site over.
Now they are going to get a bunch of 404's for a while, if the engines keep coming back trying to index the pages.
I'd like to redirect the URLs to the homepage, but I'm not sure how dangerous that could be. As far as the engines are concerned, it may look like "put up a bunch of spam pages and get them ranked, remove them right away and redirect them all to the homepage"... could be trouble.
In that case, I ensure that the server returns a 404 for missing pages.
I don't see redirecting users/SEs to the home page as a particularly good idea. First, the home page may not contain what users are looking for. Secondly, the 404 function is a kind of "virtual page" (even though one may really exist):
site.com/whatever.htm
will be shown the same 404 as
site.com/a/b/c/whoever.htm
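To make the "virtual page" idea concrete, here is a minimal sketch in Python. The page list and the `respond` helper are made up for illustration and are not anyone's actual setup. The point is only that every unknown path, however deep, gets the same body together with a real 404 status.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical pages that really exist on the site (illustration only).
PAGES = {"/": "<h1>Home</h1>", "/about.htm": "<h1>About</h1>"}

# One shared error body: the "virtual page" shown for every missing URL.
NOT_FOUND_BODY = "<h1>Page not found</h1>"


def respond(path, pages=PAGES):
    """Return (status, body) for a request path."""
    if path in pages:
        return HTTPStatus.OK, pages[path]
    # Anything else gets the same body AND a genuine 404 status.
    return HTTPStatus.NOT_FOUND, NOT_FOUND_BODY


class SiteHandler(BaseHTTPRequestHandler):
    """Serves known pages; every other URL gets the shared 404 response."""

    def do_GET(self):
        status, body = respond(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

With this sketch, `respond("/whatever.htm")` and `respond("/a/b/c/whoever.htm")` return exactly the same 404 result.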
I'm probably wrong, but rather than serve 404 which would just tell the visitor and the SE that the content was removed/unavailable, couldn't 301 redirects be used here?
I was under the impression that a 301 redirect would tell the SE that any content had been permanently moved to a new location, which in effect should tell them to replace the doorway pages in their index with the new URLs (the target URL of the 301). That way, not only would visitors to the deleted pages get redirected, but the use of 301 should tell SEs that you're not trying to spam as you don't want them to keep indexing and ranking the doorway pages. Presumably you wouldn't have to 301 all doorway pages to the homepage if there are more appropriate content pages for each doorway page.
If that's wrong, DianeV, please explain where I've misunderstood. I happily live and learn.
buddhu, we all may have different opinions, and I don't know that there's a major "wrong" here.
My thought is that the links to the now-removed doorway pages have a fairly short life cycle; Google will eventually drop the links, though Yahoo may not if it continues as it has been doing. Are the pages in the Google cache?
If it were my site, I'd simply develop what pages I needed rather than continue trying to make the most of links to pages that are no longer there. However, we're all looking at what to do about (possibly well-ranked?) dead doorway page URLs. Instead, if the pages were that valuable, I'd look at simply developing real pages at those URLs and linking the rest of the site to them and vice versa. Fast. LOL
"Instead, if the pages were that valuable, I'd look at simply developing real pages at those URLs and linking the rest of the site to them and vice versa. Fast. LOL"
Yep, that'd work but I suppose it might be a bit of a rush to develop... what did mike say? 2 dozen? ...pages of quality content before these get dropped from Google's index!
Yeh, Yahoo seems to keep some 404ed pages in its index forever, and it doesn't have a great record at handling 301s reliably.
Mike said Yahoo and Google have indexed the doorway pages. It's unlikely that any of those would have PR higher than genuine pages on the site, but if there was PR to be transferred rather than lost I think 301 is supposed to achieve this with Google.
I'd prefer to 301 all these pages to appropriate new pages. We're in the midst of creating 30 new pages of good content for them, but that takes time. In addition to that, we're doing a website redesign, which hasn't begun yet.
So getting good content up to replace the old stuff is going to take a while. I don't like the idea of the engines returning and requesting 24 pages and getting 404's for all of them, for the next 3 weeks or a month..
On the other hand, the alternative is to 301 all of them to the homepage, as there are no counterpart pages available right now. I don't know if 24 new pages suddenly dropping out and 301ing to the homepage will look spammy and set off a red flag somewhere..
the pages have no pagerank of their own, and certainly no incoming links since they existed for all of about 3 days before we took over the site, so there's nothing to be concerned with there.
I may just sit on it and 301 them all to the new pages when the time comes. I'm sure yahoo will still happily be requesting them lol
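The "404 now, 301 later" plan could be sketched roughly like this in Python. The URLs and the `route_removed_page` helper are hypothetical; the real doorway page names aren't given in this thread. A removed URL only gets a 301 once it has a real replacement page listed in the map, and until then it stays a plain 404.

```python
from http import HTTPStatus

# Hypothetical map from removed doorway URLs to their replacement pages.
# An entry is added only when the new content page actually exists.
REDIRECTS = {
    "/doorway-blue-widgets.htm": "/products/blue-widgets.htm",
}


def route_removed_page(path, redirects=REDIRECTS):
    """Return (status, headers) for a request to a removed doorway URL."""
    target = redirects.get(path)
    if target is not None:
        # The content has a new home: tell browsers and spiders it moved.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": target}
    # No counterpart page yet: the content is gone, so say so.
    return HTTPStatus.NOT_FOUND, {}
```

As each new page goes live, adding one line to `REDIRECTS` switches that old URL from 404 to 301 without touching the others.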
Okay. Sorry; what does "swings and roundabouts" mean?
A swing takes you backwards and forwards between one point and another, over and over. A roundabout takes you around and around between one point and another. Either may entertain you, and neither gets you any further than the other.
It therefore means "two alternatives that do the same thing - not much difference whichever you choose."
Back to the original topic, I'd go with the 404 errors in this case, because the content of those pages has not moved, it has gone. That's a 404, no two ways about it.
"It therefore means "two alternatives that do the same thing - not much difference whichever you choose.""
Six of one, half dozen of the other?
I agree with Ammon, the 404 is the "right" thing to do. And in this particular instance, there's not even any real incentive or benefit to risk the appearance of impropriety.
That was my thinking, too -- that these particular links are not that valuable that they're worth jumping over hoops for. Let the search engines get the *real* pages, when they're uploaded.
yea I see what you guys are saying -- definitely no 301ing will be done while there are no pages to replace the old ones.
but when it comes to 404's, my instinct is to fix as many as possible. If the engines keep requesting these for a while after we put up the new site, I'll have to do something about it
Search engines should always drop any correctly formed 404 error URL. The only times a search engine retains a 404 page is when the custom 404 page has been poorly done, without serving a correctly formatted 404 error response to the spider.
Search engines employ a lot of top scientists to ensure they serve searches with relevant results, not error pages.
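One way to check whether a custom 404 page really sends a 404 response is to request a URL that cannot exist and look at the status code. Below is a rough Python sketch using only the standard library. The function names are my own, and the random `.htm` probe path is just an assumption about what counts as a made-up URL.

```python
import urllib.error
import urllib.request
import uuid


def fetch_status(url):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the code is what we want here.
        return err.code


def serves_real_404(base_url, get_status=fetch_status):
    """Probe a random, non-existent URL; True only if the answer is 404."""
    probe = base_url.rstrip("/") + "/" + uuid.uuid4().hex + ".htm"
    return get_status(probe) == 404
```

If `serves_real_404("http://www.example.com")` came back `False`, the error page is probably being served with a 200 (or ends up there via a redirect, since `urlopen` follows redirects), which is the poorly done case described above.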
nevertheless, they still screw it up sometimes. one of the sites I worked on with my former company, which served a correct 404 page, was still getting requests from google for pages that we took down 2 years ago. and not just one page, a whole slew of them. even 3 months after realizing what was happening, I was still seeing new 404 errors in the logs for pages that had been gone for all that time
that's probably an extreme case though, it's the only time I've ever seen that happen
so I'll keep an eye on it when we put up the new site -- if the engines drop the pages, great, if they don't, then I'll have to find another solution
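Keeping an eye on it could be as simple as counting crawler 404s in the access log. Here is a small Python sketch. It assumes the server writes the common "combined" log format and that the spiders identify themselves with "Googlebot" or "Slurp" in the user agent; adjust both if your setup differs.

```python
import re
from collections import Counter

# Assumption: Apache/Nginx "combined" log format, e.g.
# ip - - [date] "GET /path HTTP/1.1" 404 512 "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: substrings that identify the crawlers we care about.
CRAWLER_MARKERS = ("Googlebot", "Slurp")


def crawler_404s(lines):
    """Count 404 responses per path for requests made by known crawlers."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match["status"] != "404":
            continue
        if any(marker in match["agent"] for marker in CRAWLER_MARKERS):
            counts[match["path"]] += 1
    return counts
```

Passing an open log file to `crawler_404s` and calling `.most_common()` on the result would list the dead URLs the engines request most often, which should show whether they taper off over time.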
Ah, but Googlebot is designed to request URLs that will return a 404 error, if only to ensure that your site can correctly serve a 404, because otherwise it would seem bottomless/infinite to the spider. It's testing you to make sure it's not a dynamic doorway site where any URL will return something, just to provide infinite link pages.
Have you got any 404 errors being returned in the SERPs is the question?
Yep. If there's no PageRank to conserve, a 301 does seem kinda pointless, and as Ammon points out, that content ain't moved... it gone! So 404 would be the 'proper' thing.
I'll just sit in the corner with my well-used dunce hat on.
"I'll just sit in the corner with my well-used dunce hat on."
We all get to sit in that corner sometimes, and my dunce-cap is almost completely worn out after any extended session in the Website Programming forum.
[quote]Ah, but Googlebot is designed to request URLs that will return a 404 error, if only to ensure that your site can correctly serve a 404, because otherwise it would seem bottomless/infinite to the spider. It's testing you to make sure it's not a dynamic doorway site where any URL will return something, just to provide infinite link pages.
Have you got any 404 errors being returned in the SERPs is the question?[/quote]
hmm that's interesting and confusing. and yea none of those pages showed up in the results.
still.. I hate 404's lol most times I'll fix them every chance I get