404 Spam? What's The Purpose
Posted 12 September 2012 - 07:03 AM
This question doesn't really fall into that realm though - so I'm hoping that some of you who are up on all the spam and other techniques might be able to help me.
I've been working for a company that has been sending me their 404 logs and having me redirect them. On their newer sites I've had the luxury of just creating some custom code in the 404 page to capture patterns and redirect to the appropriate page. On older sites and static sites, I've been having to do 301 redirects in the .htaccess file. The trick here is that the person before me was quite sloppy and the people in charge of the site don't really care how it's done so long as it's done. I'm never one to just arbitrarily do something because I've been told to do it, though - I need to know "why" in order to be certain that I've done the right thing. And just blindly adding 301's to an htaccess file that is already over 200k in size just doesn't make sense to me.
Now, I understand the purpose of redirecting bad URLs to good ones (if a good one is available), but in the past few months I've been seeing more and more URLs like I'm going to describe below and I'd like to know if anyone has any good insight into why it's happening.
I am seeing a lot of 404's coming to pages that look like this:
Now, in the past it was maybe 3-4 per month and it was most likely just some forum, blog, URL shortener, or sharing application that somehow botched the URL. I generally contended that since each URL only had one or two hits on it, it wasn't worth fixing, but I always went ahead and did it anyway.
This month, though, there are literally 50+ of them - all going to different pages. Even more strange, each one has a different "random-site.com" at the end of it. They are definitely not sites related to my client's site and they are all just root domain names.
Now, I know I can just strip the last bit out of anything that ends in .com, .net, etc. But I'm curious as to why it would happen. There is no huge traffic increase and nothing else has changed, but with this many, it seems to me like it's beyond the scope of an occasional error. Especially since a lot of the pages are like page 95 of the "Posts By Author" listing - not a page that would be highly probable to link to.
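For anyone wanting to try the "strip the last bit" approach, here is a rough Python sketch. It assumes the stray domain arrives as the final path segment (something like /some/page/random-site.com); the function name, the example paths, and the short TLD list are all made up for illustration and would need adjusting to the real URLs.

```python
import re

# Assumption: the stray domain is the last path segment of the requested URL.
# The TLD list is illustrative only, not exhaustive.
TRAILING_DOMAIN = re.compile(
    r"/(?:[a-z0-9-]+\.)+(?:com|net|org|info|biz)/?$",
    re.IGNORECASE,
)


def strip_trailing_domain(path: str) -> str:
    """Return the path without a trailing domain-like segment, if one is present."""
    # Replace the matched segment with a single slash so the parent path survives.
    return TRAILING_DOMAIN.sub("/", path)


if __name__ == "__main__":
    # Hypothetical paths, not taken from the real 404 log.
    for candidate in ("/authors/page/95/random-site.com", "/authors/page/95/"):
        print(candidate, "->", strip_trailing_domain(candidate))
```

One caveat: a legitimate page whose last segment happens to end in one of those TLDs would get rewritten too, so the cleaned path should probably be checked against known pages before redirecting to it.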
If this is, in fact, beyond the scope of just a parsing error - what would someone's motivation be to create these links and/or follow them? Is this just a modern incarnation of old-school referral spam? Is there any talk of this going around?
Posted 12 September 2012 - 08:07 AM
I dunno...that's just a long scenario that may be nowhere close to the truth. Just early morning conjecture. Good question though, and if I run across anyone talking about it, I'll be sure to let you know. Good to see you here, btw.
Posted 12 September 2012 - 08:22 AM
Your guesses were about what I was coming up with. The main thing here is that I don't want to play into anyone's hands by redirecting these. If it happens to "validate" credibility for a spammer or otherwise could hurt the client's site by having a spam URL acknowledged by redirecting it to another one, I want to make sure I don't do it. My client will never listen to me if I'm just afraid to do it because of guesses, though. lol
Posted 12 September 2012 - 08:36 AM
* connect each misconfigured URL with referer URL, user-agent string, domain name.
* connect each domain name to registered owner, host.
* connect each domain name to site generator, i.e. WordPress, and version.
* check each referer page for type of page, i.e. forum post, content, product, and whether other external links are similarly misconfigured.
It could be any of the possibilities that you and Donna have mentioned. However, without further analysis, knowing why it is happening (whether inadvertent or deliberate) and how best to handle it (i.e. 301, 403, 404, 410...) is problematic.
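The first step in the list above (tying each bad URL to its referer and user-agent) could be sketched roughly like this in Python. It assumes the server writes Apache "combined" format logs and that the stray domain is the last path segment; `summarize_404s` and both patterns are hypothetical helpers, not anything from the client's setup. The owner, host, and site-generator lookups are not covered here.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical pattern for a domain-like final path segment (illustrative TLDs).
STRAY_DOMAIN = re.compile(r"/((?:[a-z0-9-]+\.)+(?:com|net|org))/?$", re.IGNORECASE)


def summarize_404s(lines):
    """Map each stray domain to the referers, user agents, and paths seen with it."""
    summary = defaultdict(
        lambda: {"referers": set(), "agents": set(), "paths": set()}
    )
    for line in lines:
        entry = LOG_LINE.match(line)
        if not entry or entry.group("status") != "404":
            continue  # skip unparseable lines and anything that is not a 404
        stray = STRAY_DOMAIN.search(entry.group("path"))
        if not stray:
            continue  # an ordinary 404 without a trailing domain
        bucket = summary[stray.group(1).lower()]
        bucket["referers"].add(entry.group("referer"))
        bucket["agents"].add(entry.group("agent"))
        bucket["paths"].add(entry.group("path"))
    return dict(summary)
```

If the referers mostly turn out to be the site's own pages, that would point toward the links being generated internally rather than by an outside party.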
Posted 12 September 2012 - 08:49 AM
Here is the situation - so we can close this up as "SOLVED".
Basically - they added a new feed to their jobs posting section. The URLs were coming from their own site by malforming data that was coming through the feed in a slightly different format.
So, this thread actually belongs in the "Programming and Development" section and should be entitled "Don't Trust Automation without Checking And Double Checking" - with a subtitle "Don't blame the SEO guy for something the data entry people did". lol
Sorry for the false alarm here. I should have thought to check this before posting it, but... I'm an idiot sometimes. lol