Hiya everyone. Those of you who know me probably know I've been all but out of SEO for about 8-10 years now and have been primarily working on data conversion and web application development (which has plenty of SEO/SEM related aspects... but it doesn't really cover this topic).
This question doesn't really fall into that realm though - so I'm hoping that some of you who are up on all the spam and other techniques might be able to help me.
Background Info:
I've been working for a company that has been sending me their 404 logs and having me redirect them. On their newer sites I've had the luxury of just creating some custom code in the 404 page to capture patterns and redirect to the appropriate page. On older sites and static sites, I've been having to do 301 redirects in the .htaccess file. The trick here is that the person before me was quite sloppy and the people in charge of the site don't really care how it's done so long as it's done. I'm never one to just arbitrarily do something because I've been told to do it, though - I need to know "why" in order to be certain that I've done the right thing. And just blindly adding 301's to an htaccess file that is already over 200k in size just doesn't make sense to me.
Now, I understand the purpose of redirecting bad URLs to good ones (if a good one is available), but in the past few months I've been seeing more and more URLs like I'm going to describe below and I'd like to know if anyone has any good insight into why it's happening.
The Issue
I am seeing a lot of 404's coming to pages that look like this:
www.mydomain.com/some/valid/page.html/www.some-random-site.com
Now, in the past it was maybe 3-4 per month and it was most likely just some forum, blog, url shortener, or sharing application that somehow botched the URL. I always generally contended that since the URL only had one or two hits on it that it wasn't worth fixing, but I always went ahead and did it anyway.
This month, though, there are literally 50+ of them - all going to different pages. Even more strange, each one has a different "random-site.com" at the end of it. They are definitely not sites related to my client's site and they are all just root domain names.
Now, I know I can just strip the last bit out of anything that ends in .com, .net, etc. But I'm curious as to why it would happen. There is no huge traffic increase and nothing else has changed, but with this many, it seems to me like it's beyond the scope of an occasional error. Especially since a lot of the pages are like page 95 of the "Posts By Author" listing - not a page that anyone would be likely to link to.
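For reference, the "strip the last bit" idea could look something like the Python sketch below. It is only an illustration: the function name and the short TLD list are assumptions, and it would also strip a legitimate path whose last segment happens to look like a domain, so it isn't a drop-in fix.

```python
import re

# Hypothetical pattern: a final path segment that looks like a bare host name,
# e.g. "www.some-random-site.com". The TLD list is an assumption; extend as needed.
TRAILING_DOMAIN = re.compile(
    r"/(?:[a-z0-9-]+\.)+(?:com|net|org|info|biz)/?$",
    re.IGNORECASE,
)


def strip_trailing_domain(path: str) -> str:
    """Return the path without a trailing bare-domain segment, if one is present."""
    cleaned = TRAILING_DOMAIN.sub("", path)
    # If the whole path was a domain-like segment, fall back to the site root.
    return cleaned or "/"


if __name__ == "__main__":
    print(strip_trailing_domain("/some/valid/page.html/www.some-random-site.com"))
```

The cleaned path could then be checked against the list of real pages before deciding whether a redirect is appropriate at all.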
The Question
If this is, in fact, beyond the scope of just a parsing error - what would someone's motivation be to create these links and/or follow them? Is this just a modern incarnation of old-school referral spam? Is there any talk of this going around?
Thanks!
404 Spam? What's The Purpose
Started by Grumpus, Sep 12 2012 07:03 AM
5 replies to this topic
#2
Posted 12 September 2012 - 08:07 AM
My first thought was some form of referrer spam, but that doesn't seem to make a lot of sense. My second thought was that some spam tool went wrong (or the baby spammer mucked up the implementation of it), and that makes more sense to me. Baby spammers tend to use tools badly, and spam coders may be coding badly. They probably forgot to include http:// in some URL field, and when the tool did whatever it was trying to do, it combined two URLs instead of ... well... whatever it would have done otherwise. Perhaps it was going through two groups of URLs - group A and group B, with group A being URLs the tool pulled from the web (like your pages), and group B being URLs that the baby spammer entered by hand (without the http://).
I dunno...that's just a long scenario that may be nowhere close to the truth. Just early morning conjecture. Good question though, and if I run across anyone talking about it, I'll be sure to let you know. Good to see you here, btw.
#3
Posted 12 September 2012 - 08:22 AM
Thanks Donna. Good to see you too.
Your guesses were about what I was coming up with. The main thing here is that I don't want to play into anyone's hands by redirecting these. If it happens to "validate" credibility for a spammer or otherwise could hurt the client's site by having a spam URL acknowledged by redirecting it to another one, I want to make sure I don't do it. My client will never listen to me if I'm just afraid to do it because of guesses, though. lol
#4
Posted 12 September 2012 - 08:36 AM
I'd like to know the following for a proper analysis:
* connect each misconfigured URL with referer URL, user-agent string, domain name.
* connect each domain name to registered owner, host.
* connect each domain name to site generator, e.g. WordPress, and version.
* check each referer page for type of page, e.g. forum post, content, product, and whether other external links are similarly misconfigured.
It could be any of the possibilities that you and Donna have mentioned. However, without further analysis, knowing why it is happening (whether inadvertent or deliberate) and how best to handle it (e.g. 301, 403, 404, 410...) is problematic.
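The first step in the list above (tying each bad URL to its referer and user-agent) can be roughed out in Python along these lines. This is a sketch under assumptions: it expects raw access-log lines in Apache "combined" format, which the thread never confirms, and the function name and TLD list are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust the pattern if the logs differ.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed shape of the bad URLs: a bare domain name as the last path segment.
APPENDED_DOMAIN = re.compile(r"/(?:[a-z0-9-]+\.)+(?:com|net|org)/?$", re.IGNORECASE)


def summarize_404s(lines):
    """Count (referer, user agent) pairs for 404s whose path ends in a bare domain."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "404":
            continue
        if APPENDED_DOMAIN.search(match.group("path")):
            counts[(match.group("referer"), match.group("agent"))] += 1
    return counts
```

If nearly all of the hits share one referer or one user agent, that points to a single source rather than scattered bad links.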
#5
Posted 12 September 2012 - 08:49 AM
Actually - I JUST got ahold of the referrer information and that gave me all I needed to know. (For some reason, they never give me all the info I need to process stuff).
Here is the situation - so we can close this up as "SOLVED".
Basically - they added a new feed to their jobs posting section. The bad URLs were coming from their own site, which was malforming data that arrived through the feed in a slightly different format.
So, this thread actually belongs in the "Programming and Development" section and should be entitled "Don't Trust Automation without Checking And Double Checking" - with a subtitle "Don't blame the SEO guy for something the data entry people did". lol
Sorry for the false alarm here. I should have thought to check this before posting it, but... I'm an idiot sometimes. lol
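The thread doesn't say exactly how the feed data was malformed. One plausible case, in line with the earlier guess about a missing http://, is a link value with no scheme, which a browser resolves relative to the current page. A minimal Python sketch of a check that could run on feed URLs before they are published (the function name and the choice to default to http are assumptions):

```python
from typing import Optional
from urllib.parse import urlsplit


def normalize_feed_url(raw: str) -> Optional[str]:
    """Return an absolute http(s) URL, or None if the value cannot be trusted."""
    value = raw.strip()
    if not value:
        return None
    parts = urlsplit(value)
    # Already absolute with an accepted scheme: keep as-is.
    if parts.scheme in ("http", "https") and parts.netloc:
        return value
    # No scheme, and the first segment contains a dot: looks like a bare host
    # name such as "www.example.com". Prefixing http:// is an assumption.
    if not parts.scheme and not value.startswith("/") and "." in value.split("/", 1)[0]:
        return "http://" + value
    # Anything else (site-relative paths, other schemes) is rejected for review.
    return None
```

Values that come back as None would be held for a human to look at rather than turned into links.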