
Cre8asiteforums Internet Marketing
and Conversion Web Design



Bad Bot Techniques

tools to block unwanted traffic

23 replies to this topic

#1 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 15 June 2012 - 12:02 PM

Hi again

I have made some bad bot postings to the search engine forum, but really I think it all belongs here in the security forum. I would like to maintain this thread in one place and share some of the stuff I run into.

The latest bad actor I found comes from utel.net.ua, a Ukrainian source that also seems to serve its neighbor Poland.

Running this Linux command on the Apache access log

tail -f /var/log/httpd/access_log

showed continuing, repeated hits across several domains.

The command

grep utel.net.ua /var/log/httpd/access_log |awk '{print $2}' |sort |uniq > utel.net.ua

produced a list of results

213.186.119.131.utel.net.ua
213.186.119.132.utel.net.ua
213.186.119.133.utel.net.ua
213.186.119.134.utel.net.ua
213.186.119.135.utel.net.ua
213.186.119.136.utel.net.ua
213.186.119.137.utel.net.ua
213.186.119.138.utel.net.ua
213.186.119.139.utel.net.ua
213.186.119.140.utel.net.ua
213.186.119.141.utel.net.ua
213.186.119.142.utel.net.ua
213.186.119.143.utel.net.ua
213.186.119.144.utel.net.ua
213.186.120.196.utel.net.ua
213.186.122.2.utel.net.ua
213.186.122.3.utel.net.ua
213.186.127.10.utel.net.ua
213.186.127.12.utel.net.ua
213.186.127.13.utel.net.ua
213.186.127.14.utel.net.ua
213.186.127.28.utel.net.ua
213.186.127.2.utel.net.ua
213.186.127.3.utel.net.ua
213.186.127.4.utel.net.ua
213.186.127.5.utel.net.ua
213.186.127.6.utel.net.ua
213.186.127.7.utel.net.ua
213.186.127.8.utel.net.ua
213.186.127.9.utel.net.ua
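
For anyone who prefers a script to the grep/awk/sort/uniq pipeline above, here is a minimal Python sketch of the same idea. The function name `unique_hosts` and the sample lines are made up for illustration. The field position is an assumption: it mirrors `awk '{print $2}'` from the command above, so adjust it if your log format puts the client host elsewhere. Unlike the plain grep, it only matches the domain at the end of the host field.

```python
def unique_hosts(lines, domain, field=1):
    """Return the sorted, de-duplicated hosts from `lines` that end with `domain`.

    `field` is the zero-based whitespace-separated column holding the client
    host; 1 corresponds to awk's $2 (an assumption about the log format).
    """
    hosts = set()
    for line in lines:
        parts = line.split()
        if len(parts) > field and parts[field].endswith(domain):
            hosts.add(parts[field])
    return sorted(hosts)


# Hypothetical log lines, shaped only for illustration.
sample = [
    'example.com 213.186.119.131.utel.net.ua - - [15/Jun/2012:12:00:00 +0000] "GET / HTTP/1.1" 200 512',
    'example.com 213.186.119.132.utel.net.ua - - [15/Jun/2012:12:00:01 +0000] "GET /a HTTP/1.1" 200 512',
    'example.com 213.186.119.131.utel.net.ua - - [15/Jun/2012:12:00:02 +0000] "GET /b HTTP/1.1" 200 512',
    'example.com 10.0.0.1 - - [15/Jun/2012:12:00:03 +0000] "GET / HTTP/1.1" 200 512',
]

for host in unique_hosts(sample, "utel.net.ua"):
    print(host)
```

To run it against a real log, pass an open file instead of the sample list, since a file object iterates line by line.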

First I checked with nslookup that these hostnames really resolve to the IP numbers they contain

nslookup 213.186.119.131.utel.net.ua
Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
Name: 213.186.119.131.utel.net.ua
Address: 213.186.119.131

Then went to http://ip2cidr.com/, entered the first and last IP numbers, and this produced the list

213.186.119.131/32
213.186.119.132/30
213.186.119.136/29
213.186.119.144/28
213.186.119.160/27
213.186.119.192/26
213.186.120.0/22
213.186.124.0/23
213.186.126.0/24
213.186.127.0/29
213.186.127.8/31
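
The same range-to-CIDR conversion can be sketched with Python's standard `ipaddress` module. This assumes the two endpoints entered were 213.186.119.131 and 213.186.127.9, i.e. the first and last lines of the sorted host list. Keep in mind that plain `sort` orders text, not numbers, so the numerically highest address in that list is 213.186.127.28; the sketch checks whether that address is covered.

```python
import ipaddress

# Assumed endpoints: first and last entries of the text-sorted host list.
first = ipaddress.ip_address("213.186.119.131")
last = ipaddress.ip_address("213.186.127.9")

# summarize_address_range yields the minimal list of CIDR blocks that
# exactly covers first..last, nothing more and nothing less.
blocks = [str(net) for net in ipaddress.summarize_address_range(first, last)]

for block in blocks:
    print(block)

# Is the numerically highest address from the host list inside any block?
highest = ipaddress.ip_address("213.186.127.28")
covered = any(highest in ipaddress.ip_network(b) for b in blocks)
print("213.186.127.28 covered:", covered)
```

An exact cover like this is precise but fiddly to maintain, which is probably why it does not feel useful for blocking a whole provider.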

Not 100% useful, OK? Maybe I'm just not smart enough.....

Then on to http://magic-cookie.co.uk/iplist.html, where I entered the first IP on the original list and played with the number after the slash (the prefix length), which sets how large a netblock is covered. A few experiments came up with 213.186.119.131/19; the 8192 IP numbers blocked stretch from 213.186.96.0 to 213.186.127.255.
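
The /19 arithmetic can be double-checked offline with Python's `ipaddress` module. This is only a sketch; the helper `smallest_covering_network` is a made-up name for illustration, and it simply widens a block one prefix bit at a time until every address fits.

```python
import ipaddress

# strict=False lets us pass a host address with a prefix length, the way
# the post writes 213.186.119.131/19; the host bits are masked off.
block = ipaddress.ip_network("213.186.119.131/19", strict=False)

print(block)                    # the normalized network
print(block.network_address)    # first address in the block
print(block.broadcast_address)  # last address in the block
print(block.num_addresses)      # total addresses covered


def smallest_covering_network(addresses):
    """Return the smallest single CIDR block containing every address given."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    net = ipaddress.ip_network(min(ips))  # start from a /32
    while not all(ip in net for ip in ips):
        net = net.supernet()  # widen by one prefix bit at a time
    return net


# Lowest and highest addresses from the original host list (numeric order).
tight = smallest_covering_network(["213.186.119.131", "213.186.127.28"])
print(tight)
```

For these two addresses the helper returns a /20, so the /19 blocks more than the observed hosts strictly require. Whether that extra reach is wanted is a judgment call that depends on who owns the rest of the range.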

Then checked with http://www.maxmind.com and ran both the first and last IP numbers. They both belong to utel, so that makes it pretty certain that everything in-between is theirs also.

Then came the command (as root of course)

/sbin/iptables -p tcp -I INPUT -j DROP -s 213.186.119.131/19 && /etc/init.d/iptables save && /etc/init.d/sshd restart

(some of these paths may vary for different Linux flavors; this is CentOS)

Now I sit watching results for tail -f /var/log/httpd/access_log | grep utel

Nothing. Zero. Zip. Nada.

Of course this method runs the risk of blocking traffic that you might want -- for example, possibly some users of utel wireless smart phones might not be able to access my sites -- but it seems to me that bots from Ukraine are not doing me a lot of good.

Hope this info is useful

Cheers, Mike

Edited by nuts, 15 June 2012 - 12:06 PM.


#2 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 3189 posts

Posted 16 June 2012 - 04:43 PM

Not exactly sure if this will help next time, but I use http://apps.db.ripe.net/search/ to see who or what owns an IP.
Or http://whois.arin.net/ui for North America.
From the displayed info you can see who the parent is.

When playing with netblocks or IP ranges I use my own calculators (of course): http://bonomo.info/c...-calculator.php

#3 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 16 June 2012 - 05:01 PM

Great, good resources!

#4 iamlost

iamlost

    The Wind Master

  • Site Administrators
  • 5306 posts

Posted 16 June 2012 - 10:28 PM

To complete what bobbb started:
There are five Regional Internet Registries (RIRs). Their whois search services:
1. American Registry for Internet Numbers (ARIN): Canada, USA, Antarctica, parts of Caribbean.
http://whois.arin.net/ui/

2. Réseaux IP Européens Network Coordination Centre (RIPE NCC): Europe, Russia, Middle East, Central Asia.
https://apps.db.ripe...arch/query.html

3. Asia Pacific Network Information Centre (APNIC): Asia (except Russia, Central Asia), Australia, New Zealand, Japan, Philippines, Indonesia, etc.
http://wq.apnic.net/apnic-bin/whois.pl

4. Internet Address Registry for Latin America and the Caribbean (LACNIC): Mexico, Central and South America, parts of Caribbean.
http://lacnic.net/cgi-bin/lacnic/whois

5. Internet Numbers Registry for Africa (AfriNIC): Africa, Madagascar.
http://www.afrinic.n...ces/whois-query

Each offers a number of public and member services and applications beyond whois, worth a look.

#5 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 29 June 2012 - 10:57 AM

Another bad actor:

clients.your-server.de

/sbin/iptables -p tcp -I INPUT -j DROP -s 176.9.0.118/18
/sbin/iptables -p tcp -I INPUT -j DROP -s 188.40.39.212/17
/sbin/iptables -p tcp -I INPUT -j DROP -s 213.133.123.53
/sbin/iptables -p tcp -I INPUT -j DROP -s 213.239.193.170
/sbin/iptables -p tcp -I INPUT -j DROP -s 46.4.100.231/17
/sbin/iptables -p tcp -I INPUT -j DROP -s 5.9.22.170/17
/sbin/iptables -p tcp -I INPUT -j DROP -s 78.46.145.100/17
/sbin/iptables -p tcp -I INPUT -j DROP -s 88.198.234.84/17
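
With this many rules it is easy to lose track of what each one really covers, since every source above is a host address with a prefix length rather than a network address. Here is a small Python sketch that normalizes each source and prints the matching command. The function name `drop_rule` is made up for illustration, and the script only prints commands; it does not run them.

```python
import ipaddress


def drop_rule(source):
    """Build an iptables DROP command for `source` (an address or CIDR).

    The network is normalized first, so a host address written with a
    prefix length (e.g. 176.9.0.118/18) becomes its true network address.
    A bare address comes out as a /32.
    """
    net = ipaddress.ip_network(source, strict=False)
    return f"/sbin/iptables -I INPUT -p tcp -s {net} -j DROP"


# The sources from the commands above.
sources = [
    "176.9.0.118/18",
    "188.40.39.212/17",
    "213.133.123.53",
    "213.239.193.170",
    "46.4.100.231/17",
    "5.9.22.170/17",
    "78.46.145.100/17",
    "88.198.234.84/17",
]

for source in sources:
    print(drop_rule(source))
```

Reading the normalized output makes it obvious, for example, that 78.46.145.100/17 covers the upper half of 78.46.x.x and not the lower half.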

That's a lotta IP addresses to block, all belonging to Hetzner Online AG

http://www.hetzner.d.../rechenzentrum/

Tell me if I'm shooting myself in the foot.....

BTW it dropped my server cpu load from 12.64 to 1.21...

Cheers
Mike

Edited by nuts, 29 June 2012 - 10:59 AM.


#6 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 3189 posts

Posted 29 June 2012 - 11:05 AM

There are also a lot of bad players on the amazonaws.com IP ranges. I guess it depends on a definition of bad player.

#7 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 29 June 2012 - 11:21 AM

Yep, I try not to take a moral stance, I run spiders also from time to time (although I keep them single-threaded, one query at a time).....but when I see my server bogging down, it's just self-defense

#8 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 12 October 2012 - 06:55 PM

Hello all

Just found another bad boy, choopa.net


/sbin/iptables -p tcp -I INPUT -j DROP -s 173.199.114.115/20

/etc/init.d/iptables save && /etc/init.d/sshd restart

#9 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 3189 posts

Posted 13 October 2012 - 09:46 AM

But how do you know that the whole range is bad?
choopa is just a web hosting company like GoDaddy or HostGator using the range 173.199.64 through 127

You have just blocked 173.199.112.1 to 173.199.127.254 for 4094 IPs
There are obviously no users surfing out of that range, just possible bots.
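
Those numbers can be reproduced with Python's `ipaddress` module. In this sketch, reading "173.199.64 through 127" as 173.199.64.0/18 is an assumption on my part.

```python
import ipaddress

# The rule as posted: a host address with a /20 prefix length.
blocked = ipaddress.ip_network("173.199.114.115/20", strict=False)

# Assumption: "173.199.64 through 127" is read here as 173.199.64.0/18.
provider = ipaddress.ip_network("173.199.64.0/18")

# hosts() skips the network and broadcast addresses of the block.
hosts = list(blocked.hosts())

print(blocked)
print(hosts[0], "to", hosts[-1], "for", len(hosts), "IPs")
print("inside provider range:", blocked.subnet_of(provider))
```

Note that `subnet_of` needs Python 3.7 or later.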

#10 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 13 October 2012 - 10:43 AM

Hey Bob, that's a very good point, one I have puzzled over again and again. As you point out, hosting companies generally are not ISPs providing access for visitors or viewers.

Therefore, are they doing me any good?

Many of them do not have much enforcement.... aws was a joke, their site said fill out a form, the form didn't work! Also, many like aws rotate their IPs in a random, cloud-like way, so if you block an IP today, the bot comes back with another IP tomorrow. I did check that the first and last IP numbers belong to choopa.

For me the bottom line in this risk/reward analysis is that I don't have the time to be screwing around catching unfriendly bots. Scraping content doesn't really bother me that much, but server load is a serious problem.

Cheers
Mike

#11 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 3189 posts

Posted 25 October 2012 - 11:34 AM

Just an FYI

hostnoc.net just made my hit list.

Since I don't have access to iptables I need to do it in htaccess so I have to evaluate loading htaccess vs is-this-guy-enough-of-a-pain-in-the-butt. My evaluation is the reverse of yours. I know I can't win but here is my chance to give a scraper the 403 finger. They have more IPs than below but they did not meet the criteria.

184.22.0.0/16
184.82.0.0/16

Edited by bobbb, 25 October 2012 - 11:34 AM.


#12 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 25 October 2012 - 10:00 PM

Hey Bob

I took a look, and hostnoc.net shows up repeatedly in my access_log, but they are not being rude. Though you have a point: they are not doing me any good, and the content may well be competing with mine...

Cheers
Mike

#13 Walter

Walter

    Light Speed Member

  • 500 Posts Club
  • 639 posts

Posted 27 October 2012 - 09:12 PM

Hello,

I've read someplace that you can put in poisoned files to catch scrapers more or less automatically. The idea was that bad bots will ignore robots.txt while good bots won't. So you create a page, put it off limits to robots and make it invisible to regular visitors, and then, as I recall, there was a way to automatically ban any IP that visited that page. Not sure if something like that would work in this case or not, but I thought I'd mention it.
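
A rough Python sketch of the detection half of that idea, under some loud assumptions: the trap lives at a hypothetical path `/trap/` (which would be listed under `Disallow:` in robots.txt), the log is in Common Log Format with the client address first, and the function name `trap_visitors` is made up. It only collects the addresses; what to do with them (firewall, htaccess) is left open.

```python
def trap_visitors(lines, trap_path="/trap/"):
    """Return the set of client addresses that requested `trap_path`.

    Assumes Common Log Format: client address first, request line inside
    the first pair of double quotes.
    """
    visitors = set()
    for line in lines:
        try:
            client = line.split()[0]
            request = line.split('"')[1]  # e.g. 'GET /trap/ HTTP/1.1'
            path = request.split()[1]
        except IndexError:
            continue  # skip malformed lines
        if path.startswith(trap_path):
            visitors.add(client)
    return visitors


# Hypothetical log lines using documentation-only addresses.
sample = [
    '203.0.113.7 - - [27/Oct/2012:21:00:00 +0000] "GET /trap/ HTTP/1.1" 200 120',
    '198.51.100.4 - - [27/Oct/2012:21:00:01 +0000] "GET /index.html HTTP/1.1" 200 900',
    '203.0.113.9 - - [27/Oct/2012:21:00:02 +0000] "GET /trap/page.html HTTP/1.1" 200 120',
    'garbage line',
]

for address in sorted(trap_visitors(sample)):
    print(address)
```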

Walter

#14 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 3189 posts

Posted 28 October 2012 - 12:49 AM

As you say, bad bots don't read the robots.txt so they may never notice. We call them bots but really they are just scrapers with a robot-type program that follows links from your main page and then just drills down. Sometimes you can see them trying to access sitemap.xml. They are dummies; sitemaps don't have to be called that.

I tried the idea of a secret file some time back but no one ever came.

Edited by bobbb, 28 October 2012 - 12:50 AM.


#15 Walter

Walter

    Light Speed Member

  • 500 Posts Club
  • 639 posts

Posted 28 October 2012 - 05:15 AM

Morning,

I found that article; it's actually quite old and was a forum post. They used a Perl script. The script bans the IP and sends an email notification to the admin. It's probably outdated and maybe there are better solutions now, but if anyone wants to take a look at it you can find it at:

http://www.webmaster...orum13/1823.htm

Seemed like a clever idea to me. Although, there is a thread running here on Cre8asite that says Google bots don't always follow robots.txt either, so maybe the idea is flawed from the start.

Something I use is Project Honey Pot and it can be found here:

https://www.projecth...t.org/index.php

They collect and share "bad" IPs.

I'll also say that I've geo-blocked certain parts of the globe. It's drastically cut down on the security exceptions for my site. I know it's not an option for everyone, but in my case it made sense.

Walter

#16 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 3189 posts

Posted 28 October 2012 - 11:03 AM

Maybe I'll install a "secret" file again and see who trips the wire. I have never barred Google so I can't know.

Had a look at the Project Honey Pot link. The top user agent is Java. They have been on my hit list for years along with libwww-perl and Python. The only reason to use those is to scrape. No surprise on the top harvester country.
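
A minimal Python sketch of spotting those agents in a log, assuming Apache's combined log format (user agent in the last pair of double quotes). The function names, the prefix list, and the sample lines are illustrative only; real agent strings vary, so treat the prefixes as a starting point.

```python
# Lower-case prefixes for the agents named above (an illustrative list).
SCRIPTED_AGENT_PREFIXES = ("java", "libwww-perl", "python")


def user_agent(line):
    """Return the user agent from a combined-format log line, or None."""
    parts = line.split('"')
    # Combined format: ... "request" status size "referer" "user-agent"
    return parts[5] if len(parts) >= 7 else None


def is_scripted(agent):
    """True if the agent string starts with one of the listed prefixes."""
    return agent is not None and agent.lower().startswith(SCRIPTED_AGENT_PREFIXES)


# Hypothetical log lines using documentation-only addresses.
sample = [
    '203.0.113.7 - - [28/Oct/2012:11:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Java/1.6.0_26"',
    '203.0.113.8 - - [28/Oct/2012:11:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "libwww-perl/5.805"',
    '198.51.100.4 - - [28/Oct/2012:11:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

flagged = [line.split()[0] for line in sample if is_scripted(user_agent(line))]
print(flagged)
```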

#17 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 29 November 2012 - 08:32 PM

Just nailed another one, xlhost.com,

/sbin/iptables -p tcp -I INPUT -j DROP -s 209.190.0.0/17

I'm sure this won't get everybody on that hosting system, but it stopped whoever was hammering my server up to 56% cpu consumption

Cheers

Edited by nuts, 29 November 2012 - 08:33 PM.


#18 bobbb

bobbb

    Sonic Boom Member

  • Hall Of Fame
  • 3189 posts

Posted 17 December 2012 - 05:09 PM

I've picked up another one. Oh, they are good. Hard to pick up.

All seems to be coming from someone called 5280enterprises.com/proxy51.com. Never uses the same user agent on requests spread across 7 IP ranges using 5 different suppliers (DataShack, wholesaleinternet.net, Eonix.net, lionlink.net, EGIHosting). The IPs resolve to all kinds of domain names.

It was the agent and referer that gave it away. I just happened to look at them and said "What are all these agents about?" and "How come my opening page is a referer to so many pages?"

#19 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 17 December 2012 - 05:48 PM

Wow, a quick Google search suggests these are some serious bad guys

http://www.forumpost...ad.php?p=101665
http://riskyinternet...policy/POL5037/

Do you care to post the IP ranges?

#20 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 14 July 2013 - 01:12 PM

Hi All, just updating the amazonaws IP list

from https://forums.aws.a...jspa?annID=1701
May 24, 2013

US East (Northern Virginia):

72.44.32.0/19 (72.44.32.0 - 72.44.63.255)
67.202.0.0/18 (67.202.0.0 - 67.202.63.255)
75.101.128.0/17 (75.101.128.0 - 75.101.255.255)
174.129.0.0/16 (174.129.0.0 - 174.129.255.255)
204.236.192.0/18 (204.236.192.0 - 204.236.255.255)
184.73.0.0/16 (184.73.0.0 - 184.73.255.255)
184.72.128.0/17 (184.72.128.0 - 184.72.255.255)
184.72.64.0/18 (184.72.64.0 - 184.72.127.255)
50.16.0.0/15 (50.16.0.0 - 50.17.255.255)
50.19.0.0/16 (50.19.0.0 - 50.19.255.255)
107.20.0.0/14 (107.20.0.0 - 107.23.255.255)
23.20.0.0/14 (23.20.0.0 - 23.23.255.255)
54.242.0.0/15 (54.242.0.0 - 54.243.255.255)
54.234.0.0/15 (54.234.0.0 - 54.235.255.255)
54.236.0.0/15 (54.236.0.0 - 54.237.255.255)
54.224.0.0/15 (54.224.0.0 - 54.225.255.255)
54.226.0.0/15 (54.226.0.0 - 54.227.255.255)
54.208.0.0/15 (54.208.0.0 - 54.209.255.255)
54.210.0.0/15 (54.210.0.0 - 54.211.255.255)
54.221.0.0/16 (54.221.0.0 - 54.221.255.255) NEW

US West (Oregon):

50.112.0.0/16 (50.112.0.0 - 50.112.255.255)
54.245.0.0/16 (54.245.0.0 - 54.245.255.255)
54.244.0.0/16 (54.244.0.0 - 54.244.255.255)
54.214.0.0/16 (54.214.0.0 - 54.214.255.255)
54.212.0.0/15 (54.212.0.0 - 54.213.255.255) NEW
54.218.0.0/16 (54.218.0.0 - 54.218.255.255) NEW

US West (Northern California):

204.236.128.0/18 (204.236.128.0 - 204.236.191.255)
184.72.0.0/18 (184.72.0.0 - 184.72.63.255)
50.18.0.0/16 (50.18.0.0 - 50.18.255.255)
184.169.128.0/17 (184.169.128.0 - 184.169.255.255)
54.241.0.0/16 (54.241.0.0 - 54.241.255.255)
54.215.0.0/16 (54.215.0.0 - 54.215.255.255)
54.219.0.0/16 (54.219.0.0 - 54.219.255.255) NEW

EU (Ireland):

79.125.0.0/17 (79.125.0.0 - 79.125.127.255)
46.51.128.0/18 (46.51.128.0 - 46.51.191.255)
46.51.192.0/20 (46.51.192.0 - 46.51.207.255)
46.137.0.0/17 (46.137.0.0 - 46.137.127.255)
46.137.128.0/18 (46.137.128.0 - 46.137.191.255)
176.34.128.0/17 (176.34.128.0 - 176.34.255.255)
176.34.64.0/18 (176.34.64.0 - 176.34.127.255)
54.247.0.0/16 (54.247.0.0 - 54.247.255.255)
54.246.0.0/16 (54.246.0.0 - 54.246.255.255)
54.228.0.0/16 (54.228.0.0 - 54.228.255.255)
54.216.0.0/15 (54.216.0.0 - 54.217.255.255)
54.229.0.0/16 (54.229.0.0 - 54.229.255.255)
54.220.0.0/16 (54.220.0.0 - 54.220.255.255) NEW

Asia Pacific (Singapore):

175.41.128.0/18 (175.41.128.0 - 175.41.191.255)
122.248.192.0/18 (122.248.192.0 - 122.248.255.255)
46.137.192.0/18 (46.137.192.0 - 46.137.255.255)
46.51.216.0/21 (46.51.216.0 - 46.51.223.255)
54.251.0.0/16 (54.251.0.0 - 54.251.255.255)
54.254.0.0/16 (54.254.0.0 - 54.254.255.255)
54.255.0.0/16 (54.255.0.0 - 54.255.255.255)

Asia Pacific (Sydney):

54.252.0.0/16 (54.252.0.0 - 54.252.255.255)
54.253.0.0/16 (54.253.0.0 - 54.253.255.255)

Asia Pacific (Tokyo):

175.41.192.0/18 (175.41.192.0 - 175.41.255.255)
46.51.224.0/19 (46.51.224.0 - 46.51.255.255)
176.32.64.0/19 (176.32.64.0 - 176.32.95.255)
103.4.8.0/21 (103.4.8.0 - 103.4.15.255)
176.34.0.0/18 (176.34.0.0 - 176.34.63.255)
54.248.0.0/15 (54.248.0.0 - 54.249.255.255)
54.250.0.0/16 (54.250.0.0 - 54.250.255.255)
54.238.0.0/16 (54.238.0.0 - 54.238.255.255) NEW

South America (Sao Paulo):

177.71.128.0/17 (177.71.128.0 - 177.71.255.255)
54.232.0.0/16 (54.232.0.0 - 54.232.255.255)
54.233.0.0/18 (54.233.0.0 - 54.233.63.255)

GovCloud:

96.127.0.0/18 (96.127.0.0 - 96.127.63.255)
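
With a list this long, a quick membership check is handy before adding rules. A minimal Python sketch follows; only three of the ranges above are included to keep it short, and the names `AWS_RANGES` and `in_ranges` are made up for illustration.

```python
import ipaddress

# A few of the ranges from the list above; paste in the rest as needed.
AWS_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("72.44.32.0/19", "174.129.0.0/16", "54.252.0.0/16")
]


def in_ranges(address, ranges=AWS_RANGES):
    """Return True if `address` falls inside any network in `ranges`."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in ranges)


# Example addresses chosen only to exercise the check.
print(in_ranges("174.129.1.1"))
print(in_ranges("8.8.8.8"))
```

Since the published ranges change over time, a check like this is only as good as the list fed into it.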
 

Cheers

Mike
 



#21 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 10 August 2013 - 10:08 PM

Hello again

I found a list of bot IP ranges

http://myip.ms/info/..._Addresses.html

Cheers

Mike



#22 cre8pc

cre8pc

    Dream Catcher Forums Founder

  • Admin - Top Level
  • 14597 posts

Posted 13 August 2013 - 01:22 PM

I pinned this for you.



#23 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 10 February 2014 - 11:45 AM

Hello everybody

Here is another amazonaws IP range update:

https://forums.aws.a...jspa?annID=1701

updated 12/13/13

Cheers

Mike



#24 nuts

nuts

    Mach 1 Member

  • Members
  • 307 posts

Posted 17 February 2014 - 03:29 PM

Here is a great page listing IP ranges for a number of bots. It claims to be kept up to date and includes yandex, baidu, etc.

http://myip.ms/info/..._Addresses.html




