Cre8asiteforums

Web Site Design, Usability, SEO & Marketing Discussion and Support


Hi again...

 

I have been trying to refine my traffic so I don't get overloaded on 300+ sites. I have blocked all amazonaws ranges that I have found, along with other individual IP numbers that have overloaded me with bot traffic.

 

So the question is, do baidu and/or yandex do me any good in terms of human traffic -- traffic that clicks on advertising links and that I get paid for? At least in the past, clicks from North America, Western Europe and Australia have been worth a lot more than clicks from Russia or China.

 

I am looking at yandex and this bot is coming on pretty strong. I probably will at least disallow it in robots.txt and see if it obeys.
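
Something along these lines in robots.txt should do it -- a minimal sketch, using the user-agent tokens the two engines document (Yandex and Baiduspider); only well-behaved crawlers will honour it:

# turn both crawlers away site-wide; stealth bots will ignore this
User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /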

 

Any comment??

 

Cheers

Mike


A lot depends on what traffic your revenue sources are looking for. If they only want those high-value click locations you mention, then you might want to block all extraneous IPs. If other geolocations are or become of value, open things accordingly.

 

Yandex and Baidu are major players and are becoming more international. Personally I do allow both but only their main webcrawlers - just as with Bing and Google. That does not mean that I don't block Russian and Chinese IPs :) All SEs have a whole range of different crawlers and which ones you allow is a business model decision. I also note that most/all SEs have stealth bots and that most/all SEs follow real and hypothetical links, citations and entities... which may ignore robots.txt depending... bots are only reliably recognised by behaviour... and not always then.

 

Note: I frequently block even whitelisted SE bots if they hammer too hard or too repetitively. In such instances it is only for an hour at a time. So far without apparent harm.
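
One way to script that kind of one-hour timeout -- a rough sketch only, assuming iptables and a running atd; the address is just a placeholder:

# temporarily drop a bot that is hammering, then schedule the rule's removal in an hour
iptables -I INPUT -s 203.0.113.50 -j DROP
echo "iptables -D INPUT -s 203.0.113.50 -j DROP" | at now + 1 hour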


After reading your comments I went out looking for ways to block IP ranges by country and found what I think is a good reference...

http://www.parkansky.com/china.htm

 

I was surprised at how much code this will add to an htaccess file. Does that cause performance problems?

 

I was hoping for a short and sweet code to block about 1/2 of a hemisphere.
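
For the curious, those country lists are mostly long runs of deny lines; a trimmed .htaccess sketch in Apache 2.2 syntax (the two ranges below are placeholder documentation addresses, not a real China list):

# deny the listed CIDR ranges, allow everyone else
Order Allow,Deny
Allow from all
Deny from 198.51.100.0/24
Deny from 203.0.113.0/24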


I use iptables to block ip ranges, such as

 

iptables -p tcp -I INPUT -j DROP -s 103.4.8.0/21

 

followed by

 

/etc/init.d/iptables save && /etc/init.d/sshd restart

 

This blocks 103.4.8.0 - 103.4.15.255, which happens to be an amazonaws Asia-Pacific range.

 

This gives me an iptables rules list like this:

 

list rules 4/20/12

iptables -L -n

 

DROP tcp -- 103.4.8.0/21 0.0.0.0/0

 

Iptables can act as a Linux internal firewall. It blocks traffic before it reaches .htaccess and causes less server load. External firewalls are even better: your server never takes the hit at all.

 

The CIDR syntax is explained here : http://en.wikipedia.org/wiki/CIDR_range
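
(As a worked example of the notation: /21 leaves 32 - 21 = 11 host bits, so the block spans 2^11 = 2048 addresses, 103.4.8.0 through 103.4.15.255.)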

 

Current amazonaws IP ranges are listed here :

https://forums.aws.amazon.com/ann.jspa?annID=1408

Posted on: Mar 20, 2012 2:14 PM

 

Iamlost, point taken on the changing nature of the net, keeping abreast of what is valuable and what is not. Realistically, at the moment I *think* most advertisers in English are looking to reach English-speaking customers -- although I have not viewed my sites recently from non-English countries -- it's possible they may be displaying ads in other languages....

 

In general it looks to me like Yandex is pretty active on my machines.

 

Cheers & good luck!

 

Mike


Thank you for all of this information, Mike.

 

How did you learn all of this stuff? :)


Amazon AWS has a LOT of unpublished c-class IP address blocks. I have yet to find a complete listing.

 

Unfortunately, I learned the hard way that I cannot simply block everything on Amazon because some of the social media services where I pipe my RSS feeds are hosted on Amazon AWS. It's a very sticky wicket.

 

And as far as Baidu and Yandex go, I open up to their crawlers every now and then to see how relevant their traffic is to my sites. I get more traffic from Yandex than Baidu and more crawl from Baidu than Yandex. At the present time I am blocking Baidu but I'll unblock it again later this year.

 

Yandex is definitely trying to break into international search and they have been indexing the English Web for several years.


How much bandwidth are these guys eating up? I have always taken the approach that more bandwidth generally leads to more revenue, and not bothered to look at what is hitting my sites.

 

Although I do not get much Baidu or Yandex traffic I would not block them as suddenly something could become popular and I'd lose a lot of good traffic. I guess it largely depends on your market though. I am targeting the whole world.


Oddly enough, Google Analytics provides you with virtually no bandwidth analysis (at least, none that has leaped out at me). Webalizer, which has many drawbacks to it, will tell you WHO is drawing down the most bandwidth. The bandwidth reports can be ugly and spiders cannot hide from them. It's been a while since I looked at an AWStats report, but I want to say it also assesses bandwidth usage.
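
If you want a quick-and-dirty check straight from the raw logs, an awk one-liner along these lines tallies bytes served per requesting host -- a sketch that assumes the standard combined log format (host in field 1, response size in field 10; shift the field numbers if your LogFormat adds a leading %v) and a placeholder log path:

awk '{bytes[$1] += $10} END {for (h in bytes) print bytes[h], h}' /path/to/access_log | sort -rn | head -20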


I use the linux command (as root)

 

tail -f /path.../access_log

 

The -f argument gives you a continuing view of every file request as it comes in. If your server is busy it can be a flow of text that is too fast to read.

 

The Apache httpd.conf line

 

HostnameLookups On

 

gives the name attached to the IP making the request,

 

and when combined with

 

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combined

 

where %v gives the domain that serves the request,

 

causes Apache to print your domain and the requesting domain to the access_log file.

 

It's probably not 100% foolproof, but it gives you a pretty good idea who's requesting what, especially if there is an egregiously huge number of requests from the same IP and/or host. It's a good way to stop DoS attacks also.
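
A related one-liner -- again just an illustration with a placeholder log path -- counts requests per host so the heavy hitters float to the top (with the %v-prefixed LogFormat above, the requesting host is the second field):

awk '{print $2}' /path/to/access_log | sort | uniq -c | sort -rn | head -20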

 

You can also plug the IP number into maxmind.com, which gives you the real owner.
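
A command-line alternative, if whois is installed (registries label the fields differently, so the grep is only a convenience filter):

whois 103.4.8.1 | grep -iE 'netname|orgname|country'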

 

Cheers

Mike


I have bitten the bullet with baidu and blocked the following:

 

iptables -p tcp -I INPUT -j DROP -s 119.63.192.0/21

iptables -p tcp -I INPUT -j DROP -s 123.125.64.0/18

iptables -p tcp -I INPUT -j DROP -s 180.76.0.0/16

iptables -p tcp -I INPUT -j DROP -s 220.181.0.0/16

 

The final range actually blocks China Telecom Beijing.

 

Ref : http://www.netnuisance.net/ip/se.php

 

This seems to have lowered my CPU load. I don't know how to measure website results, though I can't imagine that English-language advertisers have much interest in China traffic.

 

I do not see such a load from Yandex ... so far it seems to be relatively polite and I am leaving it alone.

 

Cheers

Mike


Some people block all traffic coming out of China in an effort to impede hackers' activity. I have never tried that approach. It strikes me as pretty radical.

 

A commenter on SEO Theory recommended CloudFlare and one of their people followed up to clarify one of my concerns (I thought you had to move your hosting to CloudFlare, but they just manage your DNS).

 

If I understand the concept correctly, your routing passes through CloudFlare and they take responsibility for managing rogue crawlers. I think they are caching your content but I'm not clear on that. Haven't really had time to fully evaluate it.


I took a quick look at cloudflare.com and they describe a few things: security (attacks), an optimizer, and a CDN (content delivery network). Apparently the optimizer reduces file sizes by eliminating unnecessary HTML; it seems like they would have to cache for this. Of course a CDN is a cache situation. I looked into that with other CDN companies and it can get pretty pricey.

 

Currently I am working on understanding memcache syntax, to cache pages in RAM locally, with a possible migration to Couchbase later, which works about the same but also has a text-based file that allows quicker cache recovery on reboot. This is somewhat outside the topic of blocking bad bots, but both relate to server optimization, i.e. getting the biggest bang for your server buck.
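
For what it's worth, the memcached text protocol itself is tiny; a throwaway set/get round trip from the shell might look like this, assuming memcached is listening on the default port 11211 (netcat flags vary between versions, so treat the nc options as a sketch):

# store a 5-byte value under a key for 300 seconds, then read it back
printf 'set page:/index.html 0 300 5\r\nhello\r\n' | nc -q 1 localhost 11211
printf 'get page:/index.html\r\n' | nc -q 1 localhost 11211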

 

I do use Postini for mail handling for my primary account. It accepts all my incoming mail and forwards to my own account only the mail it thinks is real. It's pretty effective: it catches about 90% of the spam with almost no false positives. Postini is activated by the MX records in my DNS, which route incoming mail to them rather than to me.
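
The MX side of that is just an ordinary DNS change; a hypothetical zone-file sketch (the filter hostnames below are made up -- the real ones come from the filtering provider):

; route inbound mail through the filtering service, highest-priority record first
example.com.    IN  MX  10  filter1.mail-provider.example.
example.com.    IN  MX  20  filter2.mail-provider.example.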


Just out of curiosity, what do all those people using amazonaws IP addresses do? Are they just scrapers, or do they add value to whatever it is they do? Or just to bust our asps?

 

Isn't it obvious when you land on a made-for-AdSense site that it is a junk site? And people still click on their ads? Hmmm. :dazed:

Hey, I have a near-100-year-old bridge for sale. Crosses the St. Lawrence River into Montreal. Got 4 mini Eiffel towers on top.

 

I hate them too.


I tend to concur with the question "What ARE these people doing anyway?" Hackers and crackers I sort of understand; there is evil in the world. The only explanation I have for bad bots and senseless stupidity repeated ad infinitum is the opportunistic nature of all biology. If it can, it will. Weeds grow, diseases transmit. I guess these people are driven by hope, however misplaced, that their activities will benefit them in the end.

