How We Block Proxies, Bots, Scrapers, Trolls & Assholes

As a website owner, you probably have at least a few good reasons to block bots and scrapers.  Scrapers steal your content and unruly bots can do anything from eating your bandwidth to trying to hack into your site.

As a forum or community owner, you may also have reasons to block proxies.  Proxies are what give many trolls, fakes, assholes, idiots, jerk-offs, and other pitiful people in general their false bravado.  For some reason, these “tech experts” with the elite skills to type the words “free proxy” into Google, or to figure out how to install a TOR client, grow giant balls when they think you can’t track them down to their real IP address.  Give this kind of anonymity to these socially unbalanced people (that’s a nice way of saying losers in real life, or people who forget to take their meds) and they suddenly become “tough guys” with no fear of wreaking havoc in your community.

BUT, take away their proxy and force them to log in from home or work, and they suddenly become able to follow the rules or, more likely, are too chicken to do or say anything, and alas, they go away!  If they DO continue to insist on making themselves feel better (it’s sad, I know) by bullying or causing trouble in your online community, then one report to their ISP (or the FBI if they are REALLY going overboard) or employer will usually take care of it.  Imagine what mommy and daddy will do when their internet account gets terminated!  If they are adults (yes, sadly “adults” do pull this kind of shit), then they’ll have to deal with the hassle of getting a new ISP, or deal with mommy and daddy if they live in their parents’ basement (a common trait of internet trolls).

If reporting them doesn’t help, you can ban their IP and have no worries that they’ll just come right back via a proxy.  Sure, since you can never block 100% of the proxies out there, they may still find a proxy that works, but as your proxy-blocking skills grow, eventually it will become too much hassle for all but the most pitiful of trolls or assholes, and they’ll give up and go get their kicks bothering some other community.

So here are a few updated tips for blocking bots, scrapers, and proxies (aka trolls and assholes).  Much of this is Drupal focused, but much can be applied to any website/blog/forum.

Start with the obvious:  The Drupal Troll Module.  The Drupal 5.x version of this module was abandoned several months ago after a critical security flaw was discovered, but after popular outcry it has been updated and is supported again.  The Troll module allows you to block IP addresses and redirect them to a static HTML page, and it also allows you to search your member database by IP address or email address (very handy in some situations).  It supports wildcard searching (just leave the last octet of an IP address blank, for example, and it will return all matches), so even tracking down asshole trolls using DHCP is easy.  The Troll module will also easily show you every IP address that a member has ever signed in with (User|Troll Track) and the domain name.  A member using a legit IP will show a history from the same address or ISP, whereas someone using a proxy will show as coming from many different locations and domains.  After you’ve looked at a few IP histories, the proxies will stand out like a sore thumb.  You can then block those IPs using the Troll module or your IPTables firewall.
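The "many different locations" pattern can also be checked mechanically. Here is a minimal Python sketch, assuming you have copied a member's sign-in IPs (IPv4) out of the Troll Track page into a list. The function names, the /16 grouping, and the threshold of 4 are all hypothetical choices for illustration; none of this is part of the Troll module.

```python
import ipaddress


def distinct_networks(login_ips, prefix=16):
    """Group a member's sign-in IPs into /prefix networks (IPv4 assumed)."""
    networks = set()
    for ip in login_ips:
        # strict=False lets us pass a host address and get its enclosing network
        networks.add(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
    return networks


def looks_like_proxy_user(login_ips, threshold=4, prefix=16):
    """Flag a history that spans `threshold` or more unrelated networks.

    The threshold is an arbitrary starting point, not a tested value.
    """
    return len(distinct_networks(login_ips, prefix)) >= threshold


home_user = ["24.1.2.3", "24.1.7.9", "24.1.200.4"]
proxy_hopper = ["67.159.1.17", "91.2.3.4", "200.5.6.7", "41.8.9.10"]
print(looks_like_proxy_user(home_user))
print(looks_like_proxy_user(proxy_hopper))
```

A flag from something like this is only a hint to go look at the history yourself; people who travel or use mobile connections will also show several networks.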

Next on the list is BadBehavior.  If you use Drupal, you need to install the Drupal BadBehavior module and the BadBehavior script.  If you use WordPress, you need only the script.  BadBehavior can also be modified to work with virtually any PHP-based website/forum.  BadBehavior blocks almost all automated bots, scrapers, and spammers, and if used in combination with something like Akismet or Mollom, spam becomes almost a non-issue.  When put in “strict mode” BadBehavior blocks many (but not all) proxies and is a great first line of defense, but you can also use information from Bad Behavior with the CSF/IPTables firewall to locate proxy/server farms and block them en masse.

Now for the big guns: The IPTables Firewall.  IPTables allows you to block individual IP addresses or CIDRs (entire ranges of IPs) from accessing your website/server, but instead of simply redirecting blocked addresses to a static page at the domain level like TROLL does, IPTables/CSF “drops” all the packets, leaving the troll/asshole/proxy user nothing but an “unable to connect” error.  IPTables is very powerful, and almost by definition that makes it difficult to use.  Because of that, I recommend using CSF Firewall, which is almost a GUI for IPTables and also adds some great additional features.  To use IPTables/CSF you need either a VPS or a dedicated server with root access.  If you are on a shared host and have asshole problems, you might have to put your big-boy pants on and move to a dedicated or VPS server.

Once you get CSF up and running (it’s really not that tough), do the obvious things like activating the Real Time Block Lists (RBLs) and use the CC_Deny setting to block entire countries that you don’t need hanging around your site (North Korea, China, Turkey, Russia, India come to mind).

After you’ve blocked all the undesirable countries with CC_Deny, you can move on to the CSF.DENY file, which allows you to block IPs and ranges of IP addresses in CIDR format.  The first thing you can do is import any IP addresses that you’ve already blocked with the TROLL module; then you can start building your proxy-blocking list.

In building your proxy-block list, you aren’t just blocking proxy servers; you really want to block all servers.  There is really no reason for any server other than Google bots, Yahoo, etc. to access your site, so blocking any/all ‘server farms’ will protect you not only from assholes using proxies, but also from compromised servers trying to hack your site.  The best source I have found for building my block list (now blocking hundreds of thousands of IPs and several million domains) is the Bad Behavior module (mentioned above).  By learning how/why Bad Behavior blocks IPs you can identify servers and server farms and add them by the thousands to your CSF.DENY file.

What to look for in Bad Behavior:  Each time Bad Behavior blocks an IP it logs the IP address and the reason.  The following reasons often (not always, you have to be careful) mean that the originating IP belongs to a proxy or a server:

  • Header ‘Connection’ contains invalid values
  • Required header ‘Accept’ missing
  • Prohibited header ‘Proxy-Connection’ present
  • Header ‘Referer’ is corrupt
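Picking these entries out by hand gets tedious once the log grows. The sketch below shows one way to filter them in Python. It assumes you have already exported the Bad Behavior log as (IP, reason) pairs; that export format and the function names are hypothetical, and only the four reason strings come from the list above.

```python
# Reasons from the list above that often (not always) point at a proxy or server.
SUSPECT_REASONS = (
    "Header 'Connection' contains invalid values",
    "Required header 'Accept' missing",
    "Prohibited header 'Proxy-Connection' present",
    "Header 'Referer' is corrupt",
)


def normalize(text):
    """Fold curly quotes, case, and surrounding whitespace so strings compare equal."""
    return text.replace("\u2018", "'").replace("\u2019", "'").strip().lower()


def whois_candidates(log_rows):
    """Return each IP once, in first-seen order, if it was blocked for a suspect reason.

    log_rows is assumed to be an iterable of (ip, reason) pairs.
    """
    wanted = {normalize(reason) for reason in SUSPECT_REASONS}
    candidates = []
    for ip, reason in log_rows:
        if normalize(reason) in wanted and ip not in candidates:
            candidates.append(ip)
    return candidates


rows = [
    ("67.159.1.17", "Required header 'Accept' missing"),
    ("192.0.2.10", "Some other reason"),
    ("67.159.1.17", "Header 'Referer' is corrupt"),
    ("203.0.113.9", "Prohibited header \u2018Proxy-Connection\u2019 present"),
]
print(whois_candidates(rows))
```

The result is only a list of IPs worth a WHOIS lookup, not a list to block blindly.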

Take an IP address that Bad Behavior blocked for one of the reasons above and do a quick WHOIS lookup on it.  I like to use http://whois.domaintools.com, but any WHOIS server will do.  Usually (not always) a server or proxy will show other sites listed, an SSL cert, etc.  For example, look at the WHOIS for 67.159.1.17.  A WHOIS lookup for a regular home ISP connection or a business won’t show much info at all; for example, look at the WHOIS for a Comcast home user.

So now you have your IP, in our example above 67.159.1.17, but you don’t want to block just that IP; you want to block every server in that entire IP range.  To do that, you add the CIDR to your CSF.DENY file in CSF.  The example server/proxy above has the following CIDR in its WHOIS info:

OrgName:    FDCservers.net
OrgID:      FDCSE
Address:    141 w jackson blvd.
Address:    suite #1135
City:       Chicago
StateProv:  IL
PostalCode: 60098
Country:    US
ReferralServer: rwhois://rwhois.fdcservers.net:4321
NetRange:   67.159.0.0 - 67.159.63.255
CIDR:       67.159.0.0/18   <--------------  This is the CIDR
NetName:    FDCSERVERS
NetHandle:  NET-67-159-0-0-1
Parent:     NET-67-0-0-0-0
NetType:    Direct Allocation
NameServer: NS3.FDCSERVERS.NET
NameServer: NS4.FDCSERVERS.NET

If you aren’t positive this is a server farm, you could visit the domain listed, in this case FDCservers.net.  Their website clearly shows that they are a server hosting company.  You could also google the company name or even the IP to dig up more info.  Once you are positive that you want to block this entire range, or CIDR, of 67.159.0.0/18, simply add it to your CSF.DENY.  Sometimes, usually with foreign servers, a CIDR won’t be listed.  In a case like that you can still block an entire range of IPs by using a CIDR calculator and entering the beginning IP address and the mask, or the range/number of IPs to block.  I usually block an entire 16-bit range, which for the example above would be 67.159.0.0/16 instead of the “/18” listed above.  The “/18” covers only FDCservers’ 16,384 addresses; “/16” blocks all 65,536 addresses that start with 67.159.
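If you would rather not depend on an online CIDR calculator, Python’s standard `ipaddress` module can do the same arithmetic. This is a small sketch using the FDCservers range from the WHOIS output above; the helper names `range_to_cidrs` and `widen` are made up for the example.

```python
import ipaddress


def range_to_cidrs(first_ip, last_ip):
    """Convert a start/end address pair (a WHOIS NetRange) into CIDR blocks."""
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def widen(cidr, new_prefix):
    """Return the larger block with the given prefix length that contains cidr."""
    return str(ipaddress.ip_network(cidr).supernet(new_prefix=new_prefix))


print(range_to_cidrs("67.159.0.0", "67.159.63.255"))   # ['67.159.0.0/18']
print(widen("67.159.0.0/18", 16))                       # 67.159.0.0/16
print(ipaddress.ip_network("67.159.0.0/18").num_addresses)  # 16384
print(ipaddress.ip_network("67.159.0.0/16").num_addresses)  # 65536
```

A range that does not line up on a power-of-two boundary comes back as several CIDR blocks instead of one, and each of them would need its own entry.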

When adding your IPs or CIDRs to CSF.DENY, be sure to add “# do not delete” after each entry.  Otherwise, once you hit the limit of IPs specified in your CSF configuration file, older entries will get overwritten with newer entries.

How to block TOR: The Onion Router, or TOR, is a network of proxies intended to protect the anonymity of internet users.  TOR is great for whistleblowers or government protesters, but not so great for website owners trying to keep assholes out of their community.  TOR is fairly easily blocked by adding the list of “TOR exit nodes” to CSF.DENY or TROLL.  You can get an updated list of TOR exit nodes from the TOR Exit Node list.  TOR is dynamic and the list changes, so you’ll have to update it every few days or so.
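Since the list has to be refreshed every few days, it helps to script the conversion. The sketch below assumes you have already downloaded the exit node list as plain text with one IP address per line (the exact format of the list you use may differ), and turns it into lines you could paste into CSF.DENY. The function name and the comment text are hypothetical; the “do not delete” marker follows the advice above.

```python
import ipaddress


def deny_lines(exit_list_text, note="TOR exit node - do not delete"):
    """Turn a one-IP-per-line text dump into 'IP # comment' deny entries.

    Blank lines, lines starting with '#', anything that is not a valid IP
    address, and duplicates are skipped.
    """
    lines = []
    seen = set()
    for raw in exit_list_text.splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            ip = ipaddress.ip_address(entry)
        except ValueError:
            continue  # not an IP address, ignore it
        if ip in seen:
            continue
        seen.add(ip)
        lines.append(f"{ip} # {note}")
    return lines


sample = "# exit nodes\n192.0.2.1\n\nnot-an-ip\n192.0.2.1\n198.51.100.7\n"
for line in deny_lines(sample):
    print(line)
```

Because exit nodes come and go, entries generated this way go stale; regenerating the whole TOR section on each update, rather than only appending, keeps old addresses from piling up.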

How to block Port Proxies or SOCKS proxies: Port or SOCKS proxies are almost always blocked by Bad Behavior.

Sometimes you may end up blocking legitimate users, particularly when blocking entire ranges of IPs; it’s unavoidable.  When someone complains, confirm their IP address and just remove them from CSF.DENY or your TROLL list, no big deal.  I’ve been using these methods for over a year and I’ve only blocked 10 or so legitimate users (that I know of, at least).

If you don’t have/can’t use IPTables/CSF, you can also use some of the techniques above to block IPs and CIDRs in your .htaccess file, but I cannot vouch for how well it will perform when the list grows large, and to be effective it needs to be really, really large.
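One way to keep an .htaccess list from growing faster than it has to is to merge overlapping entries before writing them out. The Python sketch below does that with the standard `ipaddress` module and emits the older Apache 2.2-style `Order`/`Deny from` directives. That syntax is an assumption about your server: on Apache 2.4 it only works with mod_access_compat loaded. The function name is hypothetical, and all entries are assumed to be IPv4.

```python
import ipaddress


def htaccess_deny_block(entries):
    """Build an Apache 2.2-style deny block from IPs and CIDRs.

    Overlapping or contained entries are merged first so the file stays
    as short as possible.
    """
    networks = [ipaddress.ip_network(entry, strict=False) for entry in entries]
    collapsed = ipaddress.collapse_addresses(networks)
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {net}" for net in collapsed]
    return "\n".join(lines)


print(htaccess_deny_block(["67.159.0.0/18", "67.159.1.17", "66.228.118.0/24"]))
```

Here the single address 67.159.1.17 disappears from the output because 67.159.0.0/18 already covers it. This only trims the list; it says nothing about how Apache performs once the list is really large.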

This has turned out to be one of my longest and mostest rambling posts.  If I’ve been unclear or if you have any questions, please post a comment.  And oh – if you’re reading this via a proxy, post a comment and tell me that my techniques don’t work!

54 thoughts on “How We Block Proxies, Bots, Scrapers, Trolls & Assholes”

  1. How do you enable Real Time Block Lists using CSF? For that matter, can you use CSF at all if you don’t have Webmin, DirectAdmin or cPanel?

  2. Thanks, Randy. I do have root access. I just don’t want to have to install Webmin or cPanel. Aren’t there HTML files that CSF offers direct access to? I’m wondering why they have to be launched via cPanel.

    By “natively” do you mean automatically? I couldn’t find any options for subscribing to RBLs on CSF’s demo screens.

  3. Yes, I believe that CSF can be installed directly (no Webmin or cPanel), but I think you still need root to install it. As for the block lists, it supports several RBLs in its configuration, and you can also specify your own RBL. If you haven’t already, you should check around in the CSF support forum here: http://forum.configserver.com/

  4. Our product ‘Proxy Out’ helps to block out elite proxies. Our product sets a property called ‘verified_client_ip’ if, and only if, it is verified that the client is not using a proxy. There are other cases where proxy usage is ‘uncertain’ but in those cases you can assume a proxy is being used. Please see http://sunsetrainbow.com/ for more info

  5. What is the best block list to use for known open proxies to use as an RBL in CSF? I’ve googled and googled and keep only turning stuff up for mail servers that is in the wrong format (CSF requires it to be in a single line, IP only, .txt files)

  6. 67.159.0.0/18 is a proxy range that should be blocked by all ISPs. FDCservers is allowing many spammers to use their IPs for illegal activities. This is the worst IP range on the Internet…..

  7. How do I block all non-google and yahoo bots? Is there a non-google bot redirect code I could put on my site?

  8. Wow that was a quick response. I think less than one minute.

    Yes, Bing too. Is there a cut and paste code that I could put in my bots document or in my forum header that would redirect or drop all bots except bots at a few specified IPs? …like an (if bot and not IP __ or __ then drop)? I currently have 23 guests on my small community forum that should never have more than 3 or 4 guests at a time – and they’ve been there for two days straight.

  9. You guys can’t block ELITE proxies because they look EXACTLY like everyone else. If you like I’ll open a few accounts here, no prob.

  10. @Andy; strange Andy, because i have seen IPs in the logs that were blocked, that when checked turned out to be ELITE proxies.. yah, open an account at the site (not here, at GrownUpGeek.com), then email the account-name to ‘Hubby at GrownUpGeek.com’ – i’m curious now..

    1. RSFirewall in Joomla can easily block anonymous proxies, as well as countries, etc. The issue is that what we really need is to BLOCK an empty REFERRER, since many scanners and hack tools send NO REFERRER when they try to compromise the site with XML user-agent attempts.

  11. Hmm i tried to access these sites with my own elite proxy that I use under a vps and I CANT bcuz i get that time out deal Randy mentions with the iptables/csf…. this is BUGGING ME NOW…

  12. Yep, it’s totally working bcuz what I have is the best of the best BUT not every site is willing to do what you do. i.e. you can’t block all the “bad guys” without blocking some of the good guys who happen to be on servers or happen to have some other potentially innocent characteristics. But for a site that is more of a local hangout type of cool site like this, you can get away with tipping the scales a bit :).

  13. Hey Randy, Thanks a lot for the great and helpful article! I’ve been trying to combat some people using various proxy software and services lately and have installed Bad Behavior on Drupal. Problem is, I can’t figure out how to test it. In order to combat proxies is there some additional configuration I need to do?

  14. @Kirk.. there isn’t really a good way to “test” Bad Behavior other than to look at the logs and look up the IPs that it blocks. Remember though, you will need more than just Bad Behavior to block proxies.

  15. @Daniel – yep, it doesnt work 100% of the time, but thanks for giving me another proxy to block! The real solution is to implement a proxy RBL, but the performance cost is not worth it..

  16. Yep I was using glype with scripts turned off (I’m behind my school now, not the other proxy) What is proxy RBL? and how does one go about setting one up?

  17. an RBL = Realtime Block List.. websites such as sorbs.net keep these lists, updated in real time. You can setup your firewall or MOD_Security to query these lists for EVERY visit – the problem is that it kills your website!

  18. It would be difficult to maintain a list of all Open proxies, HTTP proxies, SOCKS proxies, VPNs, SSH tunnel servers, web-based proxies, Tor and others as they change so frequently. Your .htaccess file would quickly be inflated and reduce performance of your website. BlockScript is currently tracking over 31 million hosting provider IP addresses as well as tens of thousands of other types of proxies, bad bots, and spiders. The software runs locally on your server and updates itself everyday.

    1. I have to agree. IMO, a lot of blog owners have fragile egos and cannot stand legitimate criticism or differences of opinion. They actually invite their own torment by being little Maos or Hitlers. Different opinions help a blog flourish. I never block or censor anyone. I have never had a problem. I just let people post. If anyone complains I tell them to use their block feature or to just ignore them, as you would do if it were a neighbor whose opinion you do not like. That is why my blogs earn big bucks.

      1. Sounds like blocking criticism is easier than taking your head off your ass.
        Maybe if you just install a few more scripts you’ll never have to deal with a different opinion again.

        1. The problem is that there is a wide gap between voicing an opinion and just being an asshole. Sadly, all the assholes don’t know the difference.
          I encourage sharing opinions and new/different ideas on my blog, but assholes aren’t allowed.
          It’s funny how the assholes, when being assholes, don’t understand why their “opinion” was “censored”, when 100% of the time it had nothing to do with what they said, but HOW they said it. In other words, it was posting something that sounded like it came from an asshole with a 9th grade education that got their “opinion” deleted..
          I probably would have deleted your post, because you don’t seem to know how to communicate like a grown-up.

  19. “As a forum or community owner, you may also have reasons to
    block proxies. Proxies are what gives many trolls, fakes, assholes,
    idiots, jerk-offs, and other pitiful people in general, their false
    bravado.”

    What about your own false bravado? Get over yourself.

  20. hi this is an interesting article and there are valid reasons for blocking as these unwanted pests use up your bandwidth you are paying for

    ok any idea how to block static.reverse.softlayer.com without the IP, i cant find anywhere within csf to do this

    thanks

    1. dave why would you want to block it without the IP?

      I’m not sure if you can block by hostname in CSF.. I think it’s technically possible but then you would have to enable reverse DNS lookup for EVERY visitor and surely that would impact your performance..

      But.. you can block the entire softlayer IP block by IP. The address range for softlayer.com is 66.228.118.0 – 66.228.118.255 .. when you translate that to a CIDR you get 66.228.118.0/24 .. so, just enter 66.228.118.0/24 in CSF and you should be good.

  21. I have an online business and run my PPC ads on search engines like Google and Bing. For a long time I have been noticing that I am getting a lot of fake clicks, and they are coming from proxies. The people doing the fake clicks are using different proxies. I tried many things to block those clicks but I cannot see any positive results.
    I even used BlockScript, but it didn’t work either. I want to know if there is any way I can trace the real IPs or block all the proxies.
    It seems they are using some kind of proxies that are hard to detect.

    1. Maybe I’m not understanding you correctly, but if you are the advertiser, I don’t think you can do anything to block them other than to not allow your ads to run on the websites that these clicks are coming from.
      If you are the publisher displaying the ads you can use the methods described in the post above, but nothing is 100% effective.

  22. I recently found a website called getipintel.net which also does proxy blocking. It’s free and I’ve had good results. I hope this helps someone 🙂
