Month: October 2016

CPU usage high…is there a bot flooding you with requests?

CPU Usage High: Is There a Bot at Work?

Alright, something is going on: we are getting New Relic alerts.

[Screenshot: New Relic CPU usage graph]

As you can see, something strange started on the 5th of October. In New Relic, check Server -> Apps to see which application is causing the high CPU usage. In this case it was Apache, at 80.4% CPU usage.

The first port of call is /var/log/apache2/access.log: if you tail -f the log, you can see how frequently requests are coming in. An example is shown below.


66.249.64.135 - - [14/Oct/2016:12:10:45 +0000] "GET /products/new-products-category.html?color=white,ivory,brown,beige,natural&dir=desc&order=position HTTP/1.1" 403 587 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:47 +0000] "GET /products/grass-products-category.html?color=pink,blue,natural&dir=desc&order=position&p=2 HTTP/1.1" 403 585 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
41.72.195.90 - - [14/Oct/2016:12:10:46 +0000] "POST /index.php/autoassign/adminhtml_api/index/key/1be7da78e61ac5c2f9f7ed4e16084a22/?isAjax=true HTTP/1.1" 200 744 "https://www.example.com/index.php/admin/catalog_product/index/store/0/key/7d2daff670207536c026ba902cd4dce6/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"
66.249.64.135 - - [14/Oct/2016:12:10:48 +0000] "GET /product-decor/mosaic-listellos-category.html?color=gold,red,black HTTP/1.1" 403 595 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:50 +0000] "GET /products/stonewall-dabbing-category.html?color=black,blue,light-brown&dir=desc&order=position HTTP/1.1" 403 592 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:51 +0000] "GET /products/new-products-category.html?color=natural,grey,beige,white,brown&dir=desc&order=name HTTP/1.1" 403 587 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:53 +0000] "GET /product-decor-category.html?color=gold,white,ivory,pink,grey HTTP/1.1" 403 578 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:54 +0000] "GET /products/new-products-category.html?color=grey,ivory,natural,brown,beige&dir=desc&order=name HTTP/1.1" 403 587 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:56 +0000] "GET /products/stonewall-dabbing-category.html?color=white,bronze,terracotta&dir=desc&order=position HTTP/1.1" 403 592 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:57 +0000] "GET /product-decor-category.html?color=gold,red,white,black,pink,grey&dir=asc&order=name HTTP/1.1" 403 578 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
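Each of these lines is in Apache's combined log format. As a minimal sketch, assuming your site logs in that default combined format, the Python below splits one line into named fields. `parse_line` and the field names are illustrative choices, not part of Apache.

```python
import re
from datetime import datetime

# One line of Apache's "combined" log format:
# client ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Illustrative helper: split one combined-format line into named fields.

    Returns None for lines that don't fit the pattern (e.g. a custom LogFormat).
    """
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Timestamps look like 14/Oct/2016:12:10:45 +0000
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry
```

Fed the first line of the excerpt, it returns the IP 66.249.64.135, method GET, status 403 and the Googlebot user agent as separate fields.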

What else can we do to check for bots?

You can also use the following command to count how many requests each IP address made in the last 500 lines of the log, busiest first:


tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr
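If you would rather do the same count in Python (for example inside a larger monitoring script), here is a rough equivalent of that pipeline. `top_ips` is a hypothetical helper name, and it assumes the client IP is the first space-separated field of each line, as in the default combined format.

```python
from collections import Counter, deque


def top_ips(lines, last_n=500):
    """Illustrative helper mirroring
    tail -n 500 access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr

    `lines` can be an open log file or any iterable of log lines.
    Returns (ip, request_count) pairs, busiest IP first.
    """
    tail = deque(lines, maxlen=last_n)  # like tail -n: keep only the last `last_n` lines
    # Like cut -d' ' -f1: the client IP is the first space-separated field
    counts = Counter(line.split(" ", 1)[0] for line in tail if line.strip())
    return counts.most_common()  # like uniq -c | sort -gr
```

Passing it an open /var/log/apache2/access.log file gives the same ranking as the shell pipeline, without reading more than the last 500 lines into memory at once.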

First Steps

The first thing to do is install fail2ban on the server and configure it for apache.

We have found the issue

There is an IP, 66.249.64.135, making lots of requests to pages that need a lot of processing (category pages filtered by several colors and sort orders, as in the log above). What is more, it claims to be Googlebot/2.1; +http://www.google.com/bot.html… yeah, right.
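To put a number on "lots of requests", one rough measure is the average gap between requests from that IP. The sketch below assumes combined-format log lines; `request_gap` is a hypothetical helper, not part of any tool.

```python
import re
from datetime import datetime

# Client IP and the bracketed timestamp at the start of a combined-format line
IP_AND_TIME = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def request_gap(lines, ip):
    """Illustrative helper: return (number of requests from `ip`,
    average seconds between consecutive requests, or None if fewer than two)."""
    times = sorted(
        datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        for m in map(IP_AND_TIME.match, lines)
        if m and m.group(1) == ip
    )
    if len(times) < 2:
        return len(times), None  # not enough requests to measure a gap
    elapsed = (times[-1] - times[0]).total_seconds()
    return len(times), elapsed / (len(times) - 1)
```

Applied to the nine 66.249.64.135 lines in the excerpt above (12:10:45 to 12:10:57), it works out to 12 / 8 = 1.5 seconds between requests.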

Solving it

First Port of Call

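Block the IP with .htaccess (this is Apache 2.2 access-control syntax; Apache 2.4 only accepts it when mod_access_compat is enabled):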


Order Deny,Allow
Deny from 66.249.64.135
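With Order Deny,Allow, Apache evaluates the Deny directives first and the Allow directives second, and allows access by default, so only a client that matches a Deny and no Allow is refused (with a 403). The sketch below models just that rule for single IPs and CIDR ranges; `allowed` is a hypothetical helper, and it ignores the partial-IP and host-name forms Apache also accepts.

```python
import ipaddress


def allowed(client_ip, deny_from, allow_from=()):
    """Illustrative model of Apache 2.2 `Order Deny,Allow` for IP/CIDR rules only.

    Deny rules are evaluated first, then Allow rules; access is allowed by
    default, and a client matching an Allow rule is let in even if it also
    matched a Deny rule.
    """
    addr = ipaddress.ip_address(client_ip)

    def matches(rules):
        # Each rule is one address ("66.249.64.135") or a CIDR range ("66.249.64.0/19")
        return any(addr in ipaddress.ip_network(rule, strict=False) for rule in rules)

    return matches(allow_from) or not matches(deny_from)
```

For the .htaccess above, `allowed("66.249.64.135", ["66.249.64.135"])` is False, while every other address is still let through.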

Next Steps: The Automatic Solution

So for the complete solution we need to block IPs that reach 300 requests within 300 seconds (the filter below counts both GET and POST lines). Note that you should change these numbers to suit your own traffic.

Add this to jail.local:


[http-get-dos]
enabled = true
port = http,https
filter = http-get-dos
logpath = /var/log/apache2/access.log
maxretry = 300
findtime = 300
# ban for 10 minutes (600 seconds)
bantime = 600

This will check your apache access log and apply the http-get-dos filter to it.
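Roughly, the jail keeps a sliding window per IP: each log line the filter matches counts once, an IP that reaches maxretry matches within findtime seconds is banned, and the ban is lifted after bantime seconds. The sketch below is a simplified model of that behaviour, not fail2ban's implementation; `simulate_jail` and its (timestamp, ip) event format are assumptions for illustration.

```python
from collections import defaultdict, deque


def simulate_jail(events, maxretry=300, findtime=300, bantime=600):
    """Illustrative, simplified model of a fail2ban jail (not fail2ban's own code).

    `events` is an iterable of (unix_seconds, ip) pairs, one per log line the
    filter matched, in time order. Returns a list of (ip, ban_start, ban_end).
    """
    window = defaultdict(deque)  # ip -> match times inside the last `findtime` seconds
    banned_until = {}            # ip -> time its current ban expires
    bans = []
    for ts, ip in events:
        if ts < banned_until.get(ip, float("-inf")):
            continue  # in this model, banned requests are dropped and never counted
        recent = window[ip]
        recent.append(ts)
        while recent and recent[0] <= ts - findtime:
            recent.popleft()  # forget matches older than findtime
        if len(recent) >= maxretry:
            banned_until[ip] = ts + bantime
            bans.append((ip, ts, ts + bantime))
            recent.clear()  # start counting afresh after the ban
    return bans
```

With the values above (maxretry = 300, findtime = 300), a client has to average one request per second for five minutes before it is banned.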

In the filter.d directory (/etc/fail2ban/filter.d on a standard install), create the filter file with vim http-get-dos.conf and add the following:


# Fail2Ban configuration file
#
# Author: http://www.go2linux.org
#
[Definition]

# Option: failregex
# Note: This regex will match any GET or POST entry in your logs, so basically all valid and not valid entries are a match.
# Set maxretry and findtime in jail.local carefully in order to avoid false positives.
# <HOST> marks the client address that fail2ban will ban; every failregex must contain it.

#failregex = ^<HOST> -.*GET.*/ip\.cgi
failregex = ^<HOST> -.*"(GET|POST).*

# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =
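fail2ban needs a <HOST> tag in failregex so it knows which address to ban. The proper way to test a filter is fail2ban's own fail2ban-regex tool, e.g. `fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/http-get-dos.conf`. For a quick offline look at what the pattern `^<HOST> -.*"(GET|POST).*` matches, the Python sketch below swaps <HOST> for a simplified IPv4-only group; that substitution is an assumption, since fail2ban's real <HOST> pattern is broader. `matched_host` is a hypothetical helper.

```python
import re

# The filter's failregex, with the <HOST> tag fail2ban requires
FAILREGEX = r'^<HOST> -.*"(GET|POST).*'
# Simplified IPv4-only stand-in for <HOST> (fail2ban's real substitution is broader)
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
PATTERN = re.compile(FAILREGEX.replace("<HOST>", HOST))


def matched_host(line):
    """Illustrative helper: the IP this failregex would count for `line`, or None."""
    m = PATTERN.search(line)
    return m.group("host") if m else None
```

Every GET or POST line in the access log excerpt yields its client IP, which is why maxretry and findtime are the only thing separating normal visitors from a ban.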

Yeah, so this should do the trick. I have found that if you specify an action, it won’t actually block that IP.

I will update with results.

The .htaccess change seems to have done the trick:

[Screenshot: CPU usage after blocking the IP]

CPU usage dropped at the point where the .htaccess was edited.

Turns out it is a REAL Googlebot

To check whether a bot is a real Googlebot, check this link. Strange that it is spamming us silly.
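Google's documented way to verify Googlebot is a reverse DNS lookup on the IP, a check that the resulting name ends in googlebot.com or google.com, and then a forward DNS lookup on that name to confirm it resolves back to the same IP. A minimal Python sketch of that check follows. `is_real_googlebot` is a hypothetical helper, it only handles IPv4 (gethostbyname_ex returns IPv4 addresses), and the lookup functions are parameters so they can be replaced for testing.

```python
import socket

GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Illustrative helper: reverse-resolve `ip`, check the name is under
    googlebot.com or google.com, then forward-resolve it and confirm it maps
    back to the same IPv4 address."""
    try:
        hostname = reverse(ip)[0]  # gethostbyaddr -> (hostname, aliases, addresses)
    except (socket.herror, socket.gaierror):
        return False  # no PTR record: not Googlebot
    if not hostname.lower().rstrip(".").endswith(GOOGLE_DOMAINS):
        return False  # PTR points elsewhere: the user agent is spoofed
    try:
        addresses = forward(hostname)[2]  # gethostbyname_ex -> (name, aliases, IPv4 list)
    except (socket.herror, socket.gaierror):
        return False
    return ip in addresses  # the forward lookup must agree with the reverse one
```

Calling `is_real_googlebot("66.249.64.135")` performs live DNS lookups, so the answer depends on your resolver at the time you run it.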