CPU Usage High: Is There a Bot at Work?
Alright, something is going on: we are getting New Relic alerts.
As you can see, something strange started on the 5th of October. In New Relic, check under Server -> Apps to see which application is causing the high CPU usage. In this case it was apache at 80.4% CPU usage.
The first port of call is /var/log/apache2/access.log. If you tail -f the log, you will see the frequency of requests. An example is shown below.
66.249.64.135 - - [14/Oct/2016:12:10:45 +0000] "GET /products/new-products-category.html?color=white,ivory,brown,beige,natural&dir=desc&order=position HTTP/1.1" 403 587 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:47 +0000] "GET /products/grass-products-category.html?color=pink,blue,natural&dir=desc&order=position&p=2 HTTP/1.1" 403 585 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
41.72.195.90 - - [14/Oct/2016:12:10:46 +0000] "POST /index.php/autoassign/adminhtml_api/index/key/1be7da78e61ac5c2f9f7ed4e16084a22/?isAjax=true HTTP/1.1" 200 744 "https://www.example.com/index.php/admin/catalog_product/index/store/0/key/7d2daff670207536c026ba902cd4dce6/" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"
66.249.64.135 - - [14/Oct/2016:12:10:48 +0000] "GET /product-decor/mosaic-listellos-category.html?color=gold,red,black HTTP/1.1" 403 595 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:50 +0000] "GET /products/stonewall-dabbing-category.html?color=black,blue,light-brown&dir=desc&order=position HTTP/1.1" 403 592 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:51 +0000] "GET /products/new-products-category.html?color=natural,grey,beige,white,brown&dir=desc&order=name HTTP/1.1" 403 587 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:53 +0000] "GET /product-decor-category.html?color=gold,white,ivory,pink,grey HTTP/1.1" 403 578 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:54 +0000] "GET /products/new-products-category.html?color=grey,ivory,natural,brown,beige&dir=desc&order=name HTTP/1.1" 403 587 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:56 +0000] "GET /products/stonewall-dabbing-category.html?color=white,bronze,terracotta&dir=desc&order=position HTTP/1.1" 403 592 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.135 - - [14/Oct/2016:12:10:57 +0000] "GET /product-decor-category.html?color=gold,red,white,black,pink,grey&dir=asc&order=name HTTP/1.1" 403 578 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
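What else can we do to check for bots?
You can also use the following to check how many requests each IP address made in the last 500 log lines. The timestamps on those lines tell you how many seconds they cover:
tail -n 500 /var/log/apache2/access.log | cut -d' ' -f1 | sort | uniq -c | sort -gr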
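If you want the same tally with the time span attached, here is a rough Python sketch of what the pipeline above does. The function name requests_per_ip and the regular expression are my own, purely for illustration, and it assumes the log is in the same format as the sample lines.

```python
import re
from collections import Counter
from datetime import datetime

# Client IP and timestamp at the start of an access log line
# (assumes the format of the sample lines shown earlier).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def requests_per_ip(lines):
    """Return (requests per IP, seconds between first and last timestamp)."""
    counts = Counter()
    stamps = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, raw_time = match.groups()
        counts[ip] += 1
        stamps.append(datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S %z"))
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return counts, span


# Possible usage, reading the same last 500 lines as the tail command:
# with open("/var/log/apache2/access.log") as log:
#     counts, span = requests_per_ip(log.readlines()[-500:])
#     for ip, hits in counts.most_common(10):
#         print(hits, ip, "in", span, "seconds")
```

Dividing an IP's count by the span gives a requests-per-second figure you can compare against whatever threshold you decide is abusive.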
First Steps
The first thing to do is install fail2ban on the server and configure it for apache.
We have found the Issue
There is an IP, 66.249.64.135, that is making lots of requests to pages that require a lot of processing. What is more, it says it is Googlebot/2.1; +http://www.google.com/bot.html
…Yeah right.
Solving it
First Port of Call
Block the IP with .htaccess:
Order Deny,Allow
Deny from 66.249.64.135
Next Steps: the automatic solution
For the complete solution we need to block IPs that make more than 300 GET or POST requests in 300 seconds. Note that you should change these thresholds to suit your own traffic.
Add this to jail.local:
[http-get-dos]
enabled = true
port = http,https
filter = http-get-dos
logpath = /var/log/apache2/access.log
maxretry = 300
findtime = 300
# ban for 10 minutes (600 seconds)
bantime = 600
This will check your Apache access log and apply the http-get-dos filter to it.
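To get a feel for what maxretry and findtime mean together, here is a simplified Python model of the rule: flag an IP once it has maxretry matching lines within findtime seconds. The function find_offenders is hypothetical and is not fail2ban's actual code, so its bookkeeping may differ in detail from the real thing.

```python
from collections import defaultdict, deque


def find_offenders(events, maxretry=300, findtime=300):
    """Simplified model of the jail's threshold.

    events: iterable of (unix_timestamp, ip) pairs in chronological order.
    Returns the set of IPs with at least maxretry hits inside some
    findtime-second window.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    offenders = set()
    for timestamp, ip in events:
        window = recent[ip]
        window.append(timestamp)
        # Forget hits that are older than findtime seconds.
        while window and timestamp - window[0] > findtime:
            window.popleft()
        if len(window) >= maxretry:
            offenders.add(ip)
    return offenders
```

Replaying a day of your own log through something like this, with a few different values, is a cheap way to see which IPs a given maxretry and findtime would have caught before you let fail2ban ban anyone.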
In the filter.d directory, create the filter file with vim http-get-dos.conf and add the following:
# Fail2Ban configuration file
#
# Author: http://www.go2linux.org
#
[Definition]
# Option: failregex
# Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.
# You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.
#failregex = ^<HOST> -.*GET.*/ip\.cgi
failregex = ^<HOST> -.*"(GET|POST).*
# Option: ignoreregex
# Notes.: regex to ignore. If this regex matches, the line is ignored.
# Values: TEXT
#
ignoreregex =
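Before trusting the filter, it is worth checking that it picks the client address out of a real log line. fail2ban identifies the offending address through the <HOST> tag in failregex, so the sketch below assumes the intended pattern is ^<HOST> -.*"(GET|POST).* and swaps the tag for a crude named group. HOST_GROUP and matched_host are my own stand-ins; the expansion fail2ban really uses is stricter.

```python
import re

# Assumption: the filter's pattern, with fail2ban's <HOST> tag at the start.
FAILREGEX = r'^<HOST> -.*"(GET|POST).*'

# Rough stand-in for what fail2ban substitutes for <HOST>.
HOST_GROUP = r'(?P<host>\S+)'


def matched_host(line, failregex=FAILREGEX):
    """Return the host the pattern captures, or None if the line does not match."""
    pattern = re.compile(failregex.replace("<HOST>", HOST_GROUP))
    match = pattern.search(line)
    return match.group("host") if match else None


# One of the log lines from earlier in this post.
line = ('66.249.64.135 - - [14/Oct/2016:12:10:53 +0000] '
        '"GET /product-decor-category.html?color=gold,white,ivory,pink,grey HTTP/1.1" '
        '403 578 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(matched_host(line))
```

For the real check, fail2ban ships with a fail2ban-regex tool that takes a log file and a filter file and reports how many lines matched.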
Yeah, so this should do the trick. I have found that if you specify an action, it won’t actually block that IP.
I will update with results.
The .htaccess change seems to have done the trick: CPU usage dropped at the point when the .htaccess was edited.
Turns out it is a REAL Googlebot
To check whether a bot is a real Googlebot, check this link. Strange that it is spamming us silly.
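The check Google documents is a reverse DNS lookup on the IP, a look at the domain it returns, and then a forward lookup to confirm that the name points back at the same IP. Here is a minimal Python sketch of that idea. The function name is_real_googlebot, the injectable lookup parameters and the suffix list are my own assumptions, and the forward lookup used here only handles IPv4.

```python
import socket

# Domains that genuine Googlebot hostnames are expected to end with
# (assumption: adjust if Google's documentation lists others).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS record at all
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False  # user agent says Googlebot, DNS says otherwise
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # The name must resolve back to the IP we started with.
    return ip in addresses


# Possible usage (needs working DNS):
# print(is_real_googlebot("66.249.64.135"))
```

The lookup functions are passed in as parameters only so the logic can be exercised without network access. If this returns True for the IP in your log, blocking it outright also blocks Google from crawling the site, so slowing the crawl down is probably a better fix than a Deny rule.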