Highly Effective Approaches
Once traffic gets through to your web server, it causes memory, CPU and disk I/O load. That’s why you need a solution that keeps attacker traffic from being forwarded to the services on the server in the first place.
This is what Fail2Ban does in combination with “iptables” on your server. You are probably already using Fail2Ban. If not, it’s high time you did. Fail2Ban can scan the log files of your websites, detect attacks and add the IP addresses of the attackers to the on-server “iptables” firewall to block further traffic from those IPs. To do this, it uses “filters” and “jails”.
A “filter” determines which IP addresses are to be evaluated as malicious. A “jail” determines what should be done with such IP addresses, e.g. how long they should be banned. The jail that is relevant for you is the “plesk-apache-badbots” entry in the file /etc/fail2ban/jail.local, with the corresponding filter in the file /etc/fail2ban/filter.d/apache-badbots.conf. By default it bans well-known bad bots. The jail is also mentioned in /etc/fail2ban/jail.d/plesk.conf, but we won’t edit that file to achieve our goal.
Jails are tolerant. For example, they wait for several malicious requests before they actually ban the attacker. Here we will tighten things up and block bad bots hard as nails. Once the tolerance limit has been removed, the same jail is also suitable for blocking hacker scans.
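If you want to check that this jail is active on your server, you can list the jails that Fail2Ban is currently running. The jail names and their number depend on your installation; the output below is shortened:
# fail2ban-client status
Status
|- Number of jail:	7
`- Jail list:	plesk-apache-badbots, ssh, ...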
How To Block Bad Bots and Hackers Quickly and for the Long Term With Fail2Ban
In the default settings, Fail2Ban is tolerant of misbehavior. For example, a user who enters a password incorrectly several times in a login dialog should not be locked out of the server immediately. In the case of bad bots and hacker scans, however, you should take a zero-tolerance approach: you want to catch these attackers at the first attempt and lock them out. To do this, open the file /etc/fail2ban/jail.local in a text editor and add the settings described below.
The file begins with the [DEFAULT] section. There you will find all the settings that are used by Fail2Ban if no individual settings have been made for certain jails, e.g.
[DEFAULT]
maxretry = 6
This setting means that a user gets six failed attempts before the jail blocks them. This is a reasonable tolerance limit for users who mistype their password, but not for hackers or bad bots. You can set the “maxretry” parameter (like almost all other Fail2Ban jail parameters) individually for each jail. Anyone who behaves like a bad bot or hacker has no business on your server after the first attempt. Therefore, do not change “maxretry” in the default section; instead, set it to “1” for the “plesk-apache-badbots” jail. “1” is the default on new Plesk installations anyway, but my test installations are upgrades that started out from Plesk 12, then 17 and 17.5, where this value may still have been greater. Maybe you are in the same situation and should edit the value?
In the /etc/fail2ban/jail.local file, scroll down to the [plesk-apache-badbots] section and change the “maxretry” entry there. If there is none yet, simply add it:
[plesk-apache-badbots]
maxretry = 1
You should also be very strict with the blocking time (“bantime”). Popular bantime entries for [DEFAULT] are 300 or 600 seconds, i.e. 5 minutes or 10 minutes. This makes sense for users who accidentally mistype their password. However, you can lock out bad bots and hackers immediately and for a very long time without hesitation, e.g. a whole month. This is what it looks like now:
[plesk-apache-badbots]
maxretry = 1
bantime = 30d
You may say that 30 days is too long because IP addresses change dynamically and could be reused by legitimate users, who would then be unable to access your website. In my experience, however, attacks are not driven from dial-up lines, but from infected websites. Such websites normally won’t contact your server at all, so I deem it safe to ban requests from malicious websites for a long time. But in the end, it is up to you how long you want to ban.
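If a legitimate visitor ever does get banned by mistake, you do not have to wait for the ban to expire. You can lift it manually at any time; the IP address below is just a placeholder and the jail name is the one used in this article:
# fail2ban-client set plesk-apache-badbots unbanip 203.0.113.10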
You do not need to set the otherwise frequently used “findtime” parameter, because the first attempt at unwanted access is already recognized and blocked by “maxretry = 1”. It is therefore irrelevant how many further access attempts are made within a certain time.
Save the changes. In order for the changes in /etc/fail2ban/jail.local to take effect, you must reload Fail2Ban once:
# systemctl reload fail2ban
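If you want to verify that the jail has picked up the new values, you can query them with fail2ban-client. Note that Fail2Ban typically reports the ban time in seconds (30 days = 2592000 seconds):
# fail2ban-client get plesk-apache-badbots maxretry
1
# fail2ban-client get plesk-apache-badbots bantime
2592000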
How To Combat Bad Bots
The apache-badbots filter ships with the original Fail2Ban resources, but it has not been updated by the Fail2Ban maintainers since 2013. A lot has changed since then. Chances are that your apache-badbots filter is dated, so it’s time to update it and make it more effective.
We suggest updating the list of bad bots as follows. It is built from user agents collected from User Agents, converted into a Fail2Ban filter file with Yaroslav Halchenko’s GitHub resource, and enriched with some extras taken from everyday web hosting experience. Replace the content of the /etc/fail2ban/filter.d/apache-badbots.conf file with:
[Definition]
badbotscustom = thesis-research-bot
badbots = GPTBot|AmazonBot|Bytespider|Bytedance|fidget-spinner-bot|EmailCollector|WebEMailExtrac|TrackBack/1.02|sogou music spider|seocompany|LieBaoFast|SEOkicks|Uptimebot|Cliqzbot|ssearch_bot|domaincrawler|AhrefsBot|spot|DigExt|Sogou|MegaIndex.ru|majestic12|80legs|SISTRIX|HTTrack|Semrush|MJ12|Ezooms|CCBot|TalkTalk|Ahrefs|BLEXBot|Atomic_Email_Hunter/4.0|atSpider/1.0|autoemailspider|bwh3_user_agent|China Local Browse 2.6|ContactBot/0.2|ContentSmartz|DataCha0s/2.0|DBrowse 1.4b|DBrowse 1.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1.00|ESurf15a 15|ExtractorPro|Franklin Locator 1.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1.0.x|ISC Systems iRc Search 2.1|IUPUI Research Bot v 1.9a|LARBIN-EXPERIMENTAL (efp@gmx.net)|LetsCrawl.com/1.0 +http://letscrawl.com/|Lincoln State Web Browser|LMQueueBot/0.2|LWP::Simple/5.803|Mac Finder 1.0.xx|MFC Foundation Class Library 4.0|Microsoft URL Control - 6.00.8xxx|Missauga Locate 1.0.0|Missigua Locator 1.9|Missouri College Browse|Mizzu Labs 2.2|Mo College 1.9|MVAClient|Mozilla/2.0 (compatible; NEWT ActiveX; Win32)|Mozilla/3.0 (compatible; Indy Library)|Mozilla/3.0 (compatible; scan4mail (advanced version) http://www.peterspages.net/?scan4mail)|Mozilla/4.0 (compatible; Advanced Email Extractor v2.xx)|Mozilla/4.0 (compatible; Iplexx Spider/1.0 http://www.iplexx.at)|Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent|Mozilla/4.0 efp@gmx.net|Mozilla/5.0 (Version: xxxx Type:xx)|NameOfAgent (CMS Spider)|NASA Search 1.0|Nsauditor/1.x|PBrowse 1.4b|PEval 1.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1.0.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot admin@google.com|ShablastBot 1.0|snap.com beta crawler v0|Snapbot/1.0|Snapbot/1.0 (Snap Shots, +http://www.snap.com)|sogou develop spider|Sogou Orion spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)|sogou spider|Sogou web spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)|sohu agent|SSurf15a 11 |TSurf15a 11|Under the Rainbow 2.2|User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)|VadixBot|WebVulnCrawl.unknown/1.0 libwww-perl/5.803|Wells Search II|WEP Search 00
failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*"(?:%(badbots)s|%(badbotscustom)s)"$
ignoreregex =
The “badbotscustom” variable is there to improve clarity. You can add further user agent strings to it if required.
The “failregex” line inserts both variables into the regular expression that makes up the filter. Fail2Ban will search for all log entries whose user agent matches one of the badbots or badbotscustom strings.
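Before testing against real traffic, you can feed fail2ban-regex a single, made-up log line to see the filter match. The IP address and timestamp below are placeholders, and the user agent is simply the “thesis-research-bot” entry from the badbotscustom variable above:
# fail2ban-regex '203.0.113.10 - - [10/Oct/2024:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 4523 "-" "thesis-research-bot"' /etc/fail2ban/filter.d/apache-badbots.conf
If the filter is in place, the summary should report this line as matched.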
Test against a real log file to check whether the new rules work:
# fail2ban-regex /var/www/vhosts/<your domain>/logs/access_ssl_log /etc/fail2ban/filter.d/apache-badbots.conf
In the output you’ll see a line mentioning the number of matches:
Lines: 54 lines, 0 ignored, 1 matched, 53 missed
If your log has user agent strings that match your bad bot list, you’ll see at least one match. Is everything working as expected? Let Fail2Ban load the new version of the filter:
# fail2ban-client reload plesk-apache-badbots
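From now on, you can also check at any time how many IP addresses the jail has caught and which of them are currently banned:
# fail2ban-client status plesk-apache-badbots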
How to Combat Hacker Scans – Vol I: Beginner Level
The plesk-apache-badbots jail is not only suitable for detecting bad bots. With an extension, you can also use it to detect hacker scans.
Hackers test websites not just for the presence of a few files, but for many. Although they may come across files that are actually present and thus receive response code 200 from the web server, they will also request files that are missing. The web server answers such requests with a 301 (permanent redirect) or 404 (not found). We take advantage of this to recognize the scan. More on this below.
Hackers also like to scan for common file names of authentication systems, which we do not use in our websites. For example, if a website does not run in an AWS instance and does not use any special software from AWS, we can safely treat every request for such files that aims at obtaining sensitive information as an attack and ban the attacker. We become the trapper by editing the file /etc/fail2ban/filter.d/apache-badbots.conf again and adding further filter lines after the existing “failregex = …” line. A professional system administrator would prefer to create a separate filter file and a new jail for these extra lines: 1) Modifying the existing file can prevent the packaging system (YUM/DNF or APT) from updating it. 2) If the packaging system does update it, it might overwrite the file and your modifications will be lost. For this proof of concept of how to ban specific hacker attacks, and to not make this blog post longer than it already is, editing the existing filter will do. If you prefer creating a separate filter and jail, search engines are your best friends for finding dozens of manuals that explain how to do that.
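For orientation, here is a minimal sketch of what such a separate setup could look like. The file name, jail name and values are purely illustrative, and the failregex lines would be the hacker-scan patterns developed in the rest of this section. First, a new filter file, e.g. /etc/fail2ban/filter.d/hacker-scans.conf:
[Definition]
failregex = ^<HOST> .*GET .*aws(/|_|-)(credentials|secrets|keys).*
ignoreregex =
Then a matching jail in /etc/fail2ban/jail.local, reusing the strict settings from above:
[hacker-scans]
enabled = true
filter = hacker-scans
logpath = /var/www/vhosts/*/logs/access_ssl_log
maxretry = 1
bantime = 30d
After a reload of Fail2Ban, this jail would work independently of the Plesk-provided one. For the rest of this post, however, we simply extend the existing apache-badbots filter.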
Since attackers don’t know what kind of websites they are dealing with, they have to try a few things. If you know your own website well, you can formulate rules that include paths that are typical for security information but are not found in your website. A regular visitor will never retrieve such paths, but an attacker requesting these paths will be recognized immediately.
In the following example, we assume that there are no “aws” folders or files in your websites, nor is OAuth used. This is the case for most websites. We add the following filter lines:
^<HOST> .*GET .*aws(/|_|-)(credentials|secrets|keys).*
^<HOST> .*GET .*credentials/aws.*
^<HOST> .*GET .*secrets/(aws|keys).*
^<HOST> .*GET .*oauth/config.*
^<HOST> .*GET .*config/oauth.*
Remember to check the regular expressions of the filter after changing the /etc/fail2ban/filter.d/apache-badbots.conf file:
# fail2ban-regex /var/www/vhosts/<your domain>/logs/access_ssl_log /etc/fail2ban/filter.d/apache-badbots.conf
and then restart the jail so that the new filters take effect:
# fail2ban-client reload plesk-apache-badbots
These rules ensure that all GET requests for files and paths with names like oauth/config, config/oauth, aws-credentials, aws/credentials, aws_secrets etc. are considered an attack. If you actually use these paths in your websites, you must of course not use the rules. In the vast majority of cases, however, they can be used without hesitation.
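If you want to see one of these rules fire without waiting for a real attack, you can again test with a single, made-up log line (IP address and timestamp are placeholders):
# fail2ban-regex '203.0.113.10 - - [10/Oct/2024:14:02:11 +0200] "GET /.aws/credentials HTTP/1.1" 404 196 "-" "Mozilla/5.0"' /etc/fail2ban/filter.d/apache-badbots.conf
The summary should report the line as matched, because the request path contains “aws/credentials”.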
You can build similar rules for other widely used software that you do not use yourself. For example, if you know that you do not use Prestashop on your servers, you could use a filter to recognize folder names that are common for Prestashop, such as “travis-scripts” and “tests-legacy”, but uncommon for other applications. Someone who probes your websites and tries to access these paths will be recognized as an attacker:
^<HOST> .*GET .*(travis-scripts|tests-legacy)/.*
These are just examples of how you can better detect and automatically stop attacks in day-to-day server operation. Use these examples as inspiration to write your own filters to suit your environment.
How To Combat Hacker Scans – Vol II: Sleight Of Hand Level
Finally, we have a special trick for you that is very effective in detecting hacker scans:
^<HOST> .*"GET .*(freshio|woocommerce).*frontend.*" (301|404).*
^<HOST> .*"GET .*contact-form-7/includes.*" (301|404).*
^<HOST> .*"(GET|POST) .*author=.*" 404.*
In the first two lines of this example we take advantage of the fact that popular software is included in hacker scans, e.g. “WooCommerce” or “Contact Form 7”. However, since an attacker does not know in advance which software a site runs, he does not only probe for exactly this software, but also for other applications, e.g. “Freshio”. Regular website visitors will never call up a plugin path that does not exist on a website. But attackers are forced to try a lot and are very likely to request a file that is not there.
This is exactly the case we are interested in, because it reveals that the requester is not a regular visitor, but an attacker. This is why the first two example filters also check the response code of the web server and only match lines in which code 301 or 404 is returned. As a result, Fail2Ban knows that it is an attack and can ban the attacker’s IP address.
If your websites actually use WooCommerce, Freshio or Contact Form 7, nothing happens: requests for files that really exist are answered with code 200 and do not match the filters. But an attacker who provokes a 301 or 404 response is recognized by the filter.
In the third line of this example we ban all users who perform author scans on websites. If an author ID does not exist, the website returns a 404-not-found error. So if someone requests a missing ID, Fail2Ban recognizes the 404 error code and thus the attacker. Some of these scans are also carried out against websites that are not running WordPress. All the better for us, because these also result in a 404 error and thus tell us that it is a hacker attack.
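To illustrate, here is a made-up log line (placeholders again) that the first filter line would catch: a request for a WooCommerce frontend path that the web server answers with a 404. Assuming you added the lines above to the same apache-badbots filter file, you can test it as before:
# fail2ban-regex '203.0.113.10 - - [10/Oct/2024:14:10:05 +0200] "GET /wp-content/plugins/woocommerce/assets/js/frontend/add-to-cart.min.js HTTP/1.1" 404 196 "-" "Mozilla/5.0"' /etc/fail2ban/filter.d/apache-badbots.conf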
In live tests with a five-digit number of websites, these methods detected almost all hacker scans. They contribute a great deal to reducing the CPU load of a server when many websites are running on the same host, because once an attacker IP has been blocked, further attacks against other websites on the same host are also prevented.
Conclusion
When a server receives many requests, it can get bogged down with high CPU usage, slowing down website responses. Thankfully, the Plesk Cgroups component steps in to regulate this load, offering a very effective solution.
With the Fail2Ban apache-badbots jail improvements demonstrated here, you can automatically fend off unwanted requests from “bad bots” and hackers. This is more effective than trying to block bad bots and hackers manually.
As a result, your host is less busy processing pointless requests, which reduces the overall CPU load. This means faster responses for regular visitors, since the processor has more headroom to handle their requests right away.