Analysis of failed remote login attempts

A few months ago, I configured a Mageia 2.0 box with a static, public IP address. I was not sure of its purpose, except perhaps as a way to access large files (pictures and videos of friends and family) that I did not want to keep on my domain web host for space reasons. The base install therefore consisted of an SSH server and an HTTP server. The machine sits behind a firewall appliance, so I did not configure any additional security beyond making sure that the machine's own firewall was running too. That firewall allowed only ports 22 (SSH) and 80 (HTTP).

Fast forward two months. I was debugging a PHP script when I noticed in the system logs that there were attempts to access web pages that did not exist. I then decided to check the SSH logs and I noticed failed login attempts as well. So I decided I would take a closer look at some later point. Well, today I got that chance and what follows is an analysis of the logs of the SSH service. This is not a very detailed analysis. Just something to satisfy my curiosity.

Background for the analysis

We begin with the failed attempts to remotely log into my machine with SSH. We will try to answer three questions:

  1. The usernames that are commonly attacked
  2. The number of attempts made for each username
  3. Identity of the attackers (i.e. their IP addresses) and their persistence.

We will use common Unix command-line utilities: grep, cut, sort and uniq. The first step is to extract the lines that report a failed attempt from the system logs. A simple grep command is enough.
  # grep "Failed password" /var/log/auth.log* > ~ak/ssh-failed-password-logins.txt
Since the observations should be presented in the context of the time period during which these attempts were made, we checked the logs to see the date and time of the first failed attempt and the last failed attempt.
  [ak@bobcat ~]$ cut -d':' -f2 ssh-failed-password-logins.txt | cut -d' ' -f1,2 | sort -k1M -k2n | uniq

The results showed that the period under consideration is about six weeks. The first failed attempt in this dataset occurred on 15 Sep 2013 and the latest was today, i.e., 23 Oct 2013. Attempts were made on only 19 separate days out of 39, i.e., on approximately 50% of the days. As mentioned before, the targeted machine is behind a firewall controlled by another entity, so it is possible that more attempts were rebuffed by that firewall before reaching the machine. The 50% figure is therefore a lower bound: it counts only the days on which attempts reached the machine and were logged.
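
The day count above depends on the cut field positions lining up. If the syslog timestamp pads single-digit days with an extra space, or if grep does not add a filename prefix, those positions shift. The following is a minimal sketch of a more forgiving variant, assuming classic syslog timestamps of the form "Mon DD hh:mm:ss" with no year and a log that stays within one calendar year. The function name failed_days and the sample lines are made up for illustration (the IP addresses are reserved documentation addresses), and sort -M is an extension offered by GNU and BSD sort rather than plain POSIX.

```shell
#!/bin/sh
# failed_days (hypothetical helper): print each distinct "Mon DD" that
# appears in the given files, in calendar order. Assumes syslog-style
# timestamps without a year, all from the same calendar year.
failed_days() {
    awk '
        # Locate the timestamp wherever it sits, so a "filename:" prefix
        # added by grep or a space-padded day ("Oct  3") does not matter.
        match($0, /(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +[0-9]+/) {
            split(substr($0, RSTART, RLENGTH), d, " ")
            print d[1], d[2]
        }
    ' "$@" | LC_ALL=C sort -u -k1,1M -k2,2n
}

# Made-up sample in the same shape as ssh-failed-password-logins.txt
sample=$(mktemp)
cat > "$sample" <<'EOF'
/var/log/auth.log.1:Sep 15 03:12:45 bobcat sshd[2101]: Failed password for root from 192.0.2.10 port 51234 ssh2
/var/log/auth.log.1:Sep 15 03:12:49 bobcat sshd[2103]: Failed password for invalid user test from 192.0.2.10 port 51240 ssh2
/var/log/auth.log:Oct  3 11:40:02 bobcat sshd[3320]: Failed password for bin from 198.51.100.7 port 40022 ssh2
/var/log/auth.log:Oct 23 08:01:17 bobcat sshd[4410]: Failed password for root from 203.0.113.5 port 60100 ssh2
EOF

days=$(failed_days "$sample")
echo "$days"                                   # one line per distinct day
echo "distinct days: $(echo "$days" | wc -l | tr -d ' ')"

# Share of days with a logged attempt, using the figures quoted in the text
awk 'BEGIN { printf "%.1f%%\n", 19 * 100 / 39 }'

rm -f "$sample"
```

On the sample this lists Sep 15, Oct 3 and Oct 23, and the last line shows that 19 of 39 days works out to 48.7%. Matching the month name directly, instead of counting colons and spaces, is a design choice that trades a slightly longer command for independence from the exact prefix and padding.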

Analysis of the logs

  1. We begin with our first assessment: the usernames that are commonly attacked. Consider the output below.

      [ak@bobcat ~]$ cut -d' ' -f9 ssh-failed-password-logins.txt | sort | uniq -c
      2 apache
      98 bin
      1 daemon
      990 invalid
      2 mysql
      1 news
      1 openvpn
      1 operator
      2156 root
    

    Straightaway we observe that almost two-thirds of the attempts (2156 of 3252) were for the username root. The second most common entry is invalid, with 990 attempts, and the third is bin, with 98. It turned out that when a user did not exist on the system, the log wrote invalid user in place of the username and then listed the attempted username, so invalid is not a username at all. I show the results for these invalid usernames next. Interestingly, there were two attempts as apache. Is that a coincidence, or did the machine operator or script notice that a web server is running on my system? Another interesting point is that the attempted usernames assume a Unix/BSD system and not a Windows machine. Now what about those invalid usernames?

  2. Since the attacker does not know which usernames have been created on the system, they naturally try the most common ones. We already saw that root and bin, both of which exist on a default Linux installation, were tried most often. Since my Linux box is not really used for any practical purpose, it has just one user account, mine. This is probably also true of most Linux installs that face the public Internet directly (to limit the attack surface). However, many system administrators are rather uninformed when it comes to securing systems, so an unsophisticated attacker can still hope for some success by trying the most commonly used usernames. What are they? Consider the output below.

    [ak@bobcat ~]$ grep invalid ssh-failed-password-logins.txt| cut -d' ' -f11 | sort | uniq -c | sort -nk1
    ...
    4 postgres
    5 administrator
    5 adrian
    5 ivan
    5 shoutcast
    5 support
    6 backup
    6 hadoop
    6 user0
    6 zimbra
    8 tomcat
    8 user
    9 guest
    9 minecraft
    9 webmaster
    10 admin
    17 test
    26 gateway
    32 nagios
    37 www
    38 userftp
    41 oracle
    58 deploy
    77 ftptest
    

    Here I have listed only the most frequently attempted usernames, with the number of attempts for each, in increasing order of frequency. None of these users exist on my machine, so they were logged as invalid user. The list was probably culled from /etc/passwd dumps of public machines. Nevertheless, it is surprising that a user called ftptest is expected to exist when my system clearly does not have FTP installed or enabled. Is this a shot in the dark, or merely a less intelligent script? It is not clear to me why usernames such as deploy and test would exist. The user minecraft is a complete surprise to me. Attempts were also made on other product-based usernames such as D-Link, asterisk, plesk, centos, honda, mysql and huawei, among others, which I did not list above. Since this list has a long tail, I suspect that the attackers tried most of these usernames with their default passwords and gave up soon after. I also suspect that determined attackers are unlikely to follow this strategy; they would use a more focused approach rather than this spray-and-hope-it-sticks approach.

  3. The third item on my list was to understand who was trying to access my machine and with what level of persistence. I isolated the IP addresses from the logs and ranked them by the number of times each of these IP addresses attempted a login. Here's what got listed.

      [ak@bobcat ~]$ grep -v invalid ssh-failed-password-logins.txt | cut -d' ' -f11 | sort | uniq -c | sort -nk1
      ...
      3 115.146.123.189
      3 221.232.95.203
      3 5.61.27.7
      3 82.98.104.217
      5 118.122.17.170
      6 180.168.83.54
      6 180.96.71.228
      6 68.169.45.8
      7 183.60.102.4
      8 87.106.218.111
      9 159.226.115.221
      10 109.235.251.162
      12 202.137.9.177
      12 218.28.116.254
      12 66.84.25.66
      13 147.102.28.117
      13 180.153.88.246
      13 211.144.68.209
      13 219.141.213.76
      16 219.245.190.5
      18 121.199.31.130
      18 198.20.97.54
      18 54.213.72.90
      22 206.212.248.226
      28 111.73.46.210
      28 201.175.9.21
      28 218.27.190.133
      35 223.5.12.61
      36 64.31.19.99
      39 122.0.66.103
      39 91.232.208.38
      44 198.74.113.175
      50 109.203.96.248
      50 61.153.110.253
      51 222.187.126.134
      52 14.18.207.53
      56 221.229.252.179
      60 61.147.74.223
      68 117.41.187.152
      71 42.120.4.116
      77 221.176.53.74
      90 61.177.91.48
      141 119.147.137.27
      221 5.248.194.29
      321 223.4.180.23
      397 88.190.13.232
    

    As before, the first column lists the number of attempts made from a particular IP address and the second column lists the IP address. Because the command filters out the invalid user lines, these counts cover only attempts against usernames that exist on my machine. I assume that there is a one-to-one mapping between an IP address and a physical machine. Observe that although only a small number of machines on the Internet appear to be attacking my machine, the top three attackers are very persistent: together they account for 939 of the 2262 attempts against existing usernames, or about 42%. These machines are presumably working through a large number of potential passwords for each username.

    I did not attempt to find out where these machines are located, but perhaps I will do that another time. I think it is quite likely that these machines are part of some bot network and that their owners are unaware of what their machines are doing. That merits investigation as well.
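
The three questions above can also be answered without relying on fixed field numbers. The sketch below is one possible way to do it, assuming every relevant line has the shape "Failed password for [invalid user] NAME from IP port N ssh2". The function names parse_failed, attempts_per_ip and usernames_per_ip and the sample lines are hypothetical, and the IP addresses are reserved documentation addresses, not real attackers. Unlike the grep -v invalid command, it counts attempts against both existing and non-existent usernames, and it also reports how many distinct usernames each address tried, which is one rough measure of persistence.

```shell
#!/bin/sh
# parse_failed (hypothetical helper): for every "Failed password" line,
# print "<status> <username> <ip>", where status is valid or invalid.
parse_failed() {
    awk '
    {
        # Find the "Failed password for" phrase wherever it starts.
        for (i = 1; i <= NF - 2; i++)
            if ($i == "Failed" && $(i+1) == "password" && $(i+2) == "for")
                break
        if (i > NF - 2) next             # not a failed-password line
        u = i + 3; status = "valid"
        if ($u == "invalid" && $(u+1) == "user") { u += 2; status = "invalid" }
        if ($(u+1) != "from") next       # unexpected shape; skip the line
        print status, $u, $(u+2)
    }' "$@"
}

# Number of failed attempts per IP address, smallest first.
attempts_per_ip() {
    parse_failed "$@" | awk '{ print $3 }' | sort | uniq -c | sort -n
}

# Number of distinct usernames tried per IP address, smallest first.
usernames_per_ip() {
    parse_failed "$@" | awk '{ print $3, $2 }' | sort -u |
        awk '{ print $1 }' | uniq -c | sort -n
}

# Made-up sample in the same shape as ssh-failed-password-logins.txt
sample=$(mktemp)
cat > "$sample" <<'EOF'
/var/log/auth.log.1:Sep 15 03:12:45 bobcat sshd[2101]: Failed password for root from 192.0.2.10 port 51234 ssh2
/var/log/auth.log.1:Sep 15 03:12:49 bobcat sshd[2103]: Failed password for invalid user test from 192.0.2.10 port 51240 ssh2
/var/log/auth.log.1:Sep 15 03:12:53 bobcat sshd[2105]: Failed password for invalid user oracle from 192.0.2.10 port 51244 ssh2
/var/log/auth.log:Oct  3 11:40:02 bobcat sshd[3320]: Failed password for bin from 198.51.100.7 port 40022 ssh2
/var/log/auth.log:Oct 23 08:01:17 bobcat sshd[4410]: Failed password for root from 198.51.100.7 port 60100 ssh2
/var/log/auth.log:Oct 23 08:01:21 bobcat sshd[4412]: Failed password for root from 198.51.100.7 port 60104 ssh2
/var/log/auth.log:Oct 23 08:01:25 bobcat sshd[4414]: Failed password for root from 198.51.100.7 port 60108 ssh2
EOF

echo "Attempts per IP:"
attempts_per_ip "$sample"
echo "Distinct usernames per IP:"
usernames_per_ip "$sample"

rm -f "$sample"
```

In the sample, 198.51.100.7 makes more attempts (four) but tries only two usernames, while 192.0.2.10 makes three attempts across three usernames. Separating volume from breadth in this way helps distinguish a machine hammering one account with many passwords from one sweeping through a username list.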

Next week, I will present a similar type of analysis for the attacks on the web server, namely the non-existent web pages being accessed and the strange URLs being requested from the web server.