Thursday, April 23, 2015

Configuring Monit: a free system monitoring and recovery tool

Why Monit?

One morning, I went on-line to check my WordPress website. Lo and behold, I saw this error: 'Error establishing a database connection.' My website had been down for 4 hours, luckily in the middle of the night.

I used a free website monitoring service called StatusCake. Sure enough, it did send me an email alerting me about this problem. But, sending an email at 2am was not helpful in solving the problem. What I really needed was a tool that not only detected when the database process went down, but would also restart the process without human intervention. Monit is such a tool.

For the rest of this post, I assume you want Monit to monitor a LAMP server (Linux, Apache2, MySQL, PHP).

Install Monit.

To install Monit on Debian or Ubuntu, execute this command:

$ sudo apt-get install monit

As part of the installation, a monit service is created:

$ sudo chkconfig --list | grep -i monit  
monit       0:off  1:off  2:on   3:on   4:on   5:on   6:off

Configure Monit

The main Monit configuration file is /etc/monit/monitrc. To edit it, you need sudo privileges.

$ sudo vi /etc/monit/monitrc

After you make a change to the file, follow these steps to bring it into effect:

  1. Validate configuration file syntax.

    $ sudo monit -t
    

    If no error is returned, proceed to next step.

  2. Restart Monit.

    $ sudo service monit restart
    

Global settings

The key global settings to customize are:

  • Test interval

    By default, Monit checks your system at 2-minute intervals. To customize the interval, change the value (from 120) in the following statement. The unit of measure is seconds.

    set daemon 120
    
  • Log file location

    You can specify whether Monit logs to syslog or a log file of your choice.

    # set logfile syslog facility log_daemon  
    set logfile /var/log/monit.log
    
  • Mail server

    Specify a mail server for Monit to send email alerts. I set up exim4 as an SMTP server on the localhost. For instructions, refer to my previous post.

    set mailserver localhost
    
  • Email format

    Hopefully, you won't receive many alert emails, but when you do, you want the maximum information about the potential problem. The default email format contains all the information known to Monit, but you may customize the format in which the information is delivered. To customize, use the set mail-format statement.

    set mail-format {  
        from:     monit@$HOST  
        subject:  monit alert --  $EVENT $SERVICE  
        message:  $EVENT Service $SERVICE  
                  at $DATE  
                  on $HOST 
                  $ACTION  
                  $DESCRIPTION  
                  Your faithful employee,  
                  Monit  
    }
    

    For a description of the set mail-format statement, click here.

  • Global alerts

    If any actionable event occurs, Monit sends an email alert to a predefined address list. Each email address is defined using the set alert statement.

    set alert root@localhost not on { instance, action }
    

    In the above example, root@localhost is the email recipient. Please refer to my earlier post about redirecting local emails to a remote email account.

    Note that an event filter is defined (not on { instance, action }). Root@local will receive an email alert on every event unless it is of the instance or action type. An instance event is triggered by the starting or stopping of the Monit process. An action event is triggered by certain explicit user commands, e.g., to unmonitor or monitor a service. Click here for the complete list of event types that you can use for filtering.

    By default, Monit sends an email alert when a service fails and another when it recovers. It does not repeat failure alerts after the initial detection. You can change this default behavior by specifying the reminder option in the set alert statement. The following example sends a reminder email on every fifth test cycle if the target service remains failed:

    set alert root@localhost with reminder on 5 cycles
    
  • Enabling reporting and service management

    You can dynamically manage Monit service monitors, and request status reports. These capabilities are delivered by an embedded web server. By default, this web server is disabled. To enable it, include the set httpd statement.

    set httpd port 2812 and
        use address localhost  
        allow localhost
    

    Note: I've only allowed local access to the embedded web server. The Useful Commands section below explains the commands to request reporting and management services.

Resource monitor settings

The following are the key resources to monitor on a LAMP server.

  • System performance

    You can configure Monit to send an alert when system resources are running below certain minimum performance threshold. The system resources that can be monitored are load averages, memory, swap and CPU usages.

    check system example.com   
      if loadavg (1min) > 4       then alert 
      if loadavg (5min) > 2       then alert 
      if memory usage   > 75%     then alert 
      if swap usage     > 25%     then alert 
      if cpu usage (user)   > 70% then alert 
      if cpu usage (system) > 30% then alert 
      if cpu usage (wait)   > 20% then alert
    
  • Filesystem usage

    You can create a monitor which is triggered when the percentage of disk space used is greater than an upper threshold.

    check filesystem rootfs with path /
      if space usage > 90% then alert
    

    You may have more than 1 filesystem created on your server. Run the df command to identify the filesystem name (rootfs) and the path it was mounted on (/).

  • MySQL

    Instead of putting the MySQL-specific statements in the main configuration file, I elect to put them in /etc/monit/conf.d/mysql.conf. This is a personal preference. I like a more compact main configuration file. All files inside the /etc/monit/conf.d/ directory are automatically included in Monit configuration.

    The following statements should be inserted into the mysql.conf file.

    check process mysql with pidfile /var/run/mysqld/mysqld.pid  
          start program = "/etc/init.d/mysql start"  
          stop program = "/etc/init.d/mysql stop"  
          if failed unixsocket /var/run/mysqld/mysqld.sock then restart  
          if 5 restarts within 5 cycles then timeout
    

    If the MySQL process dies, Monit needs to know how to restart it. The command to start the MySQL process is specified by the start program clause. The command to stop MySQL is specified by the stop command clause.

    A timeout event is triggered if MySQL is restarted 5 times in a span of 5 consecutive test cycles. In the event of a timeout, an alert email is sent, and the MySQL process will no longer be monitored. To resume monitoring, execute this command:

    $ sudo monit monitor mysql
    
  • Apache

    I put the following Apache-specific statements in the file /etc/monit/conf.d/apache.conf.

    check process apache2 with pidfile /var/run/apache2.pid
          start program = "/etc/init.d/apache2 start"
          stop program = "/etc/init.d/apache2 stop"
          if failed host example.com port 80 protocol http request "/monit/token" then restart
          if 3 restarts within 5 cycles then timeout
          if children > 250 then restart
          if loadavg(5min) greater than 10 for 8 cycles then stop
    

    At every test cycle, Monit attempts to retrieve http://example.com/monit/token. This URL points to a dummy file created on the webserver specifically for this test. You need to create the file by executing the following commands:

    $ mkdir /var/www/monit
    $ touch /var/www/monit/token 
    $ chown -R www-data:www-data /var/www/monit
    

    Besides testing web access, the above configuration also monitors resource usages. The Apache process is restarted if it spawns more than 250 child processes. Apache is also restarted if the server's load average is greater than 10 for 8 cycles.

Useful commands

To print a status summary of all services being monitored, execute the command below:

    $ sudo monit summary  
    The Monit daemon 5.4 uptime: 3h 48m 

    System 'example.com'                Running
    Filesystem 'rootfs'                 Accessible
    Process 'mysql'                     Running
    Process 'apache2'                   Running

To print detailed status information of all services being monitored, execute the following:

    $ sudo monit status
    The Monit daemon 5.4 uptime: 3h 52m 

    System 'example.com'
      status                            Running
      monitoring status                 Monitored
      load average                      [0.00] [0.01] [0.05]
      cpu                               0.0%us 0.0%sy 0.0%wa
      memory usage                      377092 kB [74.0%]
      swap usage                        53132 kB [10.3%]
      data collected                    Wed, 22 Apr 2015 13:21:47
    ...        
    Process 'apache2'
      status                            Running
      monitoring status                 Monitored
      pid                               12909
      parent pid                        1
      uptime                            6d 15h 18m 
      children                          10
      memory kilobytes                  2228
      memory kilobytes total            335420
      memory percent                    0.4%
      memory percent total              65.9%
      cpu percent                       0.0%
      cpu percent total                 0.0%
      port response time                0.001s to example.com:80/my-monit-dir/token [HTTP via TCP]
      data collected                    Wed, 22 Apr 2015 13:21:47

To unmonitor a particular service (e.g., apache2):

    $ sudo monit unmonitor apache2

To unmonitor all services:

    $ sudo monit unmonitor all

To monitor a service:

    $ sudo monit monitor apache2

To monitor all services:

    $ sudo monit monitor all

Conclusion

I'd recommend that you run Monit on your server in addition to signing up for a remote website monitoring service such as StatusCake. While the 2 services do overlap, they also complement each other. Monit runs locally on your server, and can restart processes when a problem is detected. However, a networking problem may go undetected by Monit. That is where a remote monitoring service shines. In the event of a network failure, the remote monitor fails to connect to your server, and will therefore report a problem that may otherwise go unnoticed.

No comments: