How to monitor Kolab processes

I have been using self-hosted Kolab Groupware every day for quite a while now, so the need arose to monitor process activity and system resources using the Monit utility.

A few words about Monit

Monit is a simple and robust utility for monitoring and automatic maintenance, supported on Linux, BSD, and OS X.

Software installation

Debian Wheezy currently provides Monit 5.4.

To install it, execute the following command:

$ sudo apt-get install monit

The Monit daemon will be started at boot time. You can also use the standard System V init scripts to manage the service.

Initial configuration

Configuration files are located under the /etc/monit/ directory. Default settings are stored in the /etc/monit/monitrc file, which I strongly suggest reading. Custom configuration will be stored in the /etc/monit/conf.d/ directory.

I will override several important settings using the local.conf file.

Modified settings

  • Set email address to root@example.org
  • Slightly change default template
  • Define mail server as localhost
  • Set default interval to 120 seconds with initial delay of 180 seconds
  • Enable local web server to take advantage of the additional functionality
    (currently commented out)
$ sudo cat /etc/monit/conf.d/local.conf
# define e-mail recipient
set alert root@example.org

# define e-mail template
set mail-format {
from: monit@$HOST
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date:        $DATE
Action:      $ACTION
Host:        $HOST
Description: $DESCRIPTION
}

# define server
set mailserver localhost

# define interval and initial delay
set daemon 120 with start delay 180

# set web server for local management
# set httpd port 2812 and use the address localhost allow localhost

Please note that enabling the built-in web server as shown above allows every local user to access it and perform Monit operations. It should either stay disabled or be secured with a username and password.
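
As a rough sketch of the password-protected variant, the helper below prints an httpd stanza that keeps the web interface bound to localhost and adds a login. The directives follow the examples shipped in the default monitrc; the function name, user name, and password are placeholders I made up, and the stanza is only printed so that you can review it before adding it to local.conf.

```shell
# Print an httpd stanza that binds Monit's web interface to localhost
# and requires a login. httpd_stanza and its arguments are hypothetical.
httpd_stanza() {
  user=$1
  pass=$2
  cat <<EOF
set httpd port 2812 and
  use address localhost
  allow localhost
  allow ${user}:${pass}
EOF
}

# Example: review the output before copying it into local.conf
httpd_stanza admin changeme
```

Keep in mind that the password ends up in clear text in the configuration file, so the file should be readable by root only.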

Command-line operations

Verify configuration syntax

To check configuration syntax execute the following command.

$ sudo monit -t
Control file syntax OK

Start, Stop, Restart actions

Start all services and enable monitoring for them.

$ sudo monit start all

Start all services in resources group and enable monitoring for them.

$ sudo monit -g resources start

Start rootfs service and enable monitoring for it.

$ sudo monit start rootfs

You can initiate the stop action in the same way, which stops the service and disables monitoring, or execute the restart action to stop and then start the corresponding services.

Monitor and unmonitor actions

Monitor all services.

$ sudo monit monitor all

Monitor all services in resources group.

$ sudo monit -g resources monitor

Monitor rootfs service.

$ sudo monit monitor rootfs

Use unmonitor action to disable monitoring for corresponding services.

Status action

Print service status.

$ sudo monit status
The Monit daemon 5.6 uptime: 27d 0h 47m

System 'server'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.26] [0.43] [0.48]
  cpu                               12.8%us 2.6%sy 0.0%wa
  memory usage                      2934772 kB [36.4%]
  swap usage                        2897376 kB [35.0%]
  data collected                    Mon, 29 Sep 2014 22:47:49

Filesystem 'rootfs'
  status                            Accessible
  monitoring status                 Monitored
  permission                        660
  uid                               0
  gid                               6
  filesystem flags                  0x1000
  block size                        4096 B
  blocks total                      17161862 [67038.5 MB]
  blocks free for non superuser     7327797 [28624.2 MB] [42.7%]
  blocks free total                 8205352 [32052.2 MB] [47.8%]
  inodes total                      4374528
  inodes free                       4151728 [94.9%]
  data collected                    Mon, 29 Sep 2014 22:47:49

Summary action

Print short service summary.

$ sudo monit summary
The Monit daemon 5.6 uptime: 27d 0h 48m

System 'server'                     Running
Filesystem 'rootfs'                 Accessible

Reload action

Reload configuration and reinitialize Monit daemon.

$ sudo monit reload
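
Since reloading a broken control file is easy to avoid, the syntax check and the reload can be chained. This is only a sketch: safe_reload and the MONIT variable are names I made up, and the function assumes it is run as root.

```shell
# MONIT can be overridden (for example in tests); it defaults to monit.
MONIT=${MONIT:-monit}

# Reload the daemon only when the control file passes the syntax check.
safe_reload() {
  if "$MONIT" -t; then
    "$MONIT" reload
  else
    echo "syntax check failed, not reloading" >&2
    return 1
  fi
}
```

Calling safe_reload after editing a file in /etc/monit/conf.d/ keeps the running daemon on its previous configuration when the new one does not parse.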

Quit action

Terminate Monit daemon.

$ sudo monit quit
monit daemon with pid [5248] killed

Monitor filesystems

The configuration syntax is very consistent and easy to grasp. I will start with a simple example and then proceed to slightly more complex ideas. Just remember to check one thing at a time.

I am using a VPS service because of its easy backup/restore process, so I have only one filesystem, on the /dev/root device, which I will monitor as a service named rootfs.

The Monit daemon will generate an alert and send an email if space or inode usage on the rootfs filesystem [stored on the /dev/root device] exceeds 80 percent of the available capacity.

$ sudo cat /etc/monit/conf.d/filesystems.conf
check filesystem rootfs with path /dev/root
  group resources

  if space usage > 80% then alert
  if inode usage > 80% then alert

The above service is placed in resources group for easier management.
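
To see the number that is compared against the 80% limit, the space usage can be read by hand. A minimal sketch, assuming a POSIX df; the function name is hypothetical.

```shell
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the given path, based on POSIX "df -P" output.
space_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage crosses the same 80% limit used in filesystems.conf.
usage=$(space_usage_percent /)
if [ "$usage" -gt 80 ]; then
  echo "rootfs space usage is ${usage}%, above the 80% limit"
fi
```

GNU df reports inode usage in the same column when the -i option is added.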

Monitor system resources

The following configuration will be stored as a service named server, as it describes resource usage for the whole mail server.

The Monit daemon will check memory usage; if it exceeds 80% of the available capacity for three consecutive cycles, it will send an alert email. A recovery message will be sent only after two consecutive successful cycles, to limit the number of sent messages. The same rules apply to the remaining system resources.

The system I am using has four available processors, so an alert will be generated when the five-minute load average exceeds five.
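
If you want to derive a similar threshold on another machine, one possible rule of thumb is the number of processors plus one, which gives five on my system. This is just an assumption to illustrate the idea, not something Monit enforces, and the helper name is hypothetical.

```shell
# Hypothetical rule of thumb: alert once the load exceeds CPU count + 1.
loadavg_threshold() {
  echo $(( $1 + 1 ))
}

# Number of processors currently online.
cpus=$(getconf _NPROCESSORS_ONLN)

# Print a rule that could be pasted into resources.conf.
echo "if loadavg(5min) > $(loadavg_threshold "$cpus") then alert"
```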

$ sudo cat /etc/monit/conf.d/resources.conf
check system server
  group resources

  if memory usage > 80% for 3 cycles then alert
  else if succeeded for 2 cycles then alert

  if swap usage > 50% for 3 cycles then alert
  else if succeeded for 2 cycles then alert

  if cpu(wait) > 30% for 3 cycles then alert
  else if succeeded for 2 cycles then alert

  if cpu(system) > 60% for 3 cycles then alert
  else if succeeded for 2 cycles then alert

  if cpu(user) > 60% for 3 cycles then alert
  else if succeeded for 2 cycles then alert

  if loadavg(5min) > 5 then alert
  else if succeeded for 2 cycles then alert

The above service is placed in resources group for easier management.
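
With the 120 second interval defined in local.conf, the cycle counts above translate into wall-clock time roughly as follows. This is a back-of-the-envelope sketch and the function name is hypothetical.

```shell
# Approximate number of seconds a condition has to hold before a
# "for N cycles" rule fires, given the interval from "set daemon".
cycles_to_seconds() {
  cycles=$1
  interval=$2
  echo $(( cycles * interval ))
}

cycles_to_seconds 3 120   # alert rules: prints 360
cycles_to_seconds 2 120   # recovery rules: prints 240
```

In other words, a resource has to stay above its limit for about six minutes before an alert is sent, and has to stay below it for about four minutes before the recovery message follows.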

Monitor system services

cron

cron is a daemon used to execute user-specified tasks at scheduled times.

The Monit daemon will use the specified pid file [/var/run/crond.pid] to monitor the [cron] service and restart it if it stops for any reason. A configuration change will generate an alert message; a permission issue will generate an alert message and disable further monitoring.

GID of 102 translates to crontab group.

$ sudo cat /etc/monit/conf.d/cron.conf
check process cron with pidfile /var/run/crond.pid
  group system
  group scheduled-tasks

  start program = "/usr/sbin/service cron start"
  stop  program = "/usr/sbin/service cron stop"

  if 3 restarts within 5 cycles then timeout

  depends on cron_bin
  depends on cron_rc
  depends on cron_rc.d
  depends on cron_rc.daily
  depends on cron_rc.hourly
  depends on cron_rc.monthly
  depends on cron_rc.weekly
  depends on cron_rc.spool

  check file cron_bin with path /usr/sbin/cron
    group scheduled-tasks
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file cron_rc with path /etc/crontab
    group scheduled-tasks
    if failed checksum       then alert
    if failed permission 644 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory cron_rc.d with path /etc/cron.d
    group scheduled-tasks
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory cron_rc.daily with path /etc/cron.daily
    group scheduled-tasks
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory cron_rc.hourly with path /etc/cron.hourly
    group scheduled-tasks
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory cron_rc.monthly with path /etc/cron.monthly
    group scheduled-tasks
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory cron_rc.weekly with path /etc/cron.weekly
    group scheduled-tasks
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory cron_rc.spool with path /var/spool/cron/crontabs
    group scheduled-tasks
    if changed timestamp      then alert
    if failed permission 1730 then unmonitor
    if failed uid root        then unmonitor
    if failed gid 102         then unmonitor

The above service is placed in system and scheduled-tasks groups for easier management.
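
Numeric IDs such as 102 differ between systems, so it is worth checking what they resolve to before copying the configuration. A small sketch using getent; the function names are hypothetical.

```shell
# Print the group name for a numeric GID (nothing if it is unknown).
gid_to_group() {
  getent group "$1" | cut -d: -f1
}

# Print the user name for a numeric UID (nothing if it is unknown).
uid_to_user() {
  getent passwd "$1" | cut -d: -f1
}

# On my system this is the crontab group; the result may differ on yours.
gid_to_group 102
```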

rsyslogd

rsyslogd is a message logging service.

$ sudo cat /etc/monit/conf.d/rsyslogd.conf
check process rsyslog with pidfile /var/run/rsyslogd.pid
  group system
  group logging

  start program = "/usr/sbin/service rsyslog start"
  stop  program = "/usr/sbin/service rsyslog stop"

  if 3 restarts within 5 cycles then timeout

  depends on rsyslog_bin
  depends on rsyslog_rc
  depends on rsyslog_rc.d

  check file rsyslog_bin with path /usr/sbin/rsyslogd
    group logging
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file rsyslog_rc with path /etc/rsyslog.conf
    group logging
    if failed checksum       then alert
    if failed permission 644 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory rsyslog_rc.d with path /etc/rsyslog.d
    group logging
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in system and logging groups for easier management.
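
To get a feel for what a pidfile-based process check relies on, the sketch below performs a comparable test by hand: it reads the pid file and asks the kernel whether that process exists. It is a simplified illustration, not Monit's actual implementation, and pidfile_alive is a hypothetical name. Run it as root, as signalling another user's process fails otherwise.

```shell
# Succeed when the pid file is readable and names a running process.
pidfile_alive() {
  pidfile=$1
  [ -r "$pidfile" ] || return 1
  pid=$(tr -d '[:space:]' < "$pidfile")
  case $pid in
    ''|*[!0-9]*) return 1 ;;   # empty or not a number
  esac
  # Signal 0 only tests whether the process can be signalled.
  kill -0 "$pid" 2>/dev/null
}

if pidfile_alive /var/run/rsyslogd.pid; then
  echo "rsyslogd is running"
else
  echo "rsyslogd is not running or the pid file is stale"
fi
```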

ntpd

The Network Time Protocol daemon configuration is extended with port monitoring.

$ sudo cat /etc/monit/conf.d/ntpd.conf
check process ntp with pidfile /var/run/ntpd.pid
  group system
  group time

  start program = "/usr/sbin/service ntp start"
  stop  program = "/usr/sbin/service ntp stop"

  if failed port 123 type udp then restart

  if 3 restarts within 5 cycles then timeout

  depends on ntp_bin
  depends on ntp_rc

  check file ntp_bin with path /usr/sbin/ntpd
    group time
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file ntp_rc with path /etc/ntp.conf
    group time
    if failed checksum       then alert
    if failed permission 644 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in system and time groups for easier management.

OpenSSH

The OpenSSH configuration is extended with match statements that test the content of the configuration file. I assume it is self-explanatory.

$ sudo cat /etc/monit/conf.d/openssh-server.conf
check process openssh with pidfile /var/run/sshd.pid
  group system
  group sshd

  start program = "/usr/sbin/service ssh start"
  stop  program = "/usr/sbin/service ssh stop"

  if failed port 22 with proto ssh then restart

  if 3 restarts within 5 cycles then timeout

  depends on openssh_bin
  depends on openssh_sftp_bin
  depends on openssh_rsa_key
  depends on openssh_dsa_key
  depends on openssh_rc

  check file openssh_bin with path /usr/sbin/sshd
    group sshd
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file openssh_sftp_bin with path /usr/lib/openssh/sftp-server
    group sshd
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file openssh_rsa_key with path /etc/ssh/ssh_host_rsa_key
    group sshd
    if failed checksum       then unmonitor
    if failed permission 600 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file openssh_dsa_key with path /etc/ssh/ssh_host_dsa_key
    group sshd
    if failed checksum       then unmonitor
    if failed permission 600 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file openssh_rc with path /etc/ssh/sshd_config
    group sshd
    if failed checksum       then alert
    if failed permission 644 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

    if not match "^PasswordAuthentication no" then alert
    if not match "^PubkeyAuthentication yes"  then alert
    if not match "^PermitRootLogin no"        then alert

The above service is placed in system and sshd groups for easier management.
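
The three match patterns can also be tried by hand before relying on the alerts. A minimal sketch with grep, assuming the same patterns as above; missing_directives is a hypothetical helper.

```shell
# Print every required sshd_config pattern that the given file lacks.
missing_directives() {
  config=$1
  for pattern in \
    '^PasswordAuthentication no' \
    '^PubkeyAuthentication yes' \
    '^PermitRootLogin no'
  do
    grep -Eq "$pattern" "$config" 2>/dev/null || echo "$pattern"
  done
}

# No output means that all three directives are present.
missing_directives /etc/ssh/sshd_config
```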

Monitor Kolab services

MySQL

MySQL is an open-source database server used by a wide range of Kolab services.

UID of 106 translates to mysql user. GID of 106 translates to mysql group.

This is the first configuration here that uses the unixsocket statement.

$ sudo cat /etc/monit/conf.d/mysql.conf
check process mysql with pidfile /var/run/mysqld/mysqld.pid
  group kolab
  group database

  start program = "/usr/sbin/service mysql start"
  stop  program = "/usr/sbin/service mysql stop"

  if failed port 3306 protocol mysql then restart
  if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql then restart

  if 3 restarts within 5 cycles then timeout

  depends on mysql_bin
  depends on mysql_rc
  depends on mysql_sys_maint
  depends on mysql_data

  check file mysql_bin with path /usr/sbin/mysqld
    group database
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file mysql_rc with path /etc/mysql/my.cnf
    group database
    if failed checksum       then alert
    if failed permission 644 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file mysql_sys_maint with path /etc/mysql/debian.cnf
    group database
    if failed checksum       then unmonitor
    if failed permission 600 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory mysql_data with path /var/lib/mysql
    group database
    if failed permission 700 then unmonitor
    if failed uid 106        then unmonitor
    if failed gid 110        then unmonitor

The above service is placed in kolab and database groups for easier management.

Apache

Apache is an open-source HTTP server used to serve the user and admin web interfaces.

Please note that I am checking the HTTPS port.

$ sudo cat /etc/monit/conf.d/apache.conf
check process apache with pidfile /var/run/apache2.pid
  group kolab
  group web-server

  start program = "/usr/sbin/service apache2 start"
  stop  program = "/usr/sbin/service apache2 stop"

  if failed port 443 then restart

  if 3 restarts within 5 cycles then timeout

  depends on apache2_bin
  depends on apache2_rc
  depends on apache2_rc_mods
  depends on apache2_rc_sites

  check file apache2_bin with path /usr/sbin/apache2.prefork
    group web-server
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory apache2_rc with path /etc/apache2
    group web-server
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory apache2_rc_mods with path /etc/apache2/mods-enabled
    group web-server
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory apache2_rc_sites with path /etc/apache2/sites-enabled
    group web-server
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in kolab and web-server groups for easier management.

Kolab daemon

This is the heart of the whole Kolab unified communication and collaboration system as it is responsible for data synchronization between different services.

UID of 413 translates to kolab-n user. GID of 412 translates to kolab group.

$ sudo cat /etc/monit/conf.d/kolab-server.conf
check process kolab-server with pidfile /var/run/kolabd/kolabd.pid
  group kolab
  group kolab-daemon

  start program = "/usr/sbin/service kolab-server start"
  stop  program = "/usr/sbin/service kolab-server stop"

  if 3 restarts within 5 cycles then timeout

  depends on kolab-daemon_bin
  depends on kolab-daemon_rc

  check file kolab-daemon_bin with path /usr/sbin/kolabd
    group kolab-daemon
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file kolab-daemon_rc with path /etc/kolab/kolab.conf
    group kolab-daemon
    if failed checksum       then alert
    if failed permission 640 then unmonitor
    if failed uid 413        then unmonitor
    if failed gid 412        then unmonitor

The above service is placed in kolab and kolab-daemon groups for easier management.

Kolab saslauthd

Kolab saslauthd is the SASL authentication daemon for multi-domain Kolab deployments.

$ sudo cat /etc/monit/conf.d/kolab-saslauthd.conf
check process kolab-saslauthd with pidfile /var/run/kolab-saslauthd/kolab-saslauthd.pid
  group kolab
  group kolab-saslauthd

  start program = "/usr/sbin/service kolab-saslauthd start"
  stop  program = "/usr/sbin/service kolab-saslauthd stop"

  if 3 restarts within 5 cycles then timeout

  depends on kolab-saslauthd_bin

  check file kolab-saslauthd_bin with path /usr/sbin/kolab-saslauthd
    group kolab-saslauthd
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in kolab and kolab-saslauthd groups for easier management.

It can be tempting to monitor the /var/run/saslauthd/mux socket, but just leave it alone for now.

Wallace

Wallace is a content-filtering daemon.

$ sudo cat /etc/monit/conf.d/wallace.conf
check process wallace with pidfile /var/run/wallaced/wallaced.pid
  group kolab
  group wallace

  start program = "/usr/sbin/service wallace start"
  stop  program = "/usr/sbin/service wallace stop"

  #if failed port 10026 then restart

  if 3 restarts within 5 cycles then timeout

  depends on wallace_bin

  check file wallace_bin with path /usr/sbin/wallaced
    group wallace
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in kolab and wallace groups for easier management.

ClamAV

The ClamAV daemon is open-source, cross-platform antivirus software.

$ sudo cat /etc/monit/conf.d/clamav.conf
check process clamav with pidfile /var/run/clamav/clamd.pid
  group system
  group antivirus

  start program = "/usr/sbin/service clamav-daemon start"
  stop  program = "/usr/sbin/service clamav-daemon stop"

  if 3 restarts within 5 cycles then timeout

  #if failed unixsocket /var/run/clamav/clamd.ctl type udp then alert

  depends on clamav_bin
  depends on clamav_rc

  check file clamav_bin with path /usr/sbin/clamd
    group antivirus
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file clamav_rc with path /etc/clamav/clamd.conf
    group antivirus
    if failed permission 644 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in system and antivirus groups for easier management.

Freshclam

Freshclam is a program used to periodically update the ClamAV virus databases.

$ sudo cat /etc/monit/conf.d/freshclam.conf
check process freshclam with pidfile /var/run/clamav/freshclam.pid
  group system
  group antivirus-updater

  start program = "/usr/sbin/service clamav-freshclam start"
  stop  program = "/usr/sbin/service clamav-freshclam stop"

  if 3 restarts within 5 cycles then timeout

  depends on freshclam_bin
  depends on freshclam_rc

  check file freshclam_bin with path /usr/bin/freshclam
    group antivirus-updater
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file freshclam_rc with path /etc/clamav/freshclam.conf
    group antivirus-updater
    if failed permission 444 then unmonitor
    if failed uid 110        then unmonitor
    if failed gid 4          then unmonitor

The above service is placed in system and antivirus-updater groups for easier management.

amavisd-new

Amavis is a high-performance interface between the Postfix mail server and content filtering services: SpamAssassin as a spam classifier and ClamAV for antivirus protection.

$ sudo cat /etc/monit/conf.d/amavisd-new.conf
check process amavisd-new with pidfile /var/run/amavis/amavisd.pid
  group kolab
  group content-filter

  start program = "/usr/sbin/service amavis start"
  stop  program = "/usr/sbin/service amavis stop"

  if 3 restarts within 5 cycles then timeout

  #if failed port 10024 type tcp then restart
  #if failed unixsocket /var/lib/amavis/amavisd.sock type udp then alert

  depends on amavisd-new_bin
  depends on amavisd-new_rc

  check file amavisd-new_bin with path /usr/sbin/amavisd-new
    group content-filter
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory amavisd-new_rc with path /etc/amavis/
    group content-filter
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in kolab and content-filter groups for easier management.

The main Directory Server daemon

The main Directory Server daemon is a 389 LDAP Directory Server.

$ sudo cat /etc/monit/conf.d/dirsrv.conf
check process dirsrv with pidfile /var/run/dirsrv/slapd-xmail.pid
  group kolab
  group dirsrv

  start program = "/usr/sbin/service dirsrv start"
  stop  program = "/usr/sbin/service dirsrv stop"

  if 3 restarts within 5 cycles then timeout

  if failed port 389 type tcp then restart

  depends on dirsrv_bin
  depends on dirsrv_rc

  check file dirsrv_bin with path /usr/sbin/ns-slapd
    group dirsrv
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory dirsrv_rc with path /etc/dirsrv/
    group dirsrv
    if changed timestamp     then alert

The above service is placed in kolab and dirsrv groups for easier management.

SpamAssassin

SpamAssassin is a content filter used for spam filtering.

$ sudo cat /etc/monit/conf.d/spamd.conf
check process spamd with pidfile /var/run/spamd.pid
  group system
  group spamd

  start program = "/usr/sbin/service spamassassin start"
  stop  program = "/usr/sbin/service spamassassin stop"

  if 3 restarts within 5 cycles then timeout

  #if failed port 783 type tcp then restart

  depends on spamd_bin
  depends on spamd_rc

  check file spamd_bin with path /usr/sbin/spamd
    group spamd
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory spamd_rc with path /etc/spamassassin/
    group spamd
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in system and spamd groups for easier management.

Cyrus IMAP/POP3 daemons

cyrus-imapd daemon is responsible for IMAP/POP3 communication.

$ sudo cat /etc/monit/conf.d/cyrus-imapd.conf
check process cyrus-imapd with pidfile /var/run/cyrus-master.pid
  group kolab
  group cyrus-imapd

  start program = "/usr/sbin/service cyrus-imapd start"
  stop  program = "/usr/sbin/service cyrus-imapd stop"

  if 3 restarts within 5 cycles then timeout

  if failed port 143 type tcp then restart
  if failed port 4190 type tcp then restart
  if failed port 993 type tcp then restart

  depends on cyrus-imapd_bin
  depends on cyrus-imapd_rc

  check file cyrus-imapd_bin with path /usr/lib/cyrus-imapd/cyrus-master
    group cyrus-imapd
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check file cyrus-imapd_rc with path /etc/cyrus.conf
    group cyrus-imapd
    if failed checksum       then alert
    if failed permission 644 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in kolab and cyrus-imapd groups for easier management.

Postfix

Postfix is an open-source mail transfer agent used to route and deliver electronic mail.

$ sudo cat /etc/monit/conf.d/postfix.conf
check process postfix with pidfile /var/spool/postfix/pid/master.pid
  group kolab
  group mta

  start program = "/usr/sbin/service postfix start"
  stop program = "/usr/sbin/service postfix stop"

  if 3 restarts within 5 cycles then timeout

  if failed port 25 type tcp then restart
  #if failed port 10025 type tcp then restart
  #if failed port 10027 type tcp then restart
  if failed port 587 type tcp then restart

  depends on postfix_bin
  depends on postfix_rc

  check file postfix_bin with path /usr/lib/postfix/master
    group mta
    if failed checksum       then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

  check directory postfix_rc with path /etc/postfix/
    group mta
    if changed timestamp     then alert
    if failed permission 755 then unmonitor
    if failed uid root       then unmonitor
    if failed gid root       then unmonitor

The above service is placed in kolab and mta groups for easier management.

Ending notes

This blog post is definitely too long, so I will just mention that a similar configuration can be used to monitor other integrated solutions like ISPConfig, or custom specialized setups.

In my opinion Monit is a great utility which simplifies system and service monitoring. Additionally it provides interesting proactive features, like service restart, or arbitrary program execution on selected tests.
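
As an illustration of arbitrary program execution, a rule can use the exec action, for example then exec "/usr/local/bin/monit-hook.sh" (a hypothetical path). Monit exports details about the event to the executed program through environment variables such as MONIT_HOST, MONIT_SERVICE, MONIT_EVENT and MONIT_DESCRIPTION. A sketch of such a hook, with a made-up function name, that prints a single log line:

```shell
# Build a one-line log entry from the variables Monit exports to
# programs started by the exec action. Fallbacks cover manual runs.
format_event() {
  printf '%s %s: %s (%s)\n' \
    "${MONIT_HOST:-unknown-host}" \
    "${MONIT_SERVICE:-unknown-service}" \
    "${MONIT_EVENT:-unknown-event}" \
    "${MONIT_DESCRIPTION:-no description}"
}

# Print the entry; a real hook could pipe it to logger or a chat client.
format_event
```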

Everything is described in the manual page.

$ man monit

About Milosz Galazka

Milosz is a Linux Foundation Certified Engineer working for a successful Polish company as a system administrator, and a long-time supporter of the Free Software Foundation and the Debian operating system.

Gdansk, Poland https://sleeplessbeastie.eu