As I mentioned in an earlier e-mail, I'm trying to get
monit to watch a group of processes so it can start and stop
ospfd for an anycast high-availability application. However,
I'm seeing some odd behaviour that doesn't match what I
expect. Is this a bug?
In the scenario below, why does monit ever try to start
ospfd? If apache is down, shouldn't ospfd stay down until
apache comes back up or is monitored again after being
unmonitored? It does end up in the correct state at the end,
but not before starting and stopping ospfd twice in the
meantime.
As an example, I have the following configured:
check process apache with
    pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program = "/etc/init.d/httpd stop"
    if failed host localhost port 80 protocol http
        and request "/" then restart
    if 2 restarts within 2 cycles then stop

check process ospfd with pidfile /var/run/quagga/ospfd.pid
    start program = "/etc/init.d/ospfd start"
    stop program = "/etc/init.d/ospfd stop"
    depends on apache
If I make it so that apache cannot run (by removing execute
permissions on /usr/sbin/httpd) and then kill it, I see the
following in the monit logs:
Dec 6 08:47:39 tecate monit[9988]: 'apache' process is not running
Dec 6 08:47:39 tecate monit[9988]: 'apache' trying to restart
Dec 6 08:47:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:47:39 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
Dec 6 08:47:40 tecate monit[9988]: 'ospfd' unmonitor on user request
Dec 6 08:47:40 tecate monit[9988]: monit daemon at 9988 awakened
Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor action done
Dec 6 08:48:09 tecate monit[9988]: Awakened by User defined signal 1
Dec 6 08:48:09 tecate monit[9988]: 'apache' process is not running
Dec 6 08:48:09 tecate monit[9988]: 'apache' trying to restart
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:48:09 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' unmonitor on user request
Dec 6 08:48:09 tecate monit[9988]: monit daemon at 9988 awakened
Dec 6 08:48:39 tecate monit[9988]: 'apache' failed to start
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor action done
Dec 6 08:48:39 tecate monit[9988]: Awakened by User defined signal 1
Dec 6 08:48:39 tecate monit[9988]: 'apache' service restarted 2 times within 2 cycles(s) - stop
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor on user request
Dec 6 08:48:39 tecate monit[9988]: monit daemon at 9988 awakened
Dec 6 08:48:39 tecate monit[9988]: Awakened by User defined signal 1
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' unmonitor action done
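
To make the churn easier to see, here is a rough Python
sketch that tallies the start/stop actions monit logged per
service. The count_actions helper and its regular expression
are my own and not part of monit; the sketch assumes each log
entry sits on a single line, and the sample contains only the
start, stop and "failed to start" entries from the log in
this message.

```python
import re
from collections import Counter

# Sample entries taken from the monit log in this message,
# with each entry joined onto a single line.
LOG = """\
Dec 6 08:47:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:47:39 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
Dec 6 08:48:09 tecate monit[9988]: 'apache' failed to start
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
Dec 6 08:48:09 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
Dec 6 08:48:09 tecate monit[9988]: 'apache' start: /etc/init.d/httpd
Dec 6 08:48:39 tecate monit[9988]: 'apache' failed to start
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' start: /etc/init.d/ospfd
Dec 6 08:48:39 tecate monit[9988]: 'ospfd' stop: /etc/init.d/ospfd
"""

# Matches entries of the form "'<service>' start: <path>" or
# "'<service>' stop: <path>". Entries such as "failed to start"
# do not match because the action must directly follow the name.
ACTION_RE = re.compile(r"monit\[\d+\]: '([^']+)' (start|stop): ")


def count_actions(log_text):
    """Count (service, action) pairs found in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ACTION_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


counts = count_actions(LOG)
for (service, action), n in sorted(counts.items()):
    print(f"{service} {action}: {n}")
```

On these entries the tally shows ospfd being started twice
while apache never came up, which is the behaviour I'm
asking about.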