Re: [monit] how to specify this tricky situation

From: Martin Pala
Subject: Re: [monit] how to specify this tricky situation
Date: Sat, 21 Jun 2008 12:01:12 +0200

On Jun 13, 2008, at 10:21 PM, David Blank-Edelman wrote:


I've been really happy with monit and appreciate all of the hard work put into it. I recently encountered a scenario for which I can't seem to find an elegant approach. I'm wondering if you have any suggestions about the following:

I have a piece of software I am trying to monitor which consists of two daemons which I'll call "spam" and "proxy".

I want to check that two things are always true:
 1) is the SMTP service on port 25 (provided by "proxy") available
 2) is the process "spam" running as per its pid file

That's all very easy to do using monit. Here is where it gets more interesting.

I have three vendor-supplied scripts available to me:
 start-software
 stop-software
 restart-software (calls stop-software and then start-software)

start-software and stop-software always start and stop _both_ the daemons.

The place I am getting into trouble seems to be the two stanzas of my config file fighting with each other. Let's say the "spam" process goes down. Monit attempts to restart it, but this has the side effect of bringing down the proxy daemon, and monit then attempts to correct the lack of SMTP service by, you guessed it, bringing down the spam process. And so on... The other part of this that I think is biting me is that at least part of the process is asynchronous, and hence some of this corrective action is overlapping.


Yes, overlapping actions were possible. This problem is addressed in monit-5.0 (currently in beta): it waits for a service to start before testing the next service, which will most probably solve the problem.

Ideally I'd love to construct a single stanza that says (atomically) if either #1 or #2 is not true, attempt a restart. I would think that I could use dependencies to help with this, but the problem is that they are both (because of how they are started and stopped) dependent on each other. I also contemplated putting both #1 and #2 in the same group, but as far as I can tell groups aren't actually accessible from the config file (i.e. you can't say "restart group" from anything but the command line).

Since the restart script is common to both daemons, you can use a single service entry in monit that joins the SMTP port check and the process check, something like:

  check process spam_proxy with pidfile /var/run/
    start program = ...
    stop program = ...
    if failed port 25 protocol smtp then restart

If either service fails, monit will call the restart script to recover both.
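
For illustration only, a fuller version of that entry might look like the sketch below. The pidfile name and the script locations are hypothetical placeholders (the real paths depend on the vendor's installation), and the last line is an optional safeguard that is not part of the suggestion above: it stops monit from restarting forever if the software keeps failing.

  # Hypothetical paths: substitute the real pidfile and vendor script locations.
  check process spam_proxy with pidfile /var/run/spam.pid
    # The vendor scripts start and stop both daemons together.
    start program = "/usr/local/sbin/start-software"
    stop program = "/usr/local/sbin/stop-software"
    # Check #1: SMTP service provided by "proxy".
    if failed port 25 protocol smtp then restart
    # Optional: stop monitoring this service after repeated failed restarts.
    if 5 restarts within 5 cycles then timeout

Check #2 (the "spam" process is running) is implied by the "check process ... with pidfile" line itself. A restart action runs the stop program and then the start program, so the separate restart-software script is not needed here.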