Re: [monit] Re: monit race conditions on Mac OS X 10.5 Leopard?

On 1/24/08, Martin Pala <address@hidden> wrote:

This seems strange. Monit alerts are generated on each action
according to the configuration.

Can you run monit in verbose mode (-v option) and send the log?

Is it possible that the mailserver rejected the messages, or that you
have set an alert filter in the monit configuration that suppresses
particular alerts?
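
As an illustration of the kind of alert filter meant here, the
fragment below is a minimal sketch. The address is a placeholder and
the event list is hypothetical. With a filter like this, only the
listed event types would be mailed to that recipient and all other
alerts would be suppressed for it. The exact filter syntax may differ
between monit versions, so check the manual for the installed release.

--8<--
  # hypothetical filter: mail only timeout and nonexist events
  set alert admin@example.com only on { timeout, nonexist }
--8<--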

By default, monit drops the email notification on a mailserver error.
There is also support for an event queue, which allows monit to retry
the message delivery on the next cycle. To enable it, use:

--8<--
  set eventqueue
      basedir /var/monit  # set the base directory where events will be stored
      slots 100           # optionally limit the queue size
--8<--

Anyway - the verbose mode will reveal what happens with the alert
messages and whether event queue is needed because of mailserver
problems.

Thanks,
Martin

On Jan 19, 2008, at 12:06 PM, Sergio Trejo wrote:

> This is an update to my previous message posted herein. Version
> 4.10.1 of monit most definitely has a bug in it, and it's not related
> to Mac OS X 10.5, because version 4.9 of monit runs just perfectly on
> Mac OS X 10.5.
>
> The bug is that monit 4.10.1 does not send out multiple email
> messages when, every cycle, it encounters multiple daemons not
> running (whether the daemons have crashed or have been torn down
> intentionally by a sys admin).
>
> Regards,
>
> Sergio
>
> On 1/19/08, Sergio Trejo <address@hidden> wrote:
>
> Hello,
>
> I have monit (version 4.10.1) running on an Apple machine which is
> Mac OS X Server (Leopard, 10.5.1). My installation of monit monitors
> six separate daemons for these programs: Apache, Postfix,
> PostgreSQL, Tomcat, OpenLDAP, and MySQL. My monit configuration file
> has entries that look like this for all of the six aforementioned
> programs (taking Apache for example):
>
> check process apache with pidfile "/opt/local/apache2/logs/httpd.pid" every 10 cycles
>     start = "/opt/local/apache2/bin/apachectl start"
>     stop = "/opt/local/apache2/bin/apachectl stop"
>     if failed port 80 and protocol http then restart
>     if 5 restarts within 5 cycles then timeout
>
> Where my daemon frequency is set to 60 seconds as in:
>
> set daemon 60
>
> What is interesting is that I had all six of my daemons running as a
> starting point and monit confirmed this (using the little http
> server built into monit on port 2812). I then, very intentionally
> (as sort of an auditing process) killed five out of my six daemons
> (the only daemon I left running was the Postfix daemon because I
> still wanted to have monit be capable of sending email alerts since
> I use the internal mail server running on the same machine as
> Postfix, as in "set mailserver 127.0.0.1"). So, with five of the six
> daemons intentionally killed, monit did later catch up and
> successfully restarted all five daemons. However, monit generated
> only two mail message alerts:
>
> 1. A message stating that the apache daemon did not exist
>
> 2. A message stating that the postgres daemon did exist (seemed to
> have sent this message after re-starting PostgreSQL)
>
> But, why didn't I receive ten messages, five of them for each daemon
> that I intentionally killed stating that they did not exist, and
> then later on five more messages stating that the five daemons
> (after being restarted) did indeed exist again?
>
> Also, why did the first message say that apache didn't exist,
> whereas the second message said that the postgres daemon existed?
> Shouldn't the second message have stated that the apache daemon
> existed again?
>
> It doesn't make sense. Is it possible that monit was "overwhelmed"
> or overloaded in some way and became "confused"? I know that doesn't
> sound appropriate for a binary system but there is nothing in the
> monit log file to give me any hints. Perhaps, did monit experience a
> race condition?
>
> The log file shows that all five daemons which I had manually killed
> were restarted successfully (and indeed they were -- I ssh'ed into
> my server and saw them all running again as processes and monit also
> reported their successful running again on its http server on port
> 2812).
>
> If this was a race condition, could there be an issue with
> threading? Mac OS X 10.5 (Leopard and Leopard Server) might be
> different enough compared to previous versions of Mac OS X with
> regard to a change to how threading works (but I am writing this
> very vaguely without much information at the moment other than some
> fuzzy recollection that something related to threading on Leopard
> might have changed).
>
> Thanks for any suggestions,
>
> Serg
>
> --
> To unsubscribe:
> http://lists.nongnu.org/mailman/listinfo/monit-general


From:	Sergio Trejo
Subject:	Re: [monit] Re: monit race conditions on Mac OS X 10.5 Leopard?
Date:	Sun, 27 Jan 2008 15:36:28 -1000