Re: Check program problem

On Mon, Nov 19, 2012 at 5:43 PM, Jan-Henrik Haukeland <address@hidden> wrote:

I'm not sure I understand the problem, but that does not prevent me from having a suggestion :) I wonder whether the 'every' statement could help in this situation, as in:

check program with path '/tmp/script.sh'
    every 2 cycles
    if status != 0 then exec '/tmp/some_service.sh restart'

Any luck with that?

On Nov 19, 2012, at 12:12 PM, Dmitry Zamaruev <address@hidden> wrote:

> Hi,
>
> I'm using 'check program' to monitor a thread leak in one of our applications. Everything works nicely, except that the application is always restarted twice. I dug through the source code and found that this seems to be related to how 'check program' is handled.
> Here is my configuration example:
>
> check program with path '/tmp/script.sh'
> if status != 0 then exec '/tmp/some_service.sh restart'
>
> Here is the workflow I'm seeing:
>
> - Poll period #1:
>   - start /tmp/script.sh
>
> - Poll period #2:
>   - collect exit code from /tmp/script.sh
>   - raise event with status = 1
>   - start /tmp/script.sh <<== problem here: the script is run against the service before the restart, so it will return status = 1
>   - process event - exec '/tmp/some_service.sh restart'
>
> - Poll period #3:
>   - collect exit code from /tmp/script.sh
>   - raise event with status = 1
>   - start /tmp/script.sh <<== here the script is run against the fresh service, after the restart in poll period #2
>   - process event - exec '/tmp/some_service.sh restart'
>
> - Poll period #4:
>   - collect exit code from /tmp/script.sh
>   - exit status == 0, so all is OK now
>
> If I try a different condition, for example 'status == 1 for 2 cycles', the event chain just gets longer: after two failures monit restarts the application, but because the next poll cycle is also a failure (three failed cycles in a row), monit still matches 'status == 1 for 2 cycles' and restarts it again.
>
> Is there any way to work around the double restart (a restart takes up to 15-20 seconds) using monit configuration, either by ignoring the exit status at some step or by writing some special condition?
>
> wbr,
> Dmitry.


From:	Dmitry Zamaruev
Subject:	Re: Check program problem
Date:	Mon, 19 Nov 2012 18:12:04 +0200