Re: Check program problem

From: Dmitry Zamaruev
Subject: Re: Check program problem
Date: Mon, 19 Nov 2012 18:12:04 +0200

Nope, this will just increase the poll interval for that particular service, so the service will still be restarted twice, just with more time between the restarts :)

Assume we have some service running with PID=10 (stored in /tmp/), and a script that checks whether the process mentioned in /tmp/ has fewer than 100 threads; if not, it returns 1.
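
For illustration only, a check script of this kind might look like the sketch below. It is not the script from this thread: it assumes Linux (the thread count is read from the "Threads:" line of /proc/<pid>/status), and the names thread_count, check and THREAD_LIMIT are made up for the example. The pidfile path is taken from the command line.

```python
import sys

THREAD_LIMIT = 100  # hypothetical name; the thread only says "less than 100 threads"


def thread_count(status_text):
    """Return the value of the 'Threads:' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' line found")


def check(pidfile):
    """Return 0 if the process named in pidfile is under the limit, else 1."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    with open(f"/proc/{pid}/status") as f:
        count = thread_count(f.read())
    return 0 if count < THREAD_LIMIT else 1


# Only acts as a script when a pidfile path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(check(sys.argv[1]))
```

The exit status of such a script is what monit compares in 'if status != 0'.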

Poll (cycle) #1:
- /tmp/ is run against /tmp/ (contains 10) and returns 1, but this value is not collected by monit until the next cycle

Poll #2:
- monit collects status#1 and fires the event that 'status != 0'
- BEFORE the event is processed, /tmp/ is run again (/tmp/ still contains 10); it returns 1 again, and that result is again postponed until the next poll period
- monit processes the exec action (because status#1 == 1) and restarts the service (now /tmp/ will contain 20, for example)

Poll #3:
- monit collects status#2 and fires the event that 'status != 0', but the service was already restarted in poll #2, so this value is obsolete!
- before the event is processed, /tmp/ is run against /tmp/ (contains 20) and returns 0 (because it is a fresh process)
- monit processes the exec action (because status#2 == 1) and restarts the service (now /tmp/ will contain 30, for example)

Poll #4:
- monit collects status#3 and sees that it is ok

So the problem is that the 'check program' result is one step behind the other actions, and at some point in time (poll #3) monit uses obsolete information to perform actions.
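
The sequence above can be reproduced with a small toy model. This is only a sketch of the ordering described in this mail, not monit's actual code: the function name simulate and its variables are invented for the example, and it assumes that only the original process (PID 10) leaks threads while every restarted process is healthy.

```python
def simulate(polls):
    """Return the poll numbers in which the service gets restarted."""
    pid = 10          # PID of the running service
    leaking = True    # assumption: only the original process has too many threads
    pending = None    # exit status of the check started in the previous poll
    restarts = []

    for poll in range(1, polls + 1):
        # 1. Collect the exit status of the check started one poll earlier.
        collected = pending
        # 2. Start the check again; its result is only read in the next poll.
        pending = 1 if leaking else 0
        # 3. Process the event raised from the collected (possibly stale) status.
        if collected is not None and collected != 0:
            pid += 10         # restarted service gets a new PID
            leaking = False   # a fresh process is below the thread limit
            restarts.append(poll)

    return restarts


print(simulate(4))  # prints [2, 3]
```

The restart in poll #3 is triggered by the status produced in poll #2, which was measured against the old process.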

On Mon, Nov 19, 2012 at 5:43 PM, Jan-Henrik Haukeland <address@hidden> wrote:
I'm not sure I understand the problem, but that does not prevent me from having a suggestion :) I'm wondering if the every statement could help in this situation? As in:

check program with path '/tmp/'
  every 2 cycles
  if status != 0 then exec '/tmp/ restart'

Any luck with that?

On Nov 19, 2012, at 12:12 PM, Dmitry Zamaruev <address@hidden> wrote:

> Hi,
> I'm using 'check program' to monitor a thread leak in one of our applications. Everything works nicely, except that the application is always restarted twice. I dug through the source code and found that it seems to be related to how 'check program' is handled.
> Here is my configuration example:
> check program with path '/tmp/'
>   if status != 0 then exec '/tmp/ restart'
> Here is the workflow I'm seeing:
> - Poll period #1:
>   - start /tmp/
> - Poll period #2:
>   - collect exit code from /tmp/
>   - raise event with status = 1
>   - start /tmp/  <<== problem here: the script is run against the service before the restart, so it will return status=1
>   - process event - exec '/tmp/ restart'
> - Poll period #3
>   - collect exit code from /tmp/
>   - raise event with status = 1
>   - start /tmp/  <<== here the script is run against the fresh service, after the restart in poll period #2
>   - process event - exec '/tmp/ restart'
> - Poll period #4
>   - collect exit code from /tmp/
>   - exit status == 0, so all ok now
> If I try to use a different condition, for example 'status == 1 for 2 cycles', this event chain just gets longer: after two failures monit restarts the application, but because the next poll cycle is also a "failure" (three failed cycles), monit still successfully matches 'status == 1 for 2 cycles'.
> Is there any way to work around the double restart (a restart takes up to 15-20 seconds) using monit configuration, either by ignoring the exit status at some step or by writing some special condition?
> wbr,
> Dmitry.
