Re: slow to take action

From: Jan-Henrik Haukeland
Subject: Re: slow to take action
Date: Sat, 18 Jun 2011 00:16:08 +0200

On Jun 17, 2011, at 8:07 PM, Nick Upson wrote:

> Hi,
> I have a monit configuration where it is monitoring 25 hosts (ping
> test) and several local processes.
> doing anything with monit except a summary takes a long time. It seems
> that the tests are each done sequentially
> a) this means that there is the possibility of one set of tests not
> being complete when the next is due to start as the number of hosts increases
> b) restarting a local process takes too long
> Is there any way I can adjust the configuration to improve the situation?

a) Monit runs all tests serially in a single thread. This means that the list 
of tests is run from start to finish. If some tests take a long time to 
complete, it just means that Monit will take longer to run through the list of 
tests. What is important is that each and every test is run, and Monit will do 
that. What is (usually) less important is whether a test runs a bit later, 
depending on how long the previous tests take.
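
To illustrate the serial model described above, here is a minimal sketch in Python. It is not Monit's code; the function name `serial_schedule`, the test names and the durations are all made up for the example. It only shows that each test starts when the previous one ends, so one slow test pushes back everything after it.

```python
from typing import List, Tuple


def serial_schedule(
    durations: List[Tuple[str, float]]
) -> List[Tuple[str, float, float]]:
    """Return (name, start, end) for tests that run one after another.

    Hypothetical model only: every test starts when the previous one ends.
    """
    timeline = []
    clock = 0.0
    for name, seconds in durations:
        start = clock
        clock += seconds  # the next test cannot begin before this one is done
        timeline.append((name, start, clock))
    return timeline


if __name__ == "__main__":
    # Assumed durations in seconds; the slow ping stands in for an unreachable host.
    tests = [
        ("ping host-01", 0.1),
        ("ping host-02", 5.0),
        ("check process app", 0.2),
    ]
    for name, start, end in serial_schedule(tests):
        print(f"{name}: starts at {start:.1f}s, ends at {end:.1f}s")
```

In this sketch the process check still runs, it just runs after the slow ping has finished.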

b) Monit forks a new process, and this operation takes just milliseconds, but 
Monit will wait, if I remember correctly, up to one poll cycle to see if the 
process comes up. If your program is slow to start (from Monit's POV, that is, 
slow to create the pidfile) then this will delay all the tests since, as 
mentioned, testing is single-threaded. So yeah, this model may be improved [1]. 
But there are a few things you can do now, for instance make sure that the 
program writes its pidfile as soon as possible. If you cannot modify the 
program, create a wrapper script that writes the pidfile first and then does 
an exec on the program.
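
A wrapper of this kind could look roughly like the following Python sketch. The script name `wrapper.py`, the helper `write_pidfile` and the paths in the usage comment are assumptions for illustration, not something Monit provides. It relies on the fact that exec replaces the process image but keeps the process ID, so the pidfile written beforehand stays valid.

```python
import os
import sys


def write_pidfile(path: str) -> int:
    """Write the current process ID to path and return it."""
    pid = os.getpid()
    with open(path, "w") as handle:
        handle.write(f"{pid}\n")
    return pid


def main(argv):
    # Hypothetical usage:
    #   wrapper.py /var/run/myapp.pid /usr/local/bin/myapp [args...]
    if len(argv) < 3:
        print("usage: wrapper.py PIDFILE PROGRAM [ARGS...]", file=sys.stderr)
        return 2
    pidfile, program = argv[1], argv[2]
    # Write the pidfile first so it exists before the slow startup begins.
    write_pidfile(pidfile)
    # exec replaces this process with the program; the PID does not change.
    # On success this call never returns.
    os.execvp(program, argv[2:])


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Note that this only works if the program stays in the foreground. If it forks and daemonizes itself, the real PID will differ from the one in the pidfile.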

You may also fiddle with the connection timeout in the configuration, but if 
it is set too low you risk false positive alerts, which is probably worse.

[1] We are about to release a new version of Monit which implements a new 
'check program' test, meant to be used to check the exit status of a script 
or program. This implementation uses another model which does not delay other 
tests, and we may also use it when checking processes.

Best regards
Jan-Henrik Haukeland 
☏: +47 97141255
