On Wed, Feb 02, 2005 at 07:05:50PM +0100, Martin Pala wrote:
> Current monit behavior is sufficient, because Monit has both error and
> recovery alerts. This means that you should take the alert seriously -
> until you receive the 'recovery' alert, you can be sure that the
> service is broken.
Okay, down to earth with current monit behaviour:
20-second timeout, around 8 monitored machines, around 20 alerts a day,
and around 1 real alert every couple of days.
Add some human factor to it: it's easy to miss the one real alert among 21
when I'm trying to look for it.
I won't miss an alert when it's screaming out.
> I see no problem in this sense with current monit behavior - it will
> send you an alert, and when you do not receive a recovery alert within a
> specific timeframe (which you know), the problem is persistent according
> to your rules.
See above: 20 false alerts means 40 void messages, since every false alert
is followed by its own recovery message.
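
To put rough numbers on that, here is a small Python sketch of the daily mail volume. The figures are the ones from this mail; reading "1 real alert in a couple of days" as one every two days is my assumption, and the function name is made up for illustration.

```python
# Figures from this thread; REAL_ALERTS_PER_DAY is an assumed reading
# of "1 real alert in a couple of days" (one every two days).
FALSE_ALERTS_PER_DAY = 20
REAL_ALERTS_PER_DAY = 0.5
MESSAGES_PER_ALERT = 2  # one error message plus one recovery message


def daily_mail(false_alerts, real_alerts, per_alert=MESSAGES_PER_ALERT):
    """Return (void_messages, real_messages) received per day."""
    return false_alerts * per_alert, real_alerts * per_alert


void, real = daily_mail(FALSE_ALERTS_PER_DAY, REAL_ALERTS_PER_DAY)
print(f"void messages per day: {void}")
print(f"real messages per day: {real}")
print(f"share of mail that matters: {real / (void + real):.1%}")
```

Under these assumptions only a small fraction of the mail is about a real problem, which is the human-factor issue I mean.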