monit-general

Re: Performing an action only after X failures


From: Martin Pala
Subject: Re: Performing an action only after X failures
Date: Sun, 26 Dec 2004 21:26:52 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041007 Debian/1.7.3-5

Hello,

if you are changing the configuration and restarting the process by hand, you should:

1.) either disable monitoring of the service in monit before you restart it by hand, and enable monitoring again afterwards:
 # monit unmonitor myservice
 # /etc/init.d/myservice restart
 # monit monitor myservice

2.) or restart the service through monit itself (no need to unmonitor):
 # monit restart myservice

If you don't, you risk a race condition: monit may try to start the service in the middle of your manual restart. This behavior is common to all process monitors (such as Sun Cluster), because otherwise the monitor cannot tell whether the service was stopped on purpose or failed by accident. For this case, the feature you requested is not a good solution.


However, I agree that in other cases it would be useful to trigger a chosen action as soon as the service reaches some error ratio. This would allow action rules to be split by error level. Currently we support this only for timeout, using:
 # if 2 restarts within 3 cycles then timeout

If this were generalized, rules could be stacked to provide error-level-dependent actions, for example:
 # if <X> <EVENT> within <Y> cycles then <ACTION>

where:
... <X>      = number of event occurrences
... <EVENT>  = event type
... <Y>      = number of consecutive cycles
... <ACTION> = given action (alert|restart|unmonitor|exec|...)
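
To make the stacking idea concrete, here is a rough Python sketch of how such a rule could be evaluated. It is only an illustration of the proposed semantics, not monit code: the names EventRule, record and evaluate are made up for this example, and it assumes that a rule keeps firing on every cycle in which its threshold is still met.

```python
from collections import deque


class EventRule:
    """Hypothetical rule: if <x> events within <y> cycles then <action>."""

    def __init__(self, x, y, action):
        if not 1 <= x <= y:
            raise ValueError("need 1 <= x <= y")
        self.x = x
        self.action = action
        # Sliding window holding the outcome of the last y cycles only.
        self.window = deque(maxlen=y)

    def record(self, event_occurred):
        """Record one cycle; return the action if the threshold is met."""
        self.window.append(bool(event_occurred))
        if sum(self.window) >= self.x:
            return self.action
        return None


def evaluate(rules, event_occurred):
    """Feed one cycle to every stacked rule and collect the actions that fire."""
    results = (rule.record(event_occurred) for rule in rules)
    return [action for action in results if action is not None]


# Three stacked rules with increasing severity (assumed values).
rules = [
    EventRule(1, 1, "alert"),
    EventRule(3, 5, "restart"),
    EventRule(5, 5, "unmonitor"),
]

# One entry per cycle: True means the event (e.g. a failure) occurred.
history = [True, False, True, True, True, True, True]
fired = [evaluate(rules, occurred) for occurred in history]
```

With this history, only the alert fires at first, the restart rule joins in once three failures fall inside its five-cycle window, and unmonitor fires only after five failures in a row.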


I'm +1 on adding such a feature. What do developers and users think about it?

Martin

Kaspar Landsberg wrote:
Hello,

I'd like monit to perform a given action for a given service only after the service has failed for a given number of checks/cycles/minutes.

Example: Let's suppose I've got some daemon whose configuration I change. But I make a mistake while changing the config and when I try to restart the daemon, I get an error message, the daemon refuses to restart and for a while there's no daemon running. It takes me 2 minutes to fix the error in the conf file and to correctly restart the daemon. But if there was a monit running at the same time with a low checking cycle, then the predefined action for that daemon would be triggered.

I want to avoid such a scenario by telling monit to trigger a given action only if the service/daemon in question fails for X cycles/minutes.

Is this already possible? If not, might that feature get added in the near future?

Thanks,
Kaspar

PS: Looked at the archive but didn't find anything related to my question.




