Re: Request for a new "script" service type

From: Michel Marti
Subject: Re: Request for a new "script" service type
Date: Wed, 22 Dec 2004 10:14:18 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041007 Debian/1.7.3-5

Martin Pala wrote:
> 1.) The example you showed can already be integrated with monit by using the existing file timestamp test as a mediator: your script can be run from cron at regular intervals (for example, every 5 minutes), and if everything is OK it can touch some file (for example "/tmp/check_myservice.ok"). This updates the file's timestamp, which monit can test this way:

There are several problems with this:

1. I don't (yet) have cron on this box (it's an ARM-based embedded device with a limited amount of storage and RAM). I could, however, install cron to "fix" this.
2. My monit interval is set to 30 seconds, but the smallest interval in cron is one minute.
3. My embedded device has no battery-buffered clock. On bootup the clock is set to the start of the epoch (1970) and is only later synchronized using NTP. This might trigger an unnecessary restart of the service because monit thinks that the file has not been touched within the specified time.
4. Monitoring would be split across two systems (cron and monit). This might not be obvious to users looking at only the crontab or only the monit configuration. Of course, this can be fixed by adding documentation to monitrc/crontab.

> On monit side it should be possible to set at least timeout for method (there
> could be some default value, such as 5 seconds).

Agreed. And monit might also pass some information to the script using environment variables (e.g. MONIT_SERVICE=<service name>, etc.).
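
A rough shell sketch of how such a call could behave, for illustration only. Nothing here exists in monit: `run_check` is a hypothetical helper, MONIT_SERVICE is just the variable name suggested above, and `timeout` is assumed to be the GNU coreutils tool (which reports a timed-out command with exit status 124).

```shell
# run_check SERVICE TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run a check command with the service name exported
# in MONIT_SERVICE and give up after TIMEOUT_SECONDS.
run_check() {
    service=$1
    limit=$2
    shift 2

    # The assignment prefix puts MONIT_SERVICE into the environment of
    # `timeout`, and the check command inherits it from there.
    MONIT_SERVICE=$service timeout "$limit" "$@"
    status=$?

    # GNU timeout exits with 124 when it had to stop the command.
    if [ "$status" -eq 124 ]; then
        echo "$service: check timed out after ${limit}s" >&2
    fi
    return "$status"
}
```

With this in place, `run_check rootfs 7 /sbin/check_lvm rootvol` would run the check for at most 7 seconds and hand back the script's own exit status otherwise.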

> I'm not sure whether it is good to define a new 'script' object. I think it could be sufficient to support the generic testing method interface in all existing objects (i.e. 'process', 'device', 'host', 'file', 'directory'). Example syntax:
>
> check device rootfs with path /
>   if failed script "/sbin/check_lvm rootvol" with timeout 7s then alert
>   if space usage > 90% then alert

I think this would be enough for most cases, but it introduces some overhead when monitoring aspects of the system that monit does not cover at all. E.g. if I want to send an alert when the number of established TCP connections exceeds a certain limit, I would have to do something like this:

check file tcp-connections with path /dev/null
   if failed script "/sbin/check_connections --max=1000" with timeout 5s then 

> The method will return the appropriate event type in case of a failed/passed state, plus an event description, and monit will handle the defined action. The timeout serves as a safety net in case the method hangs.

OK, but I suggest that returning the event type and description should be optional. If the script does not return this information, monit should assume the (new) event type "script failed". To determine the general failure or success of the script, monit should IMO look at the script's exit code.
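
To make the exit-code convention concrete, here is a minimal sketch of what the core of a script like /sbin/check_connections might contain. The function names are hypothetical, the input is assumed to be `netstat -tn`-style output, and MONIT_SERVICE is only the variable name proposed earlier in this thread, not something monit sets today.

```shell
# count_established: read `netstat -tn`-style lines on stdin and print
# how many of them are in the ESTABLISHED state.
count_established() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask that exit status.
    grep -c 'ESTABLISHED' || true
}

# check_limit COUNT MAX: return 0 (success) if COUNT <= MAX, otherwise
# print a short description to stderr and return 1 (failure).
check_limit() {
    if [ "$1" -le "$2" ]; then
        return 0
    fi
    echo "${MONIT_SERVICE:-tcp-connections}: $1 connections exceed limit $2" >&2
    return 1
}
```

A script built on these could end with `check_limit "$(netstat -tn | count_established)" 1000`, so that its exit status alone tells the caller whether the check passed; the message on stderr is the optional description.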

