Re: Not understanding 'Program Status Testing'

From: Paul Theodoropoulos
Subject: Re: Not understanding 'Program Status Testing'
Date: Wed, 23 Jul 2014 13:06:24 -0700
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.0

On 7/22/14 2:32 AM, Vincent WATREMEZ wrote:
Hi Paul,

By any chance, does the user running monit have the right privileges to run `bucardo status`?
Also, you might debug the status command by displaying its output to STDERR.
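
One way to act on that suggestion, as a minimal sketch: capture the command's output once, copy it to STDERR, and then run the usual filter on the captured text. The `status_cmd` function and its sample line are made-up stand-ins for the real status command, and where STDERR ends up when monit runs the script is not covered here.

```shell
#!/bin/sh
# Hypothetical stand-in for the real status command, with made-up sample
# output; swap in the actual command being monitored.
status_cmd() {
    printf '%s\n' 'sync1|mydb|Good|5m'
}

# Capture the output once, copy it to STDERR for inspection, then run the
# same filter chain on the captured text. The function's return value is the
# exit status of the final grep.
check_with_debug() {
    out=$(status_cmd 2>&1)
    printf 'DEBUG: status output was: %s\n' "$out" >&2
    printf '%s\n' "$out" | grep mydb | cut -d'|' -f4 | grep '.m' >/dev/null 2>&1
}

check_with_debug
```

Because the debug line goes to STDERR only, the script's exit status is still decided by the last grep.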


Thanks Vincent - I figured out part of it. It wasn't privileges, it was paths - monit runs programs with a very strict and limited PATH. When I ran the script myself as root on the command line, it worked fine. But when run from monit, the commands weren't found unless the test script spelled out every path, like this:

/usr/local/bin/ status |/bin/grep trumgr|/usr/bin/cut -d"|" -f4|/bin/grep ".m" >/dev/null 2>&1
exit $?
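
An alternative to spelling out every path is to set PATH explicitly at the top of the script. The following is only a sketch: the directory list is an assumption based on the paths used above, and the loop is a quick check that each tool resolves under that PATH.

```shell
#!/bin/sh
# Set an explicit search path up front so the commands below resolve the same
# way regardless of the environment the script inherits.
# The directory list is an assumption; adjust it to where the tools live.
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

# Print where each needed tool resolves, or flag it as missing on STDERR.
for tool in grep cut; do
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s -> %s\n' "$tool" "$(command -v "$tool")"
    else
        printf '%s not found in PATH=%s\n' "$tool" "$PATH" >&2
    fi
done
```

With PATH set this way, the pipeline itself could go back to using bare command names.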

Now monit correctly understands when the test fails:

Program 'bucardo.monitor'
  status                            Status failed
  monitoring status                 Monitored
  last started                      Wed, 23 Jul 2014 13:01:36
  last exit value                   0
  data collected                    Wed, 23 Jul 2014 13:01:36

Which is all great - except that it never generates an alert! I've confirmed that my other checks generate alerts - only this one fails to do so.  I have of course tried reversing the status checks, etc - no joy. So I'm still stuck.

2014-07-21 23:44 GMT+02:00 Paul Theodoropoulos <address@hidden>:
I have a daemon which I want to monitor specific status.  I've created the following script called 'bucardo.monitor':

bucardo status |grep mydb|cut -d"|" -f4| grep ".m" >/dev/null 2>&1
exit $?

In short, if the string "(one char)m" exists, I wish to get an alert. When I run the script from the command line, and the string I'm looking for exists, I get the following expected output:

me# bucardo.monitor;echo $?
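
For reference, a pipeline's exit status in sh is that of its last command, so the script's result is whatever the final grep returns. The sketch below shows this with made-up sample lines fed on stdin; `has_minutes` is a hypothetical helper name, and the real status output format is not shown in this thread.

```shell
#!/bin/sh
# Return 0 when field 4 of a line mentioning "mydb" contains any single
# character followed by "m", and 1 otherwise. Input is read from stdin so the
# logic can be tried without the real daemon.
has_minutes() {
    grep mydb | cut -d'|' -f4 | grep -q '.m'
}

# Made-up sample lines; the real status output format is not shown here.
printf '%s\n' 'sync1|mydb|Good|5m' | has_minutes
echo "with match: $?"
printf '%s\n' 'sync1|mydb|Good|30s' | has_minutes
echo "without match: $?"
```

Here `grep -q` replaces the redirect to /dev/null, and a trailing `exit $?` is not needed when the pipeline is the last command in the script.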

I created a monit conf file thus:

alert address@hidden with reminder on 5 cycle
alert address@hidden with reminder on 5 cycle
check program bucardo-monitor with path /usr/local/bin/bucardo.monitor
with timeout 3 seconds
if status = 0 then alert

The manual states that the operator should be "==", but the last example under status uses only a single equals sign - and I've tried both, no difference. I've also used just "if status 0 then alert" as suggested in the manual, also no difference.

The problem is that monit always shows a last exit status of "1" - except for a few moments after issuing 'monit reload' to deploy changes to the script:

Program 'bucardo-monitor'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Mon, 21 Jul 2014 14:40:47
  last exit value                   1
  data collected                    Mon, 21 Jul 2014 14:40:47

I've forced the test to be highly sensitive so that it will change from an exit of 0 to 1 every few minutes, well within my monitoring window - but again, I never get a status other than 1 in monit status, and thus never get an alert.

Am I doing something wrong? Misunderstanding?

Paul Theodoropoulos
