Re: [monit] Monitoring processes without a pidfile
From: Daniel Clark
Subject: Re: [monit] Monitoring processes without a pidfile
Date: Tue, 22 Jul 2008 16:04:12 -0400
On 7/21/08, Martin Pala <address@hidden> wrote:
> Currently the workaround is to create the pidfile from startup script:
> http://www.tildeslash.com/monit/doc/faq.php
The example script didn't work for me, because the service forks
off another process as part of becoming a daemon. The ugly shell
code below seems to do the trick, but it makes me think I should just
use runit/daemontools for this after all...
#!/bin/sh
ME="$(basename $0)"
PIDFILE="/var/run/$ME.pid"
DIR="/usr/local/logpp-0.15"
CMD="$DIR/bin/logpp -d -r 5 -t www.example.com -l debug $DIR/etc/$ME.conf"
PEXISTS=1
if [ -f $PIDFILE ]; then
ps -p "$(cat "$PIDFILE")" >/dev/null 2>&1
PEXISTS=$?
fi
case $1 in
start)
if [ $PEXISTS -eq 0 ]; then
echo "$ME: already started; exiting with error..."
exit 1
fi
if [ -f $PIDFILE ]; then
echo "$ME: removing stale $PIDFILE"
rm $PIDFILE
fi
$CMD >/tmp/$ME.out 2>&1 &
sleep 1
PGREPCMD="pgrep -n -U root -f '$CMD'"
eval $PGREPCMD > $PIDFILE
RC=$?
if [ $RC -ne 0 ]; then
rm $PIDFILE
echo "$ME: failed to find PID of running command."
echo "$ME: you may need to manually kill a process."
exit 2
fi
;;
stop)
if [ $PEXISTS -ne 0 ]; then
echo "$ME: already stopped; exiting with error..."
exit 3
fi
PID="$(cat $PIDFILE)"
kill -15 $PID 2>/dev/null
RC=$?
if [ $RC -ne 0 ]; then
echo "$ME: Couldn't kill -15 $PID; will try kill -9..."
else
rm $PIDFILE
exit 0
fi
kill -9 $PID 2>/dev/null
RC=$?
if [ $RC -ne 0 ]; then
echo "$ME: Couldn't kill -9 $PID; exiting with error."
exit 4
else
rm $PIDFILE
exit 0
fi
;;
*)
echo "usage: $ME {start|stop}" ;;
esac
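One fragile spot in the script above is the fixed "sleep 1" before pgrep: a daemon that takes longer to fork would be reported as a failure. As a minimal sketch (the helper name "retry" is hypothetical and not part of the original script), the lookup could be retried a few times instead:

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, one second apart, until it succeeds.
# Hypothetical helper; the name and the one-second delay are assumptions.
retry() {
    tries="$1"
    shift
    while [ "$tries" -gt 0 ]; do
        # Stop as soon as the command reports success.
        "$@" && return 0
        tries=$((tries - 1))
        # Only sleep if another attempt is still coming.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}
```

In the start action this might be used as: retry 5 pgrep -n -U root -f "$CMD" > "$PIDFILE", with the existing RC check left as it is.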
> If you need to mangle the service, you should do it via monit (like "monit
> restart <service>"), since the service is controlled by monit and if, for
> example, the service is stopped by a 3rd party process without monit knowing
> about it, monit will start it again.
Good point, although in practice skipping this doesn't seem to cause
problems in some cases, since the timing would need to be pretty
specific for things to go wrong. Unfortunately, one of the processes I
am using monit for (sphinx search's "searchd" daemon) pretty much
requires its companion "indexer" program to be run with a switch
(--rotate) that does a kill -HUP on the running "searchd". It has been
running that way for a few weeks, though, with no monit email yet.
> We are also planning to add support for services controlled directly by
> monit - monit then won't need the pidfile (it will be parent of such
> services and will know the pid).
Ah, so perhaps if I wait long enough monit will have this
runit/daemontools feature... sweet! :-)
> The regular expression for process name won't be much reliable, since it
> could be easily cheated by any user (starting process with matching name)
> and there also can be cases where multiple matching processes will run
> whereas it can be hard to decide which process is the correct one, etc.
Well, as you can see in the shell code above, that can be worked around
by requiring that the process be running as a certain user; also, I
didn't mention it, but none of the machines we use monit on have
non-sysadmin accounts.
I actually like the multiple matching in some cases, as it allows the
creation of scripts that will "fix" the machine if somehow more than
one of the exact same daemon got started by accident (in fact I'll
probably change the stop action of the included script to do that
later).
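
A rough sketch of what such a "fix" could look like, assuming the oldest matching process is the one to keep (that choice, and the function names extra_pids and kill_duplicates, are hypothetical and not taken from the script above):

```shell
#!/bin/sh
# extra_pids KEEP PID...: print every listed PID except KEEP, one per line.
extra_pids() {
    keep="$1"
    shift
    for pid in "$@"; do
        [ "$pid" != "$keep" ] && echo "$pid"
    done
    return 0
}

# kill_duplicates USER PATTERN: send SIGTERM to all but the oldest process
# owned by USER whose full command line matches PATTERN.
# Assumption: the oldest match (pgrep -o) is the "real" daemon.
kill_duplicates() {
    user="$1"
    pattern="$2"
    oldest="$(pgrep -o -U "$user" -f "$pattern")" || return 1
    # Word splitting of the pgrep output below is intentional.
    for pid in $(extra_pids "$oldest" $(pgrep -U "$user" -f "$pattern")); do
        echo "killing duplicate PID $pid"
        kill -15 "$pid"
    done
}
```

Called as kill_duplicates root "$CMD", this would leave a single instance running. Note that pgrep -f treats the pattern as a regular expression matched against the full command line, so a loose pattern can still catch unrelated processes.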
BTW thanks so much for all the quick & useful replies!