monit-general
Re: New user with several major monit problems


From: Jonathan Wheeler
Subject: Re: New user with several major monit problems
Date: Sat, 10 Sep 2005 23:00:34 +1200
User-agent: Mozilla Thunderbird 1.0.2 (X11/20050423)

Martin Pala wrote:

Hi Martin, thanks for the long reply!

Jonathan Wheeler wrote:

>> 1st and most major problem.
>> "monit -g node1 stop all" kills every process on the system, not just
>> the services in the node1 group, nor even the services described in the
>> monitrc file... no no, it kills EVERYTHING, even the local console
>> session on the machine is kicked out, along with monit itself!
>>
>> This happens with monit running as a standalone daemon, or directly from
>> init. I've tried this on two completely different machines, one debian,
>> and one gentoo, with versions 4.5.1 and 4.5.
>
>
> Monit is not able to stop/kill any process itself. When you call
> 'monit stop all', it just calls stop method of all services (those
> defined by 'stop program' option).
>
> If your system dies, it is probably caused by one of your stop
> scripts (these are not part of monit).

If I hadn't seen it myself, on two completely different machines, I
wouldn't have believed it. For the most part, the scripts I'm using are
the heartbeat-based scripts rather than the distribution-provided
/etc/init.d ones. I'll experiment a bit more to see if I can narrow it
down.
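One way to narrow it down is to list exactly which stop commands monit would call, then run them by hand one at a time. The following is only a rough sketch: the function name list_stop_programs is made up for illustration, and it assumes every 'stop program' line in the control file is a single line with the command in double quotes, as in the example config quoted below.

```shell
# Print "service: stop command" for each service in a monit control file.
# Assumes one double-quoted command per 'stop program' line.
list_stop_programs() {
    awk '
        # Remember the service name from "check <type> <name> ..."
        $1 == "check" { svc = $3 }
        $1 == "stop" && $2 == "program" {
            cmd = $0
            sub(/^[^"]*"/, "", cmd)   # drop everything up to the opening quote
            sub(/".*$/, "", cmd)      # drop the closing quote and anything after
            print svc ": " cmd
        }' "$1"
}
```

Calling it with the path to the control file as its only argument prints one line per service, which makes it easier to test each stop script in isolation.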

>
> Here is an example with two groups, 'ldap' and 'sql', each containing
> one service. There is a third service which is not part of any group:
>
> --8<--
> set daemon 5
> set logfile /var/log/monit
> set mailserver 127.0.0.1
> set alert address@hidden
> set httpd port 2812 and
>     allow 127.0.0.1
>     use address 127.0.0.1
>
> check process slapd with pidfile /var/run/slapd/slapd.pid
>     start program = "/etc/init.d/slapd start"
>     stop program = "/etc/init.d/slapd stop"
>     if failed host 127.0.0.1 port 389 protocol ldap3 then restart
>     group ldap
>
> check process mysql with pidfile /var/run/mysqld/mysqld.pid
>     start program = "/etc/init.d/mysql start"
>     stop program = "/etc/init.d/mysql stop"
>     if failed host 127.0.0.1 port 3306 protocol mysql then restart
>     group sql
>
> check directory bin path /bin
>     start program = "/bin/true"
>     stop program = "/bin/true"
>     if failed permission 755 then alert
> --8<--
>
> 1.) monit is running in daemon mode, all services are working:
>
>
> 2.) ldap group is stopped (as you can see the system keeps running,
> slapd was stopped and unmonitored):
>
>
>
> 3.) ldap group started again:
>
>
>
>
> 4.) even all services can be stopped without affecting the system:
>
>
Heh, looks good! Now if only mine worked that way, eh?

>> 2nd, and related problem.
>> Groups don't work.
>> monit -g weeelookat me start, or monit -g abcdefg -V, give exactly the
>> same results as monit without -g. monit -g node1 status, is also the
>> same as monit status.
>
>
> The group (-g) option is supported only by the following arguments:
>
> start
> stop
> monitor
> unmonitor
> restart
>
> The 'status' and 'summary' commands really do display the status of
> all services ... currently this is a feature, maybe we should change it ...
>
>
>> Most annoyingly, for my cluster monit -g node1 stop all (as taken
>> directly from your documentation) kills the *entire* server (see
>> problem 1)
>
>
> Cannot be caused by monit - see above.
>
>>
>> 3rd issue.
>> Dependencies, it would appear that monit won't wait in between
>> dependencies. In my case I have it set to start drbd, followed by mount,
>> and finally starting nfs.
>> When I issue an monit start nfs, it attempts to start all 3 services in
>> the space of 1 second, which of course fails horribly as each takes a
>> little while to start up.
>
>
> When using dependencies, monit currently doesn't check whether a
> service it has just started in the chain is actually running before
> starting its dependants. It only provides the correct start order. If
> one of the prerequisites in your service chain starts slowly, you
> should modify the start scripts of the dependant services to wait for
> the parent service to be running.

Ah.

>
> You can use, for example, a simple fixed 'sleep' in the start script,
> or use some method to check whether the service is running, such as
> 'monit summary'. The following example exits with status 1 if slapd
> is running or 0 otherwise:
>
> monit summary | awk '/slapd/ {exit !($3 != "running")}';
>
> You can then incorporate this test into the start script, in a loop
> which waits for the service to start, for example:
>
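> --8<--
> # awk exits 0 (shell "true") while slapd is not yet running, so keep waiting
> while monit summary | awk '/slapd/ {exit !($3 != "running")}'
> do
>     sleep 5
> done
> # placeholder for whatever actually starts the dependant service
> start_service
> --8<--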
>
> (you can also add a give-up counter in case the prerequisite service
> remains down for a long time, etc.)
>
Thanks for the suggestion, I'll see what I can put together. I realise
that monit is in no way designed to be a replacement for SysV-style
init scripts, but I'm sure other people would find it useful to have
dependency handling that checks whether a prerequisite service has
actually started (perhaps internally using the monit summary
information) before attempting to start the services that depend on it.
Would it be possible for support for something like this to be
incorporated into future versions of monit?
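
In the meantime, the wait loop with a give-up counter could be sketched roughly as below. This is only an illustration under assumptions: the function names service_is_running and wait_for_service are invented here, and it relies on 'monit summary' printing the service name (possibly in single quotes) as the second field and the status as the third, as the awk test quoted above implies.

```shell
# Succeed (exit 0) if 'monit summary' lists the service with status "running".
# Assumed line format: <type> '<name>' <status>
service_is_running() {
    monit summary | awk -v svc="$1" -v q="'" '
        ($2 == (q svc q) || $2 == svc) && $3 == "running" { found = 1 }
        END { exit !found }'
}

# Poll until the service is running, giving up after a number of tries.
# Usage: wait_for_service NAME [TRIES] [DELAY_SECONDS]
wait_for_service() {
    svc=$1
    tries=${2:-12}
    delay=${3:-5}
    while [ "$tries" -gt 0 ]; do
        if service_is_running "$svc"; then
            return 0
        fi
        tries=$((tries - 1))
        # Only sleep if another attempt is still to come
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1    # gave up: the prerequisite never came up
}
```

A start script for a dependant service could then call something like "wait_for_service drbd 12 5 || exit 1" before its real start command, so that it fails cleanly instead of hanging if the prerequisite stays down.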

Thanks,
Jonathan

> Martin
>
>