monit-general
RE: Process being restarted by monit validate even if it is running


From: Mehul Ved
Subject: RE: Process being restarted by monit validate even if it is running
Date: Mon, 10 Mar 2014 07:30:15 +0000

I stopped the monit daemon and ran monit -vI, and that didn't restart the services. 
Even after restarting the monit daemon, validate went fine this time. 
I'm not sure what changed or how to reproduce the issue. I'm setting up monit 
on another machine and will check what happens there.
________________________________________
From: address@hidden <address@hidden> on behalf of Martin Pala <address@hidden>
Sent: Friday, March 07, 2014 5:01 PM
To: This is the general mailing list for monit
Subject: Re: Process being restarted by monit validate even if it is running

You can use a pattern-based check for processes which don't have a pidfile (no need 
to create a wrapper which will write the pidfile):

        check process v2api matching "/usr/local/bin/node /usr/local/share/nodeapis/server.js"
                ...
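
As a fuller sketch of that suggestion, the rules from the original post could be carried over to a pattern-based check roughly as follows. This assumes the rest of the configuration stays as posted earlier in the thread, with "as uid root" dropped; it is an illustration, not a tested configuration.

```
# Sketch only: rules copied from the original post, pidfile replaced by a pattern
check process v2api matching "/usr/local/bin/node /usr/local/share/nodeapis/server.js"
  start program = "/usr/local/bin/nodeinit v2 start"
  stop program = "/usr/local/bin/nodeinit v2 stop"
  if failed host 127.0.0.1 port 10400 protocol http
    request /api/v2/ping
    with timeout 10 seconds
    then restart
  if 5 restarts within 10 cycles then alert
```

The pattern is treated as a regular expression, so it is worth checking what it would match before relying on it; the Monit manual describes a "monit procmatch" command for that, if your version provides it.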

The "as uid root" should be removed from the start and stop programs (as you described, 
Monit is already running as root).

For debugging, stop Monit and start it again in the foreground ("-I" = capital 
"i") with the -v option:

        monit -vI

Monit will log details about each test to console.
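
Alongside the debug output, the pidfile can be cross-checked by hand. The sketch below is a rough manual check, not Monit's own implementation: the function name check_pidfile is hypothetical, and it assumes a POSIX shell and a pidfile holding a single PID on its first line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Monit): report whether the PID stored
# in a pidfile refers to a process that currently exists.
check_pidfile() {
    pidfile=$1

    # The file must exist and be readable by the current user.
    if [ ! -r "$pidfile" ]; then
        echo "unreadable"
        return 2
    fi

    # Take the first line only and drop everything that is not a digit.
    pid=$(head -n 1 "$pidfile" | tr -cd '0-9')
    if [ -z "$pid" ]; then
        echo "empty"
        return 3
    fi

    # kill -0 delivers no signal; it only tests that the process exists
    # and that the caller is allowed to signal it. For a non-root caller
    # it can therefore also fail on a live process owned by someone else.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive $pid"
        return 0
    fi

    echo "dead $pid"
    return 1
}
```

For the setup in this thread it would be called as check_pidfile /var/run/node/v2.pid, run as the same user that runs Monit.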

Regards,
Martin



On 07 Mar 2014, at 12:21, Mehul Ved <address@hidden> wrote:

> # ls -ld /var/run
> lrwxrwxrwx 1 root root 4 Jul  5  2013 /var/run -> /run
>
> # ls -ld /var/run/node/
> drwxr-xr-x 2 root root 120 Mar  7 08:52 /var/run/node/
>
> # ls -l /var/run/node/v2.pid
> -rw-r--r-- 1 root root 4 Mar  7 08:52 /var/run/node/v2.pid
>
> # lsof -i :2812
> COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
> monit   14382 root    6u  IPv4 70768563      0t0  TCP localhost:2812 (LISTEN)
>
> I am running monit as root and also monit validate is run from root login.
>
> I had added "as uid root" specifically because I was having problem reading 
> pidfile without that. Probably redundant now.
> Another thing I should have made clear: this is happening only for 
> processes for which I have written a start/stop script. My script gets the pid of 
> the process and echoes it into the given file. To the best of my knowledge 
> that's the correct thing to do with a pidfile. And considering that monit is 
> getting the pid correctly, I believe that part is fine. It even manages to 
> stop the program correctly, which would have failed if the program wasn't 
> running with correct pid.
>
> I cannot figure out why monit doesn't see that process running. 
> Are there any debugging steps I can follow?
>
> ________________________________________
> From: address@hidden <address@hidden> on behalf of Martin Pala 
> <address@hidden>
> Sent: Friday, March 07, 2014 4:26 PM
> To: This is the general mailing list for monit
> Subject: Re: Process being restarted by monit validate even if it is running
>
> Hi,
>
> is Monit running as root or as a different user?
>
> If it is running as root, then the "as uid root" in stop/start programs is 
> not necessary:
>
>        start program = "/usr/local/bin/nodeinit v2 start" as uid root
>
> If it is running as a different user (which may be the reason for adding "as 
> uid root", but that most probably won't work, as the user won't have 
> permission to switch to root unless the binary is setuid or sudo is used), 
> then it is possible that Monit cannot read the pidfile. Please check the 
> permissions of the whole path to the pidfile and of the pidfile itself:
>
>        ls -ld /var/run
>        ls -ld /var/run/node
>        ls -l /var/run/node/v2.pid
>
> You can run Monit in debug mode to get more details about the test progress:
>
>        monit -vI
>
>
> Regards,
> Martin
>
>
> On 07 Mar 2014, at 10:03, Mehul Ved <address@hidden> wrote:
>
>> Hi,
>>  I have a process which I am monitoring with the following rules:
>>
>> check process v2api with pidfile /var/run/node/v2.pid
>>  start program = "/usr/local/bin/nodeinit v2 start"
>>    as uid root
>>  stop program = "/usr/local/bin/nodeinit v2 stop"
>>    as uid root
>>  if failed host 127.0.0.1 port 10400 protocol http
>>    request /api/v2/ping
>>    with timeout 10 seconds
>>    then restart
>>  if 5 restarts within 10 cycles then alert
>>
>> Before running monit validate, I checked the pidfile of the process
>>
>> # grep [0-9]* /var/run/node/
>> /var/run/node/v2.pid:31566
>>
>> # cat /var/run/node/v2.pid
>> 31566
>>
>> I also verified with ps on the process id
>>
>> # ps aux | grep 31566
>> root     31566  0.3  1.7 605684 29820 ?        Sl   05:24   0:01 /usr/local/bin/node /usr/local/share/nodeapis/server.js
>>
>>
>> Now when I run
>>
>> # monit validate --verbose
>> 'v2api' Error testing process id [31566] -- No such process
>> 'v2api' process is not running
>> Does not exist notification is sent to address@hidden
>> 'v2api' trying to restart
>> 'v2api' stop: /usr/local/bin/nodeinit
>> 'v2api' Error testing process id [31566] -- No such process
>> /usr/local/bin/nodeinit: line 80: kill: (31566) - No such process
>> Killed v2 process with pid 31566
>> monit: pidfile '/var/run/node/v2.pid' does not exist
>> monit: pidfile '/var/run/node/v2.pid' does not exist
>> 'v2api' start: /usr/local/bin/nodeinit
>> monit: pidfile '/var/run/node/v2.pid' does not exist
>> v2 has started with PID: 1150
>>
>> It complains that the process is not running.
>>
>> I am using the development version of monit that Martin linked to a couple 
>> of days back, with websocket support.
>>
>> # monit --version
>> This is Monit version 5.8
>> Copyright (C) 2001-2014 Tildeslash Ltd. All Rights Reserved.
>>
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general