Re: [CVS] unix socket support added

From: Jan-Henrik Haukeland
Subject: Re: [CVS] unix socket support added
Date: 05 Aug 2002 19:22:26 +0200
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Civil Service)

Christian Hopp <address@hidden> writes:

> Maybe I just misunderstood you at first, when we discussed
> Martin's options, where you said, "Subsidiary I'm +1 for
> this suggestion by Martin".  I used that as a base to start
> the patch.

I meant "subsidiary" in the sense that if the vote got through in
spite of my -1 I would give Martin's proposal my +1. I'm sorry if I
wasn't clear on this.

> Okay, the impact of the patch to the code would be five lines to p.y
> *

I know, it's a small patch. It was the principle I was making a stand
for. But please, I'm not a total religious zealot, and if valid
requests come in for this I'm absolutely willing to reconsider.


> Before, I spoke of the possible NFS problems that could come up when
> the connection breaks at the time monit accesses it.  You proposed
> to use a "timeout" construction via alarm().  But that won't IMHO
> work. Monit won't be able to evaluate any signal at that time.

Are you sure about this? I cannot actually test it since I do not have
access to a system using NFS. But theoretically I would think that
this would just be some sort of a blocking affair and that alarm()
would shake monit out of it. But of course I may be wrong.
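
To make the alarm() idea concrete, here is a minimal sketch, not
monit code. The helper name read_with_alarm is hypothetical. It
installs a SIGALRM handler without SA_RESTART so that a blocked
read() returns -1 with errno set to EINTR when the timer fires. It
assumes the kernel delivers the signal while the process is blocked,
which is exactly the point in question: a process waiting on a
hard-mounted NFS server may sleep uninterruptibly, and then nothing
below helps.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; its only job is to interrupt the blocking call. */
static void on_alarm(int signo) { (void)signo; }

/*
 * Hypothetical helper (not part of monit): read up to len bytes from fd,
 * giving up after timeout_sec seconds.
 * Returns the number of bytes read, or -1 with errno == EINTR if the
 * alarm fired before read() returned.
 */
ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned timeout_sec)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* no SA_RESTART: read() must fail with EINTR */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm(timeout_sec);         /* arm the timer */
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;    /* alarm()/sigaction() must not clobber it */
    alarm(0);                   /* disarm the timer */
    sigaction(SIGALRM, &old, NULL);

    errno = saved_errno;
    return n;
}
```

Leaving out SA_RESTART is the important part; with it set, the
kernel would simply restart the read() after the handler returns.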

> The only possible way IMHO would be to fork away the actual checker
> and evaluate its exit value. If you wait() longer than the timeout
> value, put this service aside until a wait() is successfully
> answered, and warn the specific recipient.  Or do you think that this
> is most unlikely to happen?

I don't know. But let's see, this is only relevant for a checksum
check, right? We assume system pidfiles are on a local disk (I hope),
and if a start/stop program fails because NFS died it will not be a
problem, since these child processes are autonomous anyway and not
expected to report back anything.

Assuming this, monit will then only be suspended if your assumption
is correct and NFS dies after monit has started reading a file for
checksum testing. (If monit was about to open a file for creating an
MD5 sum and NFS was down, the fopen call will just fail.) It takes
under a second to create an MD5 sum for a 2 MB file, so you would
have to be pretty unlucky for monit suspension to occur. But of
course "Things that will never happen, always happen".

What do you think? Is this something we can live with (assuming alarm
won't work)? At least, maybe we should document that it's a really bad
idea to save pidfiles in an NFS-mounted directory.

BTW, your suggestion sounds plausible and like the only possible
workaround if alarm() does not work.
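
For reference, a rough sketch of that fork-and-wait approach, under
my own assumptions rather than as actual monit code. All names here
(run_check_with_timeout, CHECK_TIMED_OUT, and the two demo_* checks)
are hypothetical. The check runs in a child process; the parent polls
with waitpid() and WNOHANG, so it never blocks on the check itself,
and on timeout it hands back the child's pid so the service can be
put aside and the wait retried later.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define CHECK_TIMED_OUT (-2)    /* hypothetical "child still running" code */

/* Stand-in checks for illustration only; a real one would checksum a file. */
int demo_quick_check(void) { return 3; }
int demo_stuck_check(void) { sleep(60); return 0; }

/*
 * Hypothetical helper (not part of monit): run check() in a child process
 * and poll for its exit for up to timeout_sec seconds.
 * Returns the child's exit status (0..255), -1 on error or abnormal exit,
 * or CHECK_TIMED_OUT if the child is still running. *pid_out receives the
 * child's pid so the caller can waitpid() on it again later.
 */
int run_check_with_timeout(int (*check)(void), int timeout_sec, pid_t *pid_out)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(check() & 0xff);  /* child: report the result as exit status */

    *pid_out = pid;
    struct timespec tick = { 0, 100 * 1000 * 1000 };    /* 100 ms */
    for (int waited_ms = 0; waited_ms <= timeout_sec * 1000; waited_ms += 100) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);       /* never blocks */
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    return CHECK_TIMED_OUT;     /* caller sets the service aside and warns */
}
```

Only the child can get stuck in the file access here; the parent
keeps running and can go on checking other services.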

Jan-Henrik Haukeland
