David Kozinn, K2DBK
Re: Debugging remote connection failures
Fri, 19 Sep 2014 23:18:22 -0400
Network tests do fail from time to time. The cause could be anything from an overworked server to traffic spikes on the network. Usually there is no real problem; Monit simply was not able to connect within the default timeout of 5 seconds. This happens in real life too, but browsers, for instance, will retry and also open several connections at once, so it is not very noticeable.
These alerts, while real, border on false positives, because sooner or later with continuous testing there will be a network or server hiccup that happens just when Monit tries to connect. What you usually want is to ignore these incidents and instead get an alert only if the server really is down for a "significant" period.
This is why the "for x cycles" statement is so useful and highly recommended, especially for network testing. I see that you are already using "x times within y cycles" on the port 80 test, but I would simplify this to something like:
check host example.com with address www.example.com
  if failed port 80 protocol http for 3 cycles then alert
  if failed port 587 protocol smtp for 3 cycles then alert
How many cycles you should use is a tuning question and also depends on how often Monit runs. Use at least 2, possibly more if Monit runs several times per minute.
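To make the effect of "for 3 cycles" concrete, here is a rough Python sketch of the idea: alert only after a run of consecutive failed cycles. This is a simplified model for illustration, not Monit's actual implementation; the function name alert_cycles and the sample history are made up for the example.

```python
from typing import Iterable, List


def alert_cycles(results: Iterable[bool], threshold: int = 3) -> List[int]:
    """Return the cycle indexes at which an alert would fire.

    results: one boolean per cycle, True meaning the test failed.
    An alert fires when the test has failed `threshold` cycles in a row.
    """
    alerts = []
    streak = 0  # number of consecutive failed cycles so far
    for cycle, failed in enumerate(results):
        streak = streak + 1 if failed else 0
        if streak == threshold:  # fire once per outage, not on every cycle
            alerts.append(cycle)
    return alerts


# A single hiccup (cycle 2) is ignored; a sustained outage (cycles 5-8) alerts.
history = [False, False, True, False, False, True, True, True, True]
print(alert_cycles(history))  # [7]
```

In this model the alert arrives (threshold - 1) poll intervals after the first failed test, so with a 60 second cycle and 3 cycles you would hear about a real outage roughly two minutes after Monit first notices it.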
Running Monit with -Iv is mostly for debugging and is not recommended in production, as the output is very verbose and usually not very interesting. The recommended approach is simply to run Monit in the background without any parameters. If an error occurs, Monit will write it to its log file, so you won't miss out on the important stuff.
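If you are curious how close the connections come to the 5 second limit, a small stand-alone probe can record how long each connect takes. The sketch below is independent of Monit and only does a plain TCP connect (no HTTP or SMTP handshake); the function name probe is an assumption for this example, and the 5.0 second default simply mirrors the timeout mentioned above.

```python
import socket
import time
from typing import Tuple


def probe(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, float, str]:
    """Try one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects, raising OSError
        # (including timeouts and DNS errors) if it cannot.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)


# Example (hostname and ports taken from the config above):
#   for port in (80, 587):
#       print(port, probe("www.example.com", port))
```

Run from cron on the Monit machine and appended to a file, the elapsed times and error texts may show whether the occasional failures are slow connects, refused connections, or name resolution problems.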
On 18 Sep 2014, at 21:41, David Kozinn, K2DBK <address@hidden> wrote:
> New Monit user here, I'm really just kind of kicking the tires.
> I've got several things that I'm monitoring on a small server that I have, but I've also got it set up to monitor services on another box. The relevant portion of monitrc looks like this:
> check host example.com with address www.example.com
>   if failed port 80 protocol http 3 times within 5 cycles then alert
>   if failed port 587 protocol smtp then alert
> The vast majority of the time this works just fine. However, periodically I'll get a failure on one (or very occasionally on both) of these tests, which clears up on the next test cycle (60 seconds later). A few times I've been connected to the machine running monit and as soon as I get the failure, I'll try to manually telnet to the other machine on the appropriate port and it's always worked. I'm trying to figure out why it's failing.
> The problem is that this doesn't happen terribly frequently, so I'm thinking that just running with -Iv might not be practical, since I'd get tons of output. (And to be honest, I'm not quite sure if I'd even see anything there.)
> Can anyone suggest the best way to figure out why these tests are actually failing? Maybe run with verbose mode then tail & filter the output? (Filter for what?)