monit-general



From: Mike Schmidt
Subject: monit tries to resolve mail host too early; after that it seems unable to get to the network
Date: Sat, 16 Apr 2011 14:40:24 -0400

Hi,

I have some 50 systems running monit, and I start monit with a 60-second delay. However, after a reboot, monit sometimes starts up and tries to resolve the mail host too early, because it doesn't wait the 60 seconds, and at that point the network may not be ready. After that, I receive no alerts. This is what I get in the messages file (I xxxx-ed out the hostname):

Apr 16 03:02:54 actiforme-1 monit[2564]: Starting monit HTTP server at [actiforme-1.vpn.impacts.xxxx.com:2812]
Apr 16 03:02:54 actiforme-1 monit[2564]: monit HTTP server started
Apr 16 03:02:54 actiforme-1 monit[2564]: 'system' Monit started
Apr 16 03:03:14 actiforme-1 monit[2564]: M/Monit: cannot open a connection to http://mon2.xxxx.com:8080/impact/collector -- Success
Apr 16 03:03:14 actiforme-1 monit[2564]: M/Monit: trying next server http://mon1.xxxx.com:8080/impact/collector
Apr 16 03:03:34 actiforme-1 monit[2564]: M/Monit: cannot open a connection to http://mon1.xxxx.com:8080/impact/collector -- Success
Apr 16 03:03:34 actiforme-1 monit[2564]: M/Monit: no server available
Apr 16 03:03:54 actiforme-1 monit[2564]: Cannot open a connection to the mailserver 'mailman.xxxx.com:25' -- Success
Apr 16 03:03:54 actiforme-1 monit[2564]: No mail servers are available
Apr 16 03:03:54 actiforme-1 monit[2564]: Aborting event
Apr 16 03:03:54 actiforme-1 monit[2564]: M/Monit heartbeat started
Apr 16 03:03:54 actiforme-1 monit[2564]: 'date-time' process is not running
Apr 16 03:04:14 actiforme-1 monit[2564]: M/Monit: cannot open a connection to http://mon2.xxxx.com:8080/impact/collector -- Success
Apr 16 03:04:14 actiforme-1 monit[2564]: M/Monit: trying next server http://mon1.xxxx.com:8080/impact/collector
Apr 16 03:04:34 actiforme-1 monit[2564]: M/Monit: cannot open a connection to http://mon1.xxxx.com:8080/impact/collector -- Success
Apr 16 03:04:34 actiforme-1 monit[2564]: M/Monit: no server available
Apr 16 03:04:54 actiforme-1 monit[2564]: Cannot open a connection to the mailserver 'mailman.xxxx.com:25' -- Success
Apr 16 03:04:54 actiforme-1 monit[2564]: No mail servers are available
Apr 16 03:04:54 actiforme-1 monit[2564]: Aborting event
Apr 16 03:04:54 actiforme-1 monit[2564]: 'date-time' trying to restart
Apr 16 03:04:54 actiforme-1 monit[2564]: 'date-time' start: /sbin/service
Apr 16 03:05:15 actiforme-1 monit[2564]: 'Impact3' failed, cannot open a connection to INET[impact3.xxxx.com:443] via TCP
Apr 16 03:05:35 actiforme-1 monit[2564]: 'Impact4' failed, cannot open a connection to INET[impact4.xxxx.com:443] via TCP


..... more of the same

When I logged on to that system this morning, I had no trouble accessing the two HTTPS sites.

Here's the config:

check host Impact3 with address impact3.xxxx.com
      if failed port 443 for 2 times within 3 cycles then alert

check host Impact4 with address impact4.xxxx.com
      if failed port 443 for 2 times within 3 cycles then alert
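
For reference, a start delay like the one mentioned above is normally set in the control file with a statement of the following form. This is only a sketch: both values are assumptions (a 60-second poll cycle and a 60-second delay) and are not copied from the affected systems.

# assumed values, not taken from the affected systems
set daemon 60 with start delay 60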


Monit was still trying 8 hours later, and still reported that there was no access to impact3 and impact4.

Meanwhile, impact3 and impact4 were accessible: the application that uses them had been able to check for updates every 5 minutes since just after the reboot.

Does anybody have any idea why this happens? When it does, there are no alerts, services are marked down when they are not, and so on.

--
Mike SCHMIDT
CTO 
Intello Technologies Inc.
address@hidden
Canada: 1-888-404-6261 x320
USA: 1-888-404-6268 x320
www.intello.com


