Re: Question about monitrc file.
From: via . lej
Subject: Re: Question about monitrc file.
Date: Sat, 29 Sep 2007 17:23:49 +0200
User-agent: Opera Mail/9.23 (Win32)
On Wed, 29 Aug 2007 20:57:37 +0200, Jovan Kostovski <address@hidden> wrote:
> On 8/28/07, address@hidden <address@hidden> wrote:
>> OK, and what if I want to switch between the two nodes? How can I do that?
>> "When problems appear on active node it will ask the standby to take over the
>> resources, and the failover is made."
>>
>> How do I configure monit to do that? With my config, monit can restart a
>> stopped service, but if the service fails to start it is declared "timed
>> out" and... nothing happens: the node stays primary. Do you follow me?
>
> Hi VIanney,
>
> Sorry for the late reply but I've been busy :(
>
> What monit can do is start, stop and restart a service, and if the service
> fails to start several times it's marked as timed out. That's where
> heartbeat steps in. When some service cannot start, heartbeat will detect
> that and will ask the other node to take over the resources.
>
> Here is a good example of configuring monit + heartbeat:
> http://linux.die.net/man/1/monit
> It will give a better overview of the setup I'm talking about.
>
> You just need to change the monitrc file to look like the following
> (monit+heartbeat+drbd = monitoring the mounted filesystem + mysql +apache):
>
> check process postfix with pidfile /var/spool/postfix/pid/master.pid
>   start program = "/etc/init.d/postfix start"
>   stop program = "/etc/init.d/postfix stop"
>   mode active
>   group local
>
> check process heartbeat with pidfile /var/run/heartbeat.pid
>   start program = "/etc/init.d/heartbeat start"
>   stop program = "/etc/init.d/heartbeat stop"
>   mode active
>   group local
>
> check device fs with path /dev/drbd0
>   start program = "/etc/ha.d/resource.d/ha-fs start"
>   stop program = "/etc/ha.d/resource.d/ha-fs stop"
>   if failed permission 660 then unmonitor
>   if failed uid root then unmonitor
>   if failed gid root then unmonitor
>   if space usage > 80% then alert
>   if space usage > 99% then stop
>   mode manual
>   group cluster
>
> check process mysql with pidfile /var/lib/mysql/mysqld.pid
>   start program = "/etc/ha.d/resource.d/ha-mysql start"
>   stop program = "/etc/ha.d/resource.d/ha-mysql stop"
>   if failed host localhost port 3306 then restart
>   if 5 restarts within 5 cycles then timeout
>   mode manual
>   group cluster
>   depends on fs
>
> check process apache with pidfile /var/run/httpd2.pid
>   start program = "/etc/ha.d/resource.d/ha-apache start"
>   stop program = "/etc/ha.d/resource.d/ha-apache stop"
>   if failed host myhost.com port 80
>     protocol HTTP request "/monit/token" then restart
>   if cpu is greater than 60% for 2 cycles then alert
>   if cpu > 80% for 5 cycles then restart
>   if children > 250 then restart
>   if loadavg(5min) greater than 10 for 8 cycles then stop
>   if 5 restarts within 5 cycles then timeout
>   mode manual
>   group cluster
>   depends on mysql
>
> ----------------------------------------------------------------------
>
> There are two groups:
> local (postfix + heartbeat) and
> cluster (drbd + mysql + apache)
>
> All the services that are monitored by heartbeat (the wrapper shell
> scripts) should be added in /etc/ha.d/resource.d. Whatever you put
> in and start from that location will be monitored by heartbeat, so
> whenever one of the services from the cluster group fails to start
> several times, heartbeat will take over and execute the failover.
>
> You should specify the hostname and the IP address in
> /etc/ha.d/haresources as well.
>
> For more info on configuring heartbeat check http://www.linux-ha.org/
>
> I hope things are much clearer now ;)
>
> BR, Jovan
I configured two groups, and monit declares each failed service as timed out,
but heartbeat doesn't seem to do anything when a service has failed. This is my
haresources file:
==File HA.resources==>
Inet-Primaire 10.0.254.254 IPaddr::10.0.254.1 IPaddr::10.0.254.2 drbddisk::data Filesystem::/dev/drbd0::/data::ext3 MailTo::address@hidden::InetCluster monit-Inet-Primaire
<==
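The monit-Inet-Primaire entry at the end of that line is presumably a resource script under /etc/ha.d/resource.d; its contents are not included in this message. As a rough sketch only, assuming the script does nothing more than forward heartbeat's start/stop request to the monit group of the same name, such a wrapper could look like the following. The function name monit_group, the MONIT and GROUP defaults, and the use of "start all" / "stop all" with -g are assumptions for illustration, not the actual script from this setup.

```shell
#!/bin/sh
# Hypothetical /etc/ha.d/resource.d/monit-Inet-Primaire (illustrative sketch).
# MONIT and GROUP are assumed defaults; adjust them to the local installation.
MONIT="${MONIT:-/usr/sbin/monit}"
GROUP="${GROUP:-Inet-Primaire}"

# Translate a start/stop request into a monit action on the whole group.
monit_group() {
    case "$1" in
        start|stop)
            # "-g GROUP <action> all" asks monit to act on every service in GROUP
            "$MONIT" -g "$GROUP" "$1" all
            ;;
        *)
            echo "Usage: monit_group {start|stop}" >&2
            return 1
            ;;
    esac
}

# Act only when the caller (heartbeat) passes an action argument.
if [ "$#" -gt 0 ]; then
    monit_group "$1"
fi
```

With a wrapper like this, heartbeat starts or stops the "mode manual" services of the group when resources move between nodes. It does not by itself make heartbeat notice that monit has marked a service as timed out.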
==File monitrc==>
##############################################################################
##Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All paths MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below are examples of some frequently used statements. For information
## about the control file and a complete list of statements and options, please
## have a look at the monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at
## 15-second intervals.
#
set daemon 15
#
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use the 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
set logfile syslog facility log_daemon
#
#
## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - it is
## possible to override it with the PORT option.
#
set mailserver localhost # primary mailserver
# backup.bar.baz port 10025, # backup mailserver on port 10025
# localhost # fallback relay
#
#
## By default monit will drop the event alert, in the case that there is no
## mailserver available. In the case that you want to keep the events for
## later delivery retry, you can use the EVENTQUEUE statement. The base
## directory where undelivered events will be stored is specified by the
## BASEDIR option. You can limit the maximal queue size using the SLOTS
## option (if omitted then the queue is limited just by the backend filesystem).
#
set eventqueue
    basedir /var/monit  # set the base directory where events will be stored
    slots 200           # optionally limit the queue size
#
#
## Monit by default uses the following alert mail format:
##
## --8<--
## From: address@hidden # sender
## Subject: monit alert -- $EVENT $SERVICE # subject
##
## $EVENT Service $SERVICE #
## #
## Date: $DATE #
## Action: $ACTION #
## Host: $HOST # body
## Description: $DESCRIPTION #
## #
## --8<--
##
## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded at runtime. For example, to override the sender:
#
set mail-format { from: address@hidden }
#
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
set alert address@hidden # receive all alerts
# set alert address@hidden only on { timeout } # receive just service-
# # timeout alert
#
#
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 3001 and
SSL ENABLE
PEMFILE /etc/ssl/CA/private/InetAdministration-key-cert.pem
allow admin:pladppiuc###
# use address localhost # only accept connection from localhost
# allow localhost # allow localhost to connect to the server and
# allow admin:monit # require user 'admin' with password 'monit'
#
#
###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
# check system myhost.mydomain.tld
# if loadavg (1min) > 4 then alert
# if loadavg (5min) > 2 then alert
# if memory usage > 75% then alert
# if cpu usage (user) > 70% then alert
# if cpu usage (system) > 30% then alert
# if cpu usage (wait) > 20% then alert
#
#
## Check a file for existence, checksum, permissions, uid and gid. In addition
## to the recipients in the global section, a customized alert will be sent to
## the additional recipient. The service may be grouped using the GROUP option.
#
# check file apache_bin with path /usr/local/apache/bin/httpd
# if failed checksum and
# expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
# if failed permission 755 then unmonitor
# if failed uid root then unmonitor
# if failed gid root then unmonitor
# alert address@hidden on {
# checksum, permission, uid, gid, unmonitor
# } with the mail-format { subject: Alarm! }
# group server
#
#
## Check that a process is running and responding to HTTP and HTTPS requests,
## and check its resource usage such as cpu and memory and its number of
## children. If the process is not running, monit will restart it by
## default. If the service is restarted very often and the problem
## remains, it is possible to disable the monitoring using the
## TIMEOUT statement. The service depends on another service (apache_bin) which
## is defined in the monit control file as well.
#
# check process apache with pidfile /usr/local/apache/logs/httpd.pid
# start program = "/etc/init.d/httpd start"
# stop program = "/etc/init.d/httpd stop"
# if cpu > 60% for 2 cycles then alert
# if cpu > 80% for 5 cycles then restart
# if totalmem > 200.0 MB for 5 cycles then restart
# if children > 250 then restart
# if loadavg(5min) greater than 10 for 8 cycles then stop
# if failed host www.tildeslash.com port 80 protocol http
# and request "/monit/doc/next.php"
# then restart
# if failed port 443 type tcpssl protocol http
# with timeout 15 seconds
# then restart
# if 3 restarts within 5 cycles then timeout
# depends on apache_bin
# group server
#
#
## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource, and an automatic
## graceful stop may be cascaded to them before the filesystem becomes
## full and data is lost.
#
# check device datafs with path /dev/sdb1
# start program = "/bin/mount /data"
# stop program = "/bin/umount /data"
# if failed permission 660 then unmonitor
# if failed uid root then unmonitor
# if failed gid disk then unmonitor
# if space usage > 80% for 5 times within 15 cycles then alert
# if space usage > 99% then stop
# if inode usage > 30000 then alert
# if inode usage > 99% then stop
# group server
#
#
## Check a file's timestamp: when it becomes older than 15 minutes, the
## file is not being updated and something is wrong. If the size
## of the file exceeds the given limit, execute the script.
#
# check file database with path /data/mydatabase.db
# if failed permission 700 then alert
# if failed uid data then alert
# if failed gid data then alert
# if timestamp > 15 minutes then alert
# if size > 100 MB then exec "/my/cleanup/script"
#
#
## Check the directory permission, uid and gid. An event is triggered
## if the directory does not belong to the user with the uid 0 and
## the gid 0. In addition, the permissions have to match the octal
## description of 755 (see chmod(1)).
#
# check directory bin with path /bin
# if failed permission 755 then unmonitor
# if failed uid 0 then unmonitor
# if failed gid 0 then unmonitor
#
#
## Check the remote host network services availability and the response
## content. One of three pings, a successful connection to a port, and an
## application-level network check are performed.
#
# check host myserver with address 192.168.1.1
# if failed icmp type echo count 3 with timeout 3 seconds then alert
# if failed port 3306 protocol mysql with timeout 15 seconds then alert
# if failed url
# http://user:address@hidden:8080/?querystring
# and content == 'action="j_security_check"'
# then alert
check process vsftpd_ftpserver with pidfile /var/run/vsftpd/vsftpd.pid
  start program = "/etc/init.d/vsftpd start"
  stop program = "/etc/init.d/vsftpd stop"
  if failed port 21 protocol ftp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/etc/init.d/heartbeat stop"
  group Inet-Primaire
  mode manual

check process sshd_remote_access_server with pidfile /var/run/sshd.pid
  start program "/etc/init.d/ssh start"
  stop program "/etc/init.d/ssh stop"
  if failed port 2145 protocol ssh then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process mysql_DBserver with pidfile /var/run/mysqld/mysqld.pid
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  if failed host 127.0.0.1 port 3306 then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process apache2_webserver with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start"
  stop program = "/etc/init.d/apache2 stop"
  if failed host 127.0.0.1 port 80 protocol http then restart
  # and request "/monit/token" then restart
  if cpu is greater than 60% for 2 cycles then alert
  if cpu > 80% for 2 cycles then restart
  if totalmem > 500 MB for 5 cycles then restart
  if children > 250 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process postfix_mailserver with pidfile /var/spool/postfix/pid/master.pid
  start program = "/etc/init.d/postfix start"
  stop program = "/etc/init.d/postfix stop"
  if failed port 25 protocol smtp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process ntop_network_monitoring with pidfile /var/run/ntop.pid
  start program = "/etc/init.d/ntop start"
  stop program = "/etc/init.d/ntop stop"
  if failed port 3000 type tcpssl then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process freeradius_auth_server with pidfile /var/run/freeradius/freeradius.pid
  start program = "/etc/init.d/freeradius start"
  stop program = "/etc/init.d/freeradius stop"
  if failed port 1812 type udp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process dhcpd_server with pidfile /var/run/dhcpd.pid
  start program = "/etc/init.d/dhcp3-server start"
  stop program = "/etc/init.d/dhcp3-server stop"
  if failed port 67 type udp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process bind_dns_server with pidfile /var/run/bind/run/named.pid
  start program = "/etc/init.d/bind9 start"
  stop program = "/etc/init.d/bind9 stop"
  if failed port 53 type tcp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process upsd_information with pidfile /var/run/nut/upsd.pid
  start program = "/etc/init.d/ups-monitor start"
  stop program = "/etc/init.d/ups-monitor stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process upsmon_control with pidfile /var/run/nut/upsmon.pid
  start program = "/etc/init.d/ups-monitor start"
  stop program = "/etc/init.d/ups-monitor stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process ups_driver with pidfile /var/run/nut/usbhid-ups-MGE850VA.pid
  start program = "/etc/init.d/ups-monitor start"
  stop program = "/etc/init.d/ups-monitor stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active

check process eserver_emule_server with pidfile /var/run/eserver.pid
  start program = "/etc/init.d/eserver start"
  stop program = "/etc/init.d/eserver stop"
  if failed port 4661 type tcp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process teamspeak_server with pidfile /home/teamspeak/tsserver2.pid
  start program = "/etc/init.d/teamspeak start"
  stop program = "/etc/init.d/teamspeak stop"
  if failed port 8767 type udp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual

check process heartbeat with pidfile /var/run/heartbeat.pid
  start program = "/etc/init.d/heartbeat start"
  stop program = "/etc/init.d/heartbeat stop"
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group local
  mode active
###############################################################################
##
## It is possible to include the configuration or its parts from other files or
## directories.
#
# include /etc/monit.d/*
#
#
<====
I don't know why heartbeat doesn't do anything when a service fails. Have I
missed something?
Thanks,
Vianney