Re: Question about monitrc file.


From: via . lej
Subject: Re: Question about monitrc file.
Date: Sat, 29 Sep 2007 17:23:49 +0200
User-agent: Opera Mail/9.23 (Win32)

On Wed, 29 Aug 2007 20:57:37 +0200, Jovan Kostovski <address@hidden> wrote:

> On 8/28/07, address@hidden <address@hidden> wrote:
>> OK, and what if I want to switch between the 2 nodes? How can I do that?
>> "When problems appear on active node it will ask the standby to take over the
>> resources, and the failover is made."
>>
>> How do I configure monit to do that? In my config, monit can restart a
>> stopped service, but if the service fails to start, it is declared
>> "timed out" and... nothing, the node is still primary. Do you follow me?
>
> Hi Vianney,
>
> Sorry for the late reply but I've been busy :(
>
> What monit can do is start, stop and restart a service, and if the service
> fails to start several times it is marked as timed out. That's where
> heartbeat steps in. When some service cannot start, heartbeat will detect
> that the service can't be started and will ask the other node to take over
> the resources.
>
> Here is a good example of configuring monit + heartbeat:
> http://linux.die.net/man/1/monit
> It will give a better overview of the setup that I'm talking about.
>
> You just need to change the monitrc file to look like the following
> (monit+heartbeat+drbd = monitoring the mounted filesystem + mysql +apache):
>
> check process postfix with pidfile /var/spool/postfix/pid/master.pid
>   start program = "/etc/init.d/postfix start"
>   stop program  = "/etc/init.d/postfix stop"
>   mode  active
>   group local
>
> check process heartbeat with pidfile /var/run/heartbeat.pid
>   start program = "/etc/init.d/heartbeat start"
>   stop  program = "/etc/init.d/heartbeat stop"
>   mode  active
>   group local
>
> check device fs with path /dev/drbd0
>   start program  = "/etc/ha.d/resource.d/ha-fs start"
>   stop program  = "/etc/ha.d/resource.d/ha-fs stop"
>   if failed permission 660 then unmonitor
>   if failed uid root then unmonitor
>   if failed gid root then unmonitor
>   if space usage > 80% then alert
>   if space usage > 99% then stop
>   mode  manual
>   group cluster
>
> check process mysql with pidfile /var/lib/mysql/mysqld.pid
>   start program = "/etc/ha.d/resource.d/ha-mysql start"
>   stop program  = "/etc/ha.d/resource.d/ha-mysql stop"
>   if failed host localhost port 3306 then restart
>   if 5 restarts within 5 cycles then timeout
>   mode  manual
>   group cluster
>   depends on fs
>
> check process apache with pidfile /var/run/httpd2.pid
>   start program = "/etc/ha.d/resource.d/ha-apache start"
>   stop program  = "/etc/ha.d/resource.d/ha-apache stop"
>   if failed host myhost.com port 80
>     protocol HTTP request "/monit/token" then restart
>   if cpu is greater than 60% for 2 cycles then alert
>   if cpu > 80% for 5 cycles then restart
>   if children > 250 then restart
>   if loadavg(5min) greater than 10 for 8 cycles then stop
>   if 5 restarts within 5 cycles then timeout
>   mode  manual
>   group cluster
>   depends on mysql
>
> ----------------------------------------------------------------------
>
> There are two groups:
> local (postfix + heartbeat) and
> cluster (drbd + mysql + apache)
>
> All the services that are monitored by heartbeat (the wrapper shell
> scripts) should be added in /etc/ha.d/resource.d. Whatever you put
> in and start from that location will be monitored by heartbeat, so
> whenever one of the services from the cluster group fails to start
> several times, heartbeat will take over and perform the failover.
>
> You should specify the hostname and the IP address in
> /etc/ha.d/haresources as well.
>
> For more info on configuring heartbeat check http://www.linux-ha.org/
>
> I hope things are much clearer now ;)
>
> BR, Jovan

I configured two groups, and monit marks each failed service as timed out, but
heartbeat doesn't seem to do anything when a service has failed. This is my
haresources file:

==File haresources==>
Inet-Primaire 10.0.254.254 IPaddr::10.0.254.1 IPaddr::10.0.254.2 drbddisk::data Filesystem::/dev/drbd0::/data::ext3 MailTo::address@hidden::InetCluster monit-Inet-Primaire
<==
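
The last entry in that line, monit-Inet-Primaire, names a wrapper script that
heartbeat expects to find under /etc/ha.d/resource.d. For readers following
along, here is a minimal sketch of what such a wrapper might look like. It is
an illustration only, not the script actually installed on this node: the
function name ha_resource and the overridable MONIT variable are assumptions,
and the monit path is simply the one used in the commented-out exec lines of
the monitrc below.

```shell
#!/bin/sh
# Hypothetical /etc/ha.d/resource.d/monit-Inet-Primaire (illustrative sketch).
# heartbeat calls resource scripts with an action such as "start" or "stop"
# as the first argument.

# Assumption: location of the monit binary; override with the MONIT variable.
MONIT="${MONIT:-/usr/sbin/monit}"
# Must match the "group" name used for the clustered services in monitrc.
GROUP="Inet-Primaire"

ha_resource() {
    case "$1" in
        start)
            # Start every service in the group and enable monitoring for it.
            "$MONIT" -g "$GROUP" start all
            ;;
        stop)
            # Stop every service in the group and disable monitoring for it.
            "$MONIT" -g "$GROUP" stop all
            ;;
        *)
            echo "Usage: $0 {start|stop}" >&2
            return 1
            ;;
    esac
}

# Dispatch only when an action was passed (heartbeat always passes one).
if [ "$#" -gt 0 ]; then
    ha_resource "$@"
fi
```

With a wrapper like this, heartbeat hands the whole Inet-Primaire group to
monit when the node acquires or releases its resources. A production script
would probably also want a status action; it is left out here to keep the
sketch short.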

==File monitrc==>
##############################################################################
##Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All paths MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below are examples of some frequently used statements. For information
## about the control file, a complete list of statements and options please
## have a look in the monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start monit in background (run as daemon) and check the services at
## 15-second intervals.
#
set daemon  15
#
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
set logfile syslog facility log_daemon
#
#
## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - it is
## possible to override it with the PORT option.
#
set mailserver localhost               # primary mailserver
#                backup.bar.baz port 10025,  # backup mailserver on port 10025
#                localhost                   # fallback relay
#
#
## By default monit will drop the event alert, in the case that there is no
## mailserver available. In the case that you want to keep the events for
## later delivery retry, you can use the EVENTQUEUE statement. The base
## directory where undelivered events will be stored is specified by the
## BASEDIR option. You can limit the maximal queue size using the SLOTS
## option (if omitted, the queue is limited only by the backend filesystem).
#
set eventqueue
      basedir /var/monit  # set the base directory where events will be stored
      slots 200           # optionally limit the queue size
#
#
## Monit by default uses the following alert mail format:
##
## --8<--
## From: address@hidden                         # sender
## Subject: monit alert --  $EVENT $SERVICE  # subject
##
## $EVENT Service $SERVICE                   #
##                                           #
##      Date:        $DATE                   #
##      Action:      $ACTION                 #
##      Host:        $HOST                   # body
##      Description: $DESCRIPTION            #
##                                           #
## --8<--
##
## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded on runtime. For example to override the sender:
#
set mail-format { from: address@hidden }
#
#
## You can set the alert recipients here, which will receive the alert for
## each service. The event alerts may be restricted using the list.
#
set alert address@hidden                       # receive all alerts
# set alert address@hidden only on { timeout }  # receive just service-
#                                                # timeout alert
#
#
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 3001 and
         SSL ENABLE
         PEMFILE /etc/ssl/CA/private/InetAdministration-key-cert.pem
         allow admin:pladppiuc###
#    use address localhost  # only accept connection from localhost
#     allow localhost        # allow localhost to connect to the server and
#     allow admin:monit      # require user 'admin' with password 'monit'
#
#
###############################################################################
## Services
###############################################################################
##
## Check the general system resources such as load average, cpu and memory
## usage. Each rule specifies the tested resource, the limit and the action
## which will be performed in the case that the test failed.
#
#  check system myhost.mydomain.tld
#    if loadavg (1min) > 4 then alert
#    if loadavg (5min) > 2 then alert
#    if memory usage > 75% then alert
#    if cpu usage (user) > 70% then alert
#    if cpu usage (system) > 30% then alert
#    if cpu usage (wait) > 20% then alert
#
#
## Check a file for existence, checksum, permissions, uid and gid. In addition
## to the recipients in the global section, a customized alert will be sent to
## the additional recipient. The service may be grouped using the GROUP option.
#
#  check file apache_bin with path /usr/local/apache/bin/httpd
#    if failed checksum and
#       expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
#    if failed permission 755 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid root then unmonitor
#    alert address@hidden on {
#           checksum, permission, uid, gid, unmonitor
#        } with the mail-format { subject: Alarm! }
#    group server
#
#
## Check that a process is running, responding to HTTP and HTTPS requests,
## check its resource usage such as cpu and memory, number of children.
## In the case that the process is not running, monit will restart it by
## default. In the case that the service was restarted very often and the
## problem remains, it is possible to disable the monitoring using the
## TIMEOUT statement. The service depends on another service (apache_bin) which
## is defined in the monit control file as well.
#
#  check process apache with pidfile /usr/local/apache/logs/httpd.pid
#    start program = "/etc/init.d/httpd start"
#    stop program  = "/etc/init.d/httpd stop"
#    if cpu > 60% for 2 cycles then alert
#    if cpu > 80% for 5 cycles then restart
#    if totalmem > 200.0 MB for 5 cycles then restart
#    if children > 250 then restart
#    if loadavg(5min) greater than 10 for 8 cycles then stop
#    if failed host www.tildeslash.com port 80 protocol http
#       and request "/monit/doc/next.php"
#       then restart
#    if failed port 443 type tcpssl protocol http
#       with timeout 15 seconds
#       then restart
#    if 3 restarts within 5 cycles then timeout
#    depends on apache_bin
#    group server
#
#
## Check the device permissions, uid, gid, space and inode usage. Other
## services such as databases may depend on this resource and an automatic
## graceful stop may be cascaded to them before the filesystem becomes
## full and data is lost.
#
#  check device datafs with path /dev/sdb1
#    start program  = "/bin/mount /data"
#    stop program  = "/bin/umount /data"
#    if failed permission 660 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid disk then unmonitor
#    if space usage > 80% for 5 times within 15 cycles then alert
#    if space usage > 99% then stop
#    if inode usage > 30000 then alert
#    if inode usage > 99% then stop
#    group server
#
#
## Check a file's timestamp: when it becomes older than 15 minutes, the
## file is not being updated and something is wrong. If the size
## of the file exceeds the given limit, run the script.
#
#  check file database with path /data/mydatabase.db
#    if failed permission 700 then alert
#    if failed uid data then alert
#    if failed gid data then alert
#    if timestamp > 15 minutes then alert
#    if size > 100 MB then exec "/my/cleanup/script"
#
#
## Check the directory permission, uid and gid.  An event is triggered
## if the directory does not belong to the user with the  uid 0 and
## the gid 0.  In addition, the permissions have to match the octal
## description of 755 (see chmod(1)).
#
#  check directory bin with path /bin
#    if failed permission 755 then unmonitor
#    if failed uid 0 then unmonitor
#    if failed gid 0 then unmonitor
#
#
## Check the remote host network services availability and the response
## content.  One of three pings, a successful connection to a port and
## application level network check is performed.
#
#  check host myserver with address 192.168.1.1
#    if failed icmp type echo count 3 with timeout 3 seconds then alert
#    if failed port 3306 protocol mysql with timeout 15 seconds then alert
#    if failed url
#       http://user:address@hidden:8080/?querystring
#       and content == 'action="j_security_check"'
#       then alert

check process vsftpd_ftpserver with pidfile /var/run/vsftpd/vsftpd.pid
    start program = "/etc/init.d/vsftpd start"
    stop program  = "/etc/init.d/vsftpd stop"
    if failed port 21 protocol ftp then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/etc/init.d/heartbeat stop"
    group Inet-Primaire
    mode manual

check process sshd_remote_access_server with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program  = "/etc/init.d/ssh stop"
    if failed port 2145 protocol ssh then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group local
    mode active

check process mysql_DBserver with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed host 127.0.0.1 port 3306 then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group Inet-Primaire
    mode manual

check process apache2_webserver with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program  = "/etc/init.d/apache2 stop"
    if failed host 127.0.0.1 port 80 protocol http then restart
      # and request "/monit/token" then restart
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 2 cycles then restart
    if totalmem > 500 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group Inet-Primaire
    mode manual

check process postfix_mailserver with pidfile /var/spool/postfix/pid/master.pid
    start program = "/etc/init.d/postfix start"
    stop  program = "/etc/init.d/postfix stop"
    if failed port 25 protocol smtp then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group local
    mode active

check process ntop_network_monitoring with pidfile /var/run/ntop.pid
    start program = "/etc/init.d/ntop start"
    stop  program = "/etc/init.d/ntop stop"
    if failed port 3000 type tcpssl then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group Inet-Primaire
    mode manual

check process freeradius_auth_server with pidfile /var/run/freeradius/freeradius.pid
    start program = "/etc/init.d/freeradius start"
    stop  program = "/etc/init.d/freeradius stop"
    if failed port 1812 type udp then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group Inet-Primaire
    mode manual

check process dhcpd_server with pidfile /var/run/dhcpd.pid
    start program = "/etc/init.d/dhcp3-server start"
    stop  program = "/etc/init.d/dhcp3-server stop"
    if failed port 67 type udp then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group Inet-Primaire
    mode manual

check process bind_dns_server with pidfile /var/run/bind/run/named.pid
    start program = "/etc/init.d/bind9 start"
    stop  program = "/etc/init.d/bind9 stop"
    if failed port 53 type tcp then restart
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group local
    mode active

check process upsd_information with pidfile /var/run/nut/upsd.pid
    start program = "/etc/init.d/ups-monitor start"
    stop  program = "/etc/init.d/ups-monitor stop"
    if 2 restarts within 2 cycles then timeout
    #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
    group local
    mode active

check process upsmon_control with pidfile /var/run/nut/upsmon.pid
   start program = "/etc/init.d/ups-monitor start"
   stop  program = "/etc/init.d/ups-monitor stop"
   if 2 restarts within 2 cycles then timeout
   #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
   group local
   mode active

check process ups_driver with pidfile /var/run/nut/usbhid-ups-MGE850VA.pid
   start program = "/etc/init.d/ups-monitor start"
   stop  program = "/etc/init.d/ups-monitor stop"
   if 2 restarts within 2 cycles then timeout
   #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
   group local
   mode active

check process eserver_emule_server with pidfile /var/run/eserver.pid
   start program = "/etc/init.d/eserver start"
   stop  program = "/etc/init.d/eserver stop"
   if failed port 4661 type tcp then restart
   if 2 restarts within 2 cycles then timeout
   #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
   group Inet-Primaire
   mode manual


check process teamspeak_server with pidfile /home/teamspeak/tsserver2.pid
  start program = "/etc/init.d/teamspeak start"
  stop  program = "/etc/init.d/teamspeak stop"
  if failed port 8767 type udp then restart
  if 2 restarts within 2 cycles then timeout
  #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
  group Inet-Primaire
  mode manual




check process heartbeat with pidfile /var/run/heartbeat.pid
   start program = "/etc/init.d/heartbeat start"
   stop  program = "/etc/init.d/heartbeat stop"
   if 2 restarts within 2 cycles then timeout
   #if 2 restarts within 2 cycles then exec "/usr/sbin/monit heartbeat stop"
   group local
   mode active



###############################################################################
##
## It is possible to include the configuration or its parts from other files or
## directories.
#
#  include /etc/monit.d/*
#
#
<====


I don't know why heartbeat doesn't do anything when a service fails. Have I
missed something?

Thanks,
Vianney



