monit-general
Re: [monit] problem cpu usage


From: Martin Pala
Subject: Re: [monit] problem cpu usage
Date: Mon, 18 May 2009 18:57:15 +0200

Hi,

monit was configured to monitor CPU usage: it triggers the alert when the threshold is reached, but it does not analyze which process is responsible for the load.

You can easily hook in a script that is triggered by high CPU load and collects the information during the peak.

For example, let's create a script like this:


/tmp/monit_top.sh:
--8<--
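#!/bin/sh

# Send all output to the file that is mailed on recovery
exec 1>/tmp/monit_top.out
exec 2>&1

# Record our PID so the recovery action can stop this script
echo $$ > /tmp/monit_top.pid

# Every 5 seconds, log load, memory, and the top 20 processes by CPU
while true
do
        uptime
        free
        ps --no-headers -A -o "%cpu sz ucomm" | sort -k1nr | head -20
        echo "#############################"
        sleep 5
done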
--8<--

chmod 755 /tmp/monit_top.sh


and modify the monit configuration like this:

--8<--
  check system TamTam
       if loadavg (1min) > 4 then alert
       if loadavg (5min) > 2 then alert
       if memory usage > 75% then alert
       if cpu usage (user) > 70% then exec "/tmp/monit_top.sh" else if recovered then exec "/bin/bash -c 'kill `cat /tmp/monit_top.pid` && cat /tmp/monit_top.out | mail -s 'cpu usage alert' address@hidden'"
       if cpu usage (system) > 30% then alert
       if cpu usage (wait) > 20% then alert
--8<--

Basically, when the CPU usage goes high, monit starts the script, which records the resource usage and the top 20 processes every 5 seconds. When the CPU usage drops again, the script is stopped and its output is mailed to address@hidden.
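The recovery action above nests single quotes inside the bash -c string, which is easy to get wrong. One possible alternative is to move the stop-and-mail step into a small companion script. The sketch below is only an illustration: the script name monit_top_stop.sh, the function names, and the MAIL_TO variable are assumptions of mine, while the pid file and report file are the ones used in the configuration above.

```sh
#!/bin/sh
# Hypothetical companion script, e.g. saved as /tmp/monit_top_stop.sh.
# PIDFILE and REPORT match the paths used in the monit configuration;
# MAIL_TO is a placeholder for the real recipient.

PIDFILE=/tmp/monit_top.pid
REPORT=/tmp/monit_top.out
MAIL_TO="address@hidden"

# Stop the collector whose PID is stored in the given pid file.
# Returns non-zero if the pid file cannot be read.
stop_collector() {
        pidfile="$1"
        [ -r "$pidfile" ] || return 1
        pid=$(cat "$pidfile")
        kill "$pid" 2>/dev/null
        rm -f "$pidfile"
}

# Mail the collected report, but only if it is not empty.
mail_report() {
        [ -s "$REPORT" ] && mail -s 'cpu usage alert' "$MAIL_TO" < "$REPORT"
}

# Act only when executed under its own name, so the functions can
# also be sourced and tried out separately.
case "${0##*/}" in
monit_top_stop.sh)
        stop_collector "$PIDFILE"
        mail_report
        ;;
esac
```

With such a script in place (and made executable with chmod 755), the recovery part of the rule could presumably be shortened to: else if recovered then exec "/tmp/monit_top_stop.sh"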

You can modify the script as you like: collect additional information, change the sleep time, etc.
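As a rough illustration of such a modification, the collector can be split into functions so that the sampling interval and the number of listed processes become settings. This is just a sketch: TOP_N, INTERVAL, collect_sample and collect_loop are names I made up, and the added date line is one example of extra information. The commands themselves are the same ones the script above uses.

```sh
#!/bin/sh
# Sketch of a parameterised collector. TOP_N, INTERVAL and the function
# names are illustrative, not part of the original script.

TOP_N="${TOP_N:-20}"        # number of processes listed per sample
INTERVAL="${INTERVAL:-5}"   # seconds to wait between samples

# Print one sample: timestamp, load, memory, and the top CPU consumers.
collect_sample() {
        date
        uptime
        free
        ps --no-headers -A -o "%cpu sz ucomm" | sort -k1nr | head -n "$TOP_N"
        echo "#############################"
}

# Take $1 samples; with no argument (or 0), loop until killed.
collect_loop() {
        count="${1:-0}"
        i=0
        while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]
        do
                collect_sample
                i=$((i + 1))
                # Skip the final sleep once the requested count is reached
                [ "$count" -ne 0 ] && [ "$i" -ge "$count" ] && break
                sleep "$INTERVAL"
        done
}
```

To use it in place of the loop in /tmp/monit_top.sh, keep the exec and pid file lines and end the script with a plain call to collect_loop; calling collect_loop 3 instead takes three samples and returns, which is handy for trying it out by hand.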


Martin




On May 18, 2009, at 10:06 AM, Pascal Legrand wrote:

Hello,
I have a problem with monit. I configured it to alert me when CPU usage is too high, and I received these mails:

Resource limit matched Service Intranet
        Date:        Mon, 18 May 2009 04:13:24 +0200
        Action:      alert
        Host:        tamtam
        Description: 'Intranet' cpu user usage of 70.4% matches resource limit [cpu user usage>70.0%]

Resource limit matched Service Intranet
        Date:        Mon, 18 May 2009 04:13:25 +0200
        Action:      alert
        Host:        tamtam
        Description: 'Intranet' loadavg(5min) of 2.2 matches resource limit [loadavg(5min)>2.0]

But I can't see in the logs what happened.
Is this a bug in monit?
Here is some information about my server.

Thank you for your help.



Debian Lenny
------------------------------------------------------
Package: monit
State: installed
Automatically installed: no
Version : 1:4.10.1-4
------------------------------------------------------
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 697.898
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse up
bogomips        : 1397.51
clflush size    : 32
power management:
------------------------------------------------------
monit configuration

set daemon  60
set logfile syslog facility log_daemon
set mailserver smtp.bla.fr
set mail-format { from: address@hidden }
set alert address@hidden
set httpd port 2812 and
allow admin:test

# System
check system TamTam
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 75% then alert
if cpu usage (user) > 70% then alert
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 20% then alert

# Disk
check device root-hda1  with path /dev/hda1
if space usage > 85% then alert
check device home-hda9  with path /dev/hda9
if space usage > 85% then alert
check device tmp-hda8  with path /dev/hda8
if space usage > 85% then alert
check device usr-hda5  with path /dev/hda5
if space usage > 85% then alert
check device var-hda6  with path /dev/hda6
if space usage > 85% then alert

# ssh monitoring
check process sshd with pidfile /var/run/sshd.pid
start program  "/etc/init.d/ssh start"
stop program  "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then alert

#dhcpd
check process dhcpd with pidfile /var/run/dhcpd.pid
start program  "/etc/init.d/dhcp3-server start"
stop program  "/etc/init.d/dhcp3-server stop"
if failed port 67 type udp then alert

#ldap
check process slapd with pidfile /var/run/slapd/slapd.pid
start program = "/etc/init.d/slapd start"
stop program = "/etc/init.d/slapd stop"
if failed port 389 protocol ldap3 then alert

#Smartd
check process smartd with pidfile /var/run/smartd.pid
start program = "/etc/init.d/smartmontools start"
stop program = "/etc/init.d/smartmontools stop"
if changed pid then alert
------------------------------------------------------




