Re: [monit] problem cpu usage

monit-general

From:	Martin Pala
Subject:	Re: [monit] problem cpu usage
Date:	Mon, 18 May 2009 18:57:15 +0200

Hi,

monit was set to monitor CPU usage: it triggers an alert when the watermark is reached, but it does not itself analyze which process is responsible for the load.

You can easily hook in a script that is triggered by high CPU load and collects information during the peak.


For example, let's create a script like this:


/tmp/monit_top.sh:
--8<--
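#!/bin/sh

# Send both stdout and stderr to the report file that gets mailed later
exec >/tmp/monit_top.out 2>&1

# Record our pid so the collector can be stopped on recovery
echo $$ > /tmp/monit_top.pid

# Sample load, memory and the 20 busiest processes every 5 seconds
while true
do
        uptime
        free
        ps --no-headers -A -o "%cpu sz ucomm" | sort -k1nr | head -20
        echo "#############################"
        sleep 5
done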
--8<--

chmod 755 /tmp/monit_top.sh


and modify the monit configuration like this:

--8<--
  check system TamTam
       if loadavg (1min) > 4 then alert
       if loadavg (5min) > 2 then alert
       if memory usage > 75% then alert
       if cpu usage (user) > 70% then exec "/tmp/monit_top.sh"
          else if recovered then exec "/bin/bash -c 'kill `cat /tmp/monit_top.pid` && mail -s cpu-usage-alert address@hidden < /tmp/monit_top.out'"
       if cpu usage (system) > 30% then alert
       if cpu usage (wait) > 20% then alert
--8<--
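
The collector can also be tried by hand, independent of monit, to see what the report will look like. The following is only a sketch: the function name dry_run and its three arguments are made up for illustration, and it assumes the script writes its own pid to the given pid file, as the one above does.

```shell
#!/bin/sh
# Dry run of a collector script, outside of monit.
# dry_run is a hypothetical helper name, not part of monit.
#
#   $1  path to the collector script (e.g. /tmp/monit_top.sh)
#   $2  pid file the collector writes  (e.g. /tmp/monit_top.pid)
#   $3  how many seconds to let it sample

dry_run() {
    script=$1
    pidfile=$2
    seconds=$3

    # Start the collector in the background, as monit's exec would.
    "$script" &

    # Let it take a few samples.
    sleep "$seconds"

    # Stop it through the pid file, the same way the recovery action does.
    kill "$(cat "$pidfile")"

    # Reap the background job before returning.
    wait
}
```

For the script above this would be called as: dry_run /tmp/monit_top.sh /tmp/monit_top.pid 12, after which the report file can be inspected with any pager.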

Basically, when CPU usage goes high, the script is started and collects resource usage information and the top 20 processes every 5 seconds. When CPU usage drops again, the script is stopped and its output is mailed to address@hidden
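
The recovery action packs a kill and a mail command into one quoted string, which is easy to get wrong. One possible alternative is to move that logic into a small helper script. This is a sketch only: the function name stop_and_mail, the variable names and the suggested file name /tmp/monit_top_stop.sh are assumptions made for illustration, and MAILER is overridable only so the helper can be tried without sending real mail.

```shell
#!/bin/sh
# Hypothetical helper, e.g. saved as /tmp/monit_top_stop.sh.
# Defaults follow the paths used in this thread; all names are illustrative.

PIDFILE=${PIDFILE:-/tmp/monit_top.pid}
REPORT=${REPORT:-/tmp/monit_top.out}
RECIPIENT=${RECIPIENT:-address@hidden}
MAILER=${MAILER:-mail}

stop_and_mail() {
    # Stop the collector if it left a pid file behind.
    if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        rm -f "$PIDFILE"
    fi

    # Mail the report only when it is non-empty.
    if [ -s "$REPORT" ]; then
        "$MAILER" -s "cpu usage alert" "$RECIPIENT" < "$REPORT"
    fi
}

# As a standalone script, the last line would simply be:
#   stop_and_mail
```

With such a helper in place, the recovery part of the rule could shrink to: else if recovered then exec "/tmp/monit_top_stop.sh"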

You can modify the script as you like: collect additional information, change the sleep time, etc.
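
As one example of such a modification, each sample could carry a timestamp and a configurable number of processes. This is a sketch under assumptions: the function names have and snapshot are made up, and it uses the POSIX ps column names pcpu, vsz and comm instead of the procps-specific ones in the script above, so the columns differ slightly.

```shell
#!/bin/sh
# One sample of system state; snapshot and have are hypothetical names.

# True when the given command is available on this system.
have() {
    command -v "$1" >/dev/null 2>&1
}

#   $1  number of processes to list (default 20)
snapshot() {
    top_n=${1:-20}

    # Timestamp first, so samples can be matched against the monit alert.
    date '+%Y-%m-%d %H:%M:%S'

    # Skip tools that are not installed instead of failing.
    if have uptime; then uptime; fi
    if have free; then free; fi

    # CPU share, virtual size and command name, busiest processes first.
    if have ps; then
        ps -A -o pcpu= -o vsz= -o comm= | sort -k1,1nr | head -n "$top_n"
    fi

    echo "#############################"
}
```

Inside the collector loop, the body between do and sleep would then be a single call such as: snapshot 20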



Martin




On May 18, 2009, at 10:06 AM, Pascal Legrand wrote:

Hello,

I have a problem with monit. I configured it to alert me when CPU usage is too high, and I got this mail:


Resource limit matched Service Intranet
        Date:        Mon, 18 May 2009 04:13:24 +0200
        Action:      alert
        Host:        tamtam

Description: 'Intranet' cpu user usage of 70.4% matches resource limit [cpu user usage>70.0%]


Resource limit matched Service Intranet
        Date:        Mon, 18 May 2009 04:13:25 +0200
        Action:      alert
        Host:        tamtam

Description: 'Intranet' loadavg(5min) of 2.2 matches resource limit [loadavg(5min)>2.0]


But I can't see in the logs what happened.
Is it a bug in monit?
Here is some information about my server.

Thank you for your help.



Debian Lenny
------------------------------------------------------
Package: monit
State: installed
Automatically installed: no
Version: 1:4.10.1-4
------------------------------------------------------
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 697.898
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes

flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse up

bogomips        : 1397.51
clflush size    : 32
power management:
------------------------------------------------------
monit configuration

set daemon  60
set logfile syslog facility log_daemon
set mailserver smtp.bla.fr
set mail-format { from: address@hidden }
set alert address@hidden
set httpd port 2812 and
allow admin:test

# System
check system TamTam
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 75% then alert
if cpu usage (user) > 70% then alert
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 20% then alert

# Disk
check device root-hda1  with path /dev/hda1
if space usage > 85% then alert
check device home-hda9  with path /dev/hda9
if space usage > 85% then alert
check device tmp-hda8  with path /dev/hda8
if space usage > 85% then alert
check device usr-hda5  with path /dev/hda5
if space usage > 85% then alert
check device var-hda6  with path /dev/hda6
if space usage > 85% then alert

# ssh monitoring
check process sshd with pidfile /var/run/sshd.pid
start program  "/etc/init.d/ssh start"
stop program  "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then alert

#dhcpd
check process dhcpd with pidfile /var/run/dhcpd.pid
start program  "/etc/init.d/dhcp3-server start"
stop program  "/etc/init.d/dhcp3-server stop"
if failed port 67 type udp then alert

#ldap
check process slapd with pidfile /var/run/slapd/slapd.pid
start program = "/etc/init.d/slapd start"
stop program = "/etc/init.d/slapd stop"
if failed port 389 protocol ldap3 then alert

#Smartd
check process smartd with pidfile /var/run/smartd.pid
start program = "/etc/init.d/smartmontools start"
stop program = "/etc/init.d/smartmontools stop"
if changed pid then alert
------------------------------------------------------




