Re: Monit shows "statistic error"

From: Lutz Mader
Subject: Re: Monit shows "statistic error"
Date: Sat, 21 Nov 2020 09:40:48 +0100

Hello Ani,
I checked some of my logs and found a similar problem whenever the
workload is very high (on an AIX system).

[MESZ May  8 05:29:14] error    : 'D100SPUABC00' mem usage of 95.5% matches resource limit [mem usage > 95.0%]
[MESZ May  8 05:31:14] error    : 'Manager' failed to get process data

>> I am running Monit 5.17.1 on Ubuntu 14.04, in some rare occasions
>> I see that following error in the log:
>> 2020-11-17 18:47:22.347 monit[2954]: system statistic error -- cannot
>> read /proc/3560/stat

As long as this is a workload problem, you can configure Monit to delay a
restart. With an additional "not exist" rule

  if not exist for 5 cycles then start

in the "check process" service, Monit will start/restart the service
only after 5 failed checks in a row. If Monit fails to get the process
data just once, nothing will happen (I append a sample).
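
To illustrate the counting, here is a minimal Python sketch. It is not
Monit code, and it assumes that "for 5 cycles" means five consecutive
failed checks. The function name and the list of observations are made
up for the illustration.

```python
def cycles_until_start(observations, required_failures=5):
    """Return the 1-based cycle in which the start action would fire.

    observations: one boolean per poll cycle, True if the process data
    could be read in that cycle. Returns None if the rule never matches.
    Assumption: failures must be consecutive, a good cycle resets the count.
    """
    consecutive = 0
    for cycle, process_seen in enumerate(observations, start=1):
        if process_seen:
            consecutive = 0          # one good check clears the history
        else:
            consecutive += 1
            if consecutive >= required_failures:
                return cycle         # rule matches, start is triggered
    return None


# A single unreadable cycle under high load does not trigger a start.
print(cycles_until_start([True, False, True, True]))

# Five failed checks in a row do.
print(cycles_until_start([True] + [False] * 5))
```

With a poll interval of, say, 120 seconds (an assumption, use your own
"set daemon" value), five cycles give the system about ten minutes to
recover before Monit acts.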

A suggestion only,

A sample of one of the used service definitions:

check process Serv_server1 with pidfile
  start program "/usr/local/etc/monit/scripts/ start" with timeout 180 seconds
  stop program "/usr/local/etc/monit/scripts/ stop" with timeout 120 seconds
  restart program "/usr/local/etc/monit/scripts/ restart" with timeout 300 seconds
#  if failed host hostname.local port 8901 then alert
#  if failed host hostname.local port 9901 then alert
  if not exist for 5 cycles then start
  if 5 restarts within 50 cycles then unmonitor

The "not exist" rule delays the start until five checks have failed, and
the "restart" rule prevents endless recovery.
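
The effect of "5 restarts within 50 cycles" can be pictured as a sliding
window over the poll cycles. Again a Python sketch only, not the way
Monit implements it. The class name RestartGuard and its methods are
hypothetical.

```python
from collections import deque


class RestartGuard:
    """Stop monitoring after too many restarts inside a cycle window."""

    def __init__(self, max_restarts=5, window_cycles=50):
        self.max_restarts = max_restarts
        self.window_cycles = window_cycles
        self.restart_cycles = deque()   # cycle numbers of recent restarts
        self.monitored = True

    def record_cycle(self, cycle, restarted):
        """Record one poll cycle and return True while still monitored."""
        if not self.monitored:
            return False
        if restarted:
            self.restart_cycles.append(cycle)
        # Forget restarts that are older than the last window_cycles cycles.
        while (self.restart_cycles
               and self.restart_cycles[0] <= cycle - self.window_cycles):
            self.restart_cycles.popleft()
        if len(self.restart_cycles) >= self.max_restarts:
            self.monitored = False      # corresponds to "then unmonitor"
        return self.monitored


# A restart every 10 cycles reaches the limit, one every 20 cycles does not.
fast = RestartGuard()
for c in range(1, 42):
    fast.record_cycle(c, restarted=(c % 10 == 1))
print(fast.monitored)

slow = RestartGuard()
for c in range(1, 200):
    slow.record_cycle(c, restarted=(c % 20 == 1))
print(slow.monitored)
```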
