Re: [Freeipmi-users] Disable alerting for watchdog timer expiration


From: Al Chu
Subject: Re: [Freeipmi-users] Disable alerting for watchdog timer expiration
Date: Wed, 01 Feb 2012 18:27:14 -0800

Hi Ryan,

Do the options in bmc-watchdog for turning off logging not work?  Or
perhaps you're using the ipmi kernel driver bmc watchdog?

Al

On Wed, 2012-02-01 at 16:31 -0800, Ryan Cox wrote:
> Okay... so I figured it out after looking at the IPMI spec.
> ipmi-raw 0 6 0x24 0x80 0x01 0x00 0x00 0x96 0x00
> 
> The 0x80 is the trick.  The bit that is set is a "don't log" bit.  That 
> takes care of it properly.  The command above uses a 15 second timer, 
> don't log, and hard reset.
> 
> The fields of the Set Watchdog Timer command are documented at 
> ftp://download.intel.com/design/servers/ipmi/IPMIv2_0rev1_0.pdf on page 378.
> 
> Ryan
> 
> On 02/01/2012 03:29 PM, Ryan Cox wrote:
> > Hello all,
> >
> > I would like to change the default behavior for our Dell servers 
> > (mostly blades) to stop alerting at all when the watchdog timer 
> > expires.  Our HP ProLiant BL460c G1 servers don't alert on timer 
> > expiration.  I was hoping to see if there was a difference between the 
> > configs, but the HP servers don't work with ipmi-pef-config ("Unable 
> > to get Number of Alert Policy Entries") and have very few entries in 
> > ipmi-sensors, none of which are related to the watchdog.
> >
> > What I would like to happen when a watchdog timer expires:
> > 1) The system will reboot
> > 2) *No* SNMP trap sent by the server itself
> > 3) *No* SNMP trap sent by the chassis (if the server is a blade)
> > 4) *No* event inserted in the SEL
> > 5) *No* amber lights on the server or chassis
> >
> > What I have accomplished:
> > 1) The system will reboot
> > 2) *No* SNMP trap sent by the server itself (the following worked: 
> > "ipmi-pef-config -c -e Event_Filter_17:Enable_Filter=No")
> >
> > The SEL is populated and an alert is sent whether the action is to 
> > reboot the server or do nothing.
> >
> > What I have tried:
> > I set everything in "ipmi-sensors-config -S 44_OS_Watch" to be "No":
> > Section 44_OS_Watch
> >     ## Possible values: Yes/No
> >     Enable_All_Event_Messages                   No
> >     ## Possible values: Yes/No
> >     Enable_Scanning_On_This_Sensor              No
> >     ## Possible values: Yes/No
> >     Enable_Assertion_Event_Timer_Expired        No
> >     ## Possible values: Yes/No
> >     Enable_Assertion_Event_Hard_Reset           No
> >     ## Possible values: Yes/No
> >     Enable_Assertion_Event_Power_Down           No
> >     ## Possible values: Yes/No
> >     Enable_Assertion_Event_Power_Cycle          No
> >     ## Possible values: Yes/No
> >     Enable_Deassertion_Event_Timer_Expired      No
> >     ## Possible values: Yes/No
> >     Enable_Deassertion_Event_Hard_Reset         No
> >     ## Possible values: Yes/No
> >     Enable_Deassertion_Event_Power_Down         No
> >     ## Possible values: Yes/No
> >     Enable_Deassertion_Event_Power_Cycle        No
> > EndSection
> >
> > This changes the output of ipmi-sensors for that host to:
> > 44 | OS Watch         | Watchdog 2               | N/A        | N/A   | N/A
> >
> > An unmodified host has this:
> > 44 | OS Watch         | Watchdog 2               | N/A        | N/A   | 'OK'
> >
> > After the timer expires, this shows up in the SEL:
> > ID | Date        | Time     | Name             | Type                     | Event Direction   | Event
> > 1  | Feb-01-2012 | 07:39:18 | SEL              | Event Logging Disabled   | Assertion Event   | Log Area Reset/Cleared
> > 2  | Feb-01-2012 | 07:39:23 | OS Watch         | Watchdog 2               | Assertion Event   | Timer expired, status only
> > 3  | Feb-01-2012 | 07:39:23 | OS Watch         | Watchdog 2               | Assertion Event   | Timer expired, status only
> >
> > If I don't disable the SNMP traps from the server for watchdog timer 
> > expiration, I get a trap for DELL-ASF-MIB::asfTrapASRTimeout.  A blade 
> > chassis will always send a trap stating that the blade changed from 
> > normal to critical.
> >
> > Any other ideas?  Is this something I need to ask Dell about?
> >
> > Thanks,
> > Ryan
> >
> >
> > -- 
> > Ryan Cox
> > Systems Administrator
> > Fulton Supercomputing Lab
> > Brigham Young University
> >
> > http://tech.ryancox.net
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
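
For anyone following the thread, the six data bytes in Ryan's ipmi-raw command (everything after LUN 0, NetFn 6, command 0x24) can be unpacked with a short Python sketch. The field layout is my reading of the Set Watchdog Timer request in the IPMI v2.0 spec he cites, so treat it as an assumption to check against the document. The function name decode_set_watchdog is purely illustrative and is not part of FreeIPMI.

```python
def decode_set_watchdog(data):
    """Decode the six request-data bytes of Set Watchdog Timer.

    Assumed layout (check against the IPMI v2.0 spec):
      byte 1: timer use    (bit 7 = don't log, bits 2:0 = timer use)
      byte 2: timer action (bits 2:0 = timeout action)
      byte 3: pre-timeout interval, in seconds
      byte 4: timer use expiration flags to clear
      bytes 5-6: initial countdown, LSB first, 100 ms per count
    """
    if len(data) != 6:
        raise ValueError("expected 6 data bytes")
    timer_use, actions, pre_timeout, flags_clear, lsb, msb = data
    action_names = {0: "no action", 1: "hard reset",
                    2: "power down", 3: "power cycle"}
    return {
        "dont_log": bool(timer_use & 0x80),
        "timer_use": timer_use & 0x07,
        "timeout_action": action_names.get(actions & 0x07, "reserved"),
        "pre_timeout_interval_s": pre_timeout,
        "expiration_flags_clear": flags_clear,
        "countdown_s": ((msb << 8) | lsb) / 10.0,
    }


# Data bytes taken from the ipmi-raw command quoted in this thread.
fields = decode_set_watchdog([0x80, 0x01, 0x00, 0x00, 0x96, 0x00])
print(fields)
```

This agrees with Ryan's description: don't log, hard reset, and a 15 second countdown (0x96 is 150 counts of 100 ms). The low three bits of the first byte (the timer-use field) are 0 in that command, which the spec appears to list as reserved; a value such as 0x84 (don't log plus SMS/OS) may be more conventional, but check how your BMC behaves before relying on that.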