From: Werner Fischer
Subject: [Freeipmi-users] request: status info for discrete sensors for monitoring purposes
Date: Tue, 22 Jun 2010 13:16:43 +0200
Hi Al,
ipmimonitoring seems to be very useful for my needs. I gave it a try
with an Intel SR2500 server. I unplugged the power cord of Power
Supply 1 (PS1) and removed the cover of the chassis:
ipmimonitoring reports "Critical" in the fourth column, which is great:
address@hidden:~$ ipmimonitoring -h 192.168.1.211 -u monitor -p relation -l user | grep "| Critical |"
33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
36 | Physical Scrty | Physical Security | Critical | N/A | 'General Chassis Intrusion'
49 | PS1 Status | Power Supply | Critical | N/A | 'Presence detected' 'Power Supply input lost (AC/DC)'
address@hidden:~$
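For monitoring scripts, the same filtering can be done without grep. The following is only a minimal Python sketch, assuming the six-column, pipe-separated layout shown in the output above (Record_ID, Sensor Name, Sensor Group, Monitoring Status, Sensor Units, Sensor Reading). The names SensorRow, parse_rows and not_nominal are made up for this illustration and are not part of FreeIPMI.

```python
from typing import List, NamedTuple


class SensorRow(NamedTuple):
    record_id: int
    name: str
    group: str
    status: str
    units: str
    reading: str


def parse_rows(text: str) -> List[SensorRow]:
    """Parse pipe-separated ipmimonitoring lines (layout assumed from this mail)."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Skip the header, shell prompts and anything that is not a sensor row.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        rows.append(SensorRow(int(fields[0]), *fields[1:]))
    return rows


def not_nominal(rows: List[SensorRow]) -> List[SensorRow]:
    """Return every sensor whose monitoring status is not 'Nominal'."""
    return [r for r in rows if r.status != "Nominal"]


# Sample lines copied from the output in this message.
SAMPLE = """\
32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
36 | Physical Scrty | Physical Security | Critical | N/A | 'General Chassis Intrusion'
49 | PS1 Status | Power Supply | Critical | N/A | 'Presence detected' 'Power Supply input lost (AC/DC)'
50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
"""

for row in not_nominal(parse_rows(SAMPLE)):
    print(row.record_id, row.name, row.status)
```

Filtering on "not Nominal" rather than on "Critical" means a "Warning" status would be reported as well.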
With ipmitool I got an "ok" for these sensors:
address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -P relation -L user sdr elist
[...]
PS1 AC Current | 78h | ok | 10.1 | 0.12 Amps
PS2 AC Current | 79h | ok | 10.2 | 0.93 Amps
PS1 +12V Current | 7Ah | ok | 10.1 | 0 Amps
PS2 +12V Current | 7Bh | ok | 10.2 | 16 Amps
PS1 +12V Power | 7Ch | ok | 10.1 | 0 Watts
PS2 +12V Power | 7Dh | ok | 10.2 | 192 Watts
P1 Therm Margin | 99h | ok | 3.1 | -49 degrees C
P2 Therm Margin | 9Bh | ok | 3.2 | -54 degrees C
P1 Therm Ctrl % | C0h | ok | 3.1 | 0 unspecified
P2 Therm Ctrl % | C1h | ok | 3.2 | 0 unspecified
Proc 1 Vccp | D0h | ok | 3.1 | 1.23 Volts
Proc 2 Vccp | D1h | ok | 3.2 | 1.23 Volts
Mem Therm Margin | 48h | ns | 3.2 | No Reading
Pwr Unit Stat | 01h | ok | 21.1 |
Power Redundancy | 02h | ok | 21.1 | Redundancy Lost, Non-Redundant: Sufficient from Redundant
BMC Watchdog | 03h | ok | 7.1 |
Platform Secu V | 04h | ok | 7.1 |
Physical Scrty | 05h | ok | 23.1 | General Chassis intrusion
[...]
Another test with ipmimonitoring, when PS1 is completely removed:
address@hidden:~$ ipmimonitoring -h 192.168.1.211 -u monitor -p relation -l user | grep "| Critical |"
32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
[...]
49 | PS1 Status | Power Supply | Nominal | N/A | 'OK'
50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
(Here ipmimonitoring says 'OK' in the last column for PS1, whereas
VMware says "Unknown" when a power supply is not installed; see
http://www.wefi.net/shared/sr2500-example-1.png)
My question: how do you distinguish in ipmimonitoring which of the
assertion states are ok ("Nominal") and which are not ("Critical")?
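To make the question concrete, here is a rough Python sketch of the kind of mapping I have in mind. It is not how ipmimonitoring is actually implemented. The table STATE_RULES, the function interpret and the "worst state wins" rule are all assumptions for illustration, and the severities are simply chosen so that they reproduce the statuses seen in the output above.

```python
# Severity order used to pick the worst of several asserted states.
SEVERITY = {"Nominal": 0, "Warning": 1, "Critical": 2}

# Hypothetical rule table: (sensor group, asserted state) -> status.
# The entries are guesses that match the output shown in this mail.
STATE_RULES = {
    ("Power Supply", "Presence detected"): "Nominal",
    ("Power Supply", "Power Supply input lost (AC/DC)"): "Critical",
    ("Physical Security", "General Chassis Intrusion"): "Critical",
    ("Power Unit", "Redundancy Lost"): "Critical",
}


def interpret(sensor_group, states, default="Nominal"):
    """Return the worst status among all asserted states of one sensor.

    States without a rule get `default`. A sensor with no asserted state
    comes out as "Nominal", which matches the 'OK' shown for a removed PS1.
    """
    worst = "Nominal"
    for state in states:
        status = STATE_RULES.get((sensor_group, state), default)
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


print(interpret("Power Supply",
                ["Presence detected", "Power Supply input lost (AC/DC)"]))
print(interpret("Power Supply", ["Presence detected"]))
print(interpret("Power Supply", []))
```

With such a table the open point is exactly my question: who decides the severity of each state, and whether an end user can change it.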
Thanks a lot for your great help,
best regards,
Werner
PS: here is the full output of ipmimonitoring from my first test:
address@hidden:~$ ipmimonitoring -h 192.168.1.211 -u monitor -p relation -l user
Record_ID | Sensor Name | Sensor Group | Monitoring Status | Sensor Units | Sensor Reading
1 | BB +1.2V Vtt | Voltage | Nominal | V | 1.197000
2 | BB +1.5V AUX | Voltage | Nominal | V | 1.466400
3 | BB +1.5V | Voltage | Nominal | V | 1.482000
4 | BB +1.8V | Voltage | Nominal | V | 1.785000
5 | BB +3.3V | Voltage | Nominal | V | 3.354000
6 | BB +3.3V STB | Voltage | Nominal | V | 3.354000
7 | BB +1.5V ESB | Voltage | Nominal | V | 1.505400
8 | BB +5V | Voltage | Nominal | V | 5.070000
9 | BB +12V AUX | Voltage | Nominal | V | 11.904000
10 | BB +0.9V | Voltage | Nominal | V | 0.897600
11 | Serverboard Temp | Temperature | Nominal | C | 29.000000
12 | Ctrl Panel Temp | Temperature | Nominal | C | 25.000000
13 | Fan 1 | Fan | Nominal | RPM | 5891.000000
14 | Fan 2 | Fan | Nominal | RPM | 6278.000000
15 | Fan 3 | Fan | Nominal | RPM | 5805.000000
16 | Fan 4 | Fan | Nominal | RPM | 6321.000000
17 | Fan 5 | Fan | Nominal | RPM | 9052.000000
18 | Fan 6 | Fan | Nominal | RPM | 8060.000000
19 | PS1 AC Current | Current | Nominal | A | 0.124000
20 | PS2 AC Current | Current | Nominal | A | 0.992000
21 | PS1 +12V Current | Current | Nominal | A | 0.000000
22 | PS2 +12V Current | Current | Nominal | A | 15.000000
23 | PS1 +12V Power | N/A | Nominal | W | 0.000000
24 | PS2 +12V Power | N/A | Nominal | W | 192.000000
25 | P1 Therm Margin | Temperature | Nominal | C | -49.000000
26 | P2 Therm Margin | Temperature | Nominal | C | -53.000000
27 | P1 Therm Ctrl % | Temperature | Nominal | N/A | 0.000000
28 | P2 Therm Ctrl % | Temperature | Nominal | N/A | 0.000000
29 | Proc 1 Vccp | Voltage | Nominal | V | 1.227600
30 | Proc 2 Vccp | Voltage | Nominal | V | 1.233800
32 | Pwr Unit Stat | Power Unit | Nominal | N/A | 'OK'
33 | Power Redundancy | Power Unit | Critical | N/A | 'Redundancy Lost' 'Non-redundant:Sufficient Resources from Redundant'
34 | BMC Watchdog | Watchdog 2 | Nominal | N/A | 'OK'
35 | Platform Secu V | Platform Security Violation Attempt | Nominal | N/A | 'OK'
36 | Physical Scrty | Physical Security | Critical | N/A | 'General Chassis Intrusion'
37 | FP Interrupt | Critical Interrupt | Nominal | N/A | 'OK'
38 | Event Log Disabl | Event Logging Disabled | Nominal | N/A | 'OK'
40 | System Event | System Event | Nominal | N/A | 'OK'
41 | BB Vbat | Battery | Nominal | N/A | 'OK'
42 | Fan 1 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
43 | Fan 2 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
44 | Fan 3 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
45 | Fan 4 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
46 | Fan 5 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
47 | Fan 6 Present | Fan | Nominal | N/A | 'Device Inserted/Device Present'
48 | Fan Redundancy | Fan | Nominal | N/A | 'Fully Redundant'
49 | PS1 Status | Power Supply | Critical | N/A | 'Presence detected' 'Power Supply input lost (AC/DC)'
50 | PS2 Status | Power Supply | Nominal | N/A | 'Presence detected'
51 | ACPI State | System ACPI Power State | Nominal | N/A | 'S0/G0'
52 | Button | Button/Switch | Nominal | N/A | 'OK'
56 | Processor 1 Stat | Processor | Nominal | N/A | 'Processor Presence detected'
57 | Processor 2 Stat | Processor | Nominal | N/A | 'Processor Presence detected'
58 | PCIe Link0 | Critical Interrupt | Nominal | N/A | 'OK'
59 | PCIe Link1 | Critical Interrupt | Nominal | N/A | 'OK'
60 | PCIe Link2 | Critical Interrupt | Nominal | N/A | 'OK'
61 | PCIe Link3 | Critical Interrupt | Nominal | N/A | 'OK'
62 | PCIe Link4 | Critical Interrupt | Nominal | N/A | 'OK'
63 | PCIe Link5 | Critical Interrupt | Nominal | N/A | 'OK'
64 | PCIe Link6 | Critical Interrupt | Nominal | N/A | 'OK'
65 | PCIe Link7 | Critical Interrupt | Nominal | N/A | 'OK'
66 | PCIe Link8 | Critical Interrupt | Nominal | N/A | 'OK'
67 | PCIe Link9 | Critical Interrupt | Nominal | N/A | 'OK'
68 | PCIe Link10 | Critical Interrupt | Nominal | N/A | 'OK'
69 | PCIe Link11 | Critical Interrupt | Nominal | N/A | 'OK'
70 | PCIe Link12 | Critical Interrupt | Nominal | N/A | 'OK'
71 | PCIe Link13 | Critical Interrupt | Nominal | N/A | 'OK'
76 | CPU Popul Error | Processor | Nominal | N/A | 'OK'
77 | DIMM 1A | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
79 | DIMM 1B | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
81 | DIMM 1C | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
83 | DIMM 1D | Slot/Connector | Nominal | N/A | 'Slot/Connector Device installed/attached'
address@hidden:~$
On Mon, 2010-06-21 at 09:32 -0700, Al Chu wrote:
> Hi Werner,
>
> > Does anybody know whether one of the other tools like freeipmi or
> > ipmiutil has some functionality like this?
>
> In FreeIPMI, there is a tool called ipmimonitoring that I believe does
> what you're asking for (output condensed for readability below).
>
> 18 | Fan1 | Nominal | 14500.00 | RPM | 'OK'
> 19 | Fan2 | Nominal | 14300.00 | RPM | 'OK'
> 20 | Fan3/CPU2 | Nominal | 14300.00 | RPM | 'OK'
> 21 | Fan4/CPU1 | Nominal | 13900.00 | RPM | 'OK'
> 22 | Fan5 | Nominal | 14000.00 | RPM | 'OK'
> 23 | Fan6 | Nominal | 14000.00 | RPM | 'OK'
> 24 | Fan7/CPU3 | Critical | 0.00 | RPM | 'At or Below (<=)
> Lower Non-Recoverable Threshold'
> 25 | Fan8/CPU4 | Critical | 0.00 | RPM | 'At or Below (<=)
> Lower Non-Recoverable Threshold'
> 26 | Fan9 | Critical | 0.00 | RPM | 'At or Below (<=)
> Lower Non-Recoverable Threshold'
> 27 | Power Supply 1 | Nominal | N/A | N/A | 'Presence detected'
> 28 | Power Supply 2 | N/A | N/A | N/A | N/A
>
> So for this example, fans with normal RPM are "Nominal", out of range is
> "Critical", and the power supply that doesn't exist is "N/A". There is
> also a "Warning" output when the situation is appropriate.
>
> I can say more about it, but this mailing list is probably not the best place.
> Feel free to ping me on the FreeIPMI mailing list.
>
> Al
>
> On Mon, 2010-06-21 at 06:08 -0700, Werner Fischer wrote:
> > Hi ipmitool developers,
> >
> > I thought about the problem regarding monitoring discrete IPMI sensors,
> > that Brian reported back in April:
> > http://www.mail-archive.com/address@hidden/msg01472.html
> >
> > I did some in-depth testing and looked how the current VMware ESXi 4.0
> > reports different states of discrete IPMI sensors.
> >
> > I tested two example scenarios with an Intel SR2500 server:
> >
> > Test case 1:
> > * Power Supply 2 removed
> > * Chassis cover removed
> > * VMware reports: http://www.wefi.net/shared/sr2500-example-1.png
> >
> > Test case 2:
> > * Power Supply 2 present, but power cable removed
> > * VMware reports: http://www.wefi.net/shared/sr2500-example-2.png
> >
> > (Below you find some example ipmitool outputs for these two cases).
> >
> > The current IPMI specification lists possible sensor-specific-offsets
> > for each sensor type in table 42-3, Sensor Type Codes.
> >
> > To me it seems that VMware uses some mapping, which defines which
> > offsets (assertions/deassertions) cause a warning or an alarm,
> > e.g. an offset for the event "General Chassis Intrusion" for a Physical
> > Security sensor (sensor type code 05h) leads to status "Warning".
> >
> > So my request:
> > * introduce some new option for ipmitool (something like "ipmitool
> > get-server-status") where ipmitool uses such kind of mapping,
> > too. We could define which offsets/assertions should cause a
> > warning. In this way an end-user would have an easy way to
> > quickly find out whether or not everything is ok with his
> > hardware...
> >
> > Currently using e.g. "ipmitool sdr elist all" returns "ok" for sensor
> > states like "General Chassis Intrusion" (see below)
> >
> > What do you think?
> > Any other ideas how we could accomplish that?
> > Does anybody know whether one of the other tools like freeipmi or
> > ipmiutil has some functionality like this?
> >
> > best regards,
> > Werner
> >
> > PS: Here are the outputs of ipmitool for this:
> >
> > Test case 1:
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L
> > user sdr elist all | grep -i "PS"
> > Password:
> > PS1 AC Current | 78h | ok | 10.1 | 0.93 Amps
> > PS2 AC Current | 79h | ns | 10.2 | No Reading
> > PS1 +12V Current | 7Ah | ok | 10.1 | 16 Amps
> > PS2 +12V Current | 7Bh | ns | 10.2 | No Reading
> > PS1 +12V Power | 7Ch | ok | 10.1 | 192 Watts
> > PS2 +12V Power | 7Dh | ns | 10.2 | No Reading
> > PS1 Status | 70h | ok | 10.1 | Presence detected
> > PS2 Status | 71h | ok | 10.2 |
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L
> > user sdr elist all | grep -i "Physical Scrty"
> > Password:
> > Physical Scrty | 05h | ok | 23.1 | General Chassis intrusion
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw
> > 0x04 0x2d 0x70
> > Password:
> > Data length = 1
> > 00 c0 01 00
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw
> > 0x04 0x2d 0x71
> > Password:
> > Data length = 1
> > 00 c0 00 00
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin -P
> > relation sdr get "Physical Scrty"
> > Sensor ID : Physical Scrty (0x5)
> > Entity ID : 23.1 (System Chassis)
> > Sensor Type (Discrete): Physical Security
> > States Asserted : Physical Security
> > [General Chassis intrusion]
> > Assertion Events : Physical Security
> > [General Chassis intrusion]
> > Assertions Enabled : Physical Security
> > [General Chassis intrusion]
> > [System unplugged from LAN]
> > Deassertions Enabled : Physical Security
> > [General Chassis intrusion]
> > [System unplugged from LAN]
> >
> > Test case 2:
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L
> > user sdr get "PS2 Status"
> > Password:
> > Sensor ID : PS2 Status (0x71)
> > Entity ID : 10.2 (Power Supply)
> > Sensor Type (Discrete): Power Supply
> > States Asserted : Power Supply
> > [Presence detected]
> > [Power Supply AC lost]
> > Assertion Events : Power Supply
> > [Presence detected]
> > [Power Supply AC lost]
> > Assertions Enabled : Power Supply
> > [Presence detected]
> > [Failure detected]
> > [Predictive failure]
> > [Power Supply AC lost]
> > [Config Error: Vendor Mismatch]
> > [Config Error: Revision Mismatch]
> > [Config Error: Processor Missing]
> > [Config Error]
> > Deassertions Enabled : Power Supply
> > [Presence detected]
> > [Failure detected]
> > [Predictive failure]
> > [Power Supply AC lost]
> > [Config Error: Vendor Mismatch]
> > [Config Error: Revision Mismatch]
> > [Config Error: Processor Missing]
> > [Config Error]
> >
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U monitor -L
> > user sdr elist all | grep -i "PS"
> > Password:
> > PS1 AC Current | 78h | ok | 10.1 | 0.93 Amps
> > PS2 AC Current | 79h | ok | 10.2 | 0.12 Amps
> > PS1 +12V Current | 7Ah | ok | 10.1 | 16 Amps
> > PS2 +12V Current | 7Bh | ok | 10.2 | 0 Amps
> > PS1 +12V Power | 7Ch | ok | 10.1 | 192 Watts
> > PS2 +12V Power | 7Dh | ok | 10.2 | 0 Watts
> > PS1 Status | 70h | ok | 10.1 | Presence detected
> > PS2 Status | 71h | ok | 10.2 | Presence detected, Power Supply AC lost
> > address@hidden:~$ ipmitool -I lan -H 192.168.1.211 -U admin raw
> > 0x04 0x2d 0x71
> > Password:
> > Data length = 1
> > 00 c0 09 00
> > address@hidden:~$
> >
> >
> --
> Albert Chu
> address@hidden
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>
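The raw "0x04 0x2d" (Get Sensor Reading) responses quoted above can be decoded by hand. The sketch below is a minimal Python illustration, assuming the response layout as I read it in the IPMI specification: byte 1 is the reading, byte 2 holds the status flags, and bytes 3 and 4 carry one bit per asserted state offset. The names decode_reading and POWER_SUPPLY_OFFSETS are made up for this example, and the offset table lists only the first four Power Supply offsets from table 42-3.

```python
# First sensor-specific offsets for sensor type "Power Supply" (08h),
# as listed in table 42-3 of the IPMI specification.
POWER_SUPPLY_OFFSETS = {
    0: "Presence detected",
    1: "Power Supply Failure detected",
    2: "Predictive Failure",
    3: "Power Supply input lost (AC/DC)",
}


def decode_reading(raw_hex):
    """Decode the data bytes printed by 'ipmitool raw 0x04 0x2d <sensor>'."""
    data = bytes.fromhex(raw_hex)
    reading, flags, states_lo = data[0], data[1], data[2]
    states_hi = data[3] if len(data) > 3 else 0
    # Bits 0-7 come from byte 3, bits 8-14 from byte 4 (its top bit is reserved).
    mask = states_lo | ((states_hi & 0x7F) << 8)
    return {
        "reading": reading,
        "scanning_enabled": bool(flags & 0x40),
        "reading_unavailable": bool(flags & 0x20),
        "asserted_offsets": [bit for bit in range(15) if mask & (1 << bit)],
    }


# Test case 2, sensor 0x71 (PS2 present, power cable removed).
result = decode_reading("00 c0 09 00")
for offset in result["asserted_offsets"]:
    print(offset, POWER_SUPPLY_OFFSETS.get(offset, "unknown offset"))
```

For "00 c0 09 00" this yields offsets 0 and 3, which agrees with the "Presence detected, Power Supply AC lost" text that ipmitool prints for PS2 Status. "00 c0 00 00" yields no asserted offset at all, which is why a removed power supply shows an empty state.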