freeipmi-users

Re: [Freeipmi-users] Decoding ram errors on supermicro


From: Albert Chu
Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
Date: Tue, 04 Dec 2018 10:40:22 -0800

On Tue, 2018-12-04 at 11:39 +0100, Tom Hetmer wrote:
> Sure. It seems there's a similar ticket
> already: https://github.com/chu11/freeipmi-mirror/issues/19

Ahh, if you could, update it with info from ipmitool / ipmiutil.  I was
reluctant to add support based on reverse engineering.  But if other
tools have "official" interpretations from Supermicro, I'm more
confident in the addition.

> Yep, that's the code. ipmitool and a few others decode it too.
> 
> 
> We have a *lot* of Supermicros so I can help with testing if needed -
> but we don't get that many CRC errors though :)

The one thing I'll need is product ID numbers (you can get them from
bmc-info) and the name of the product.  This goes into the
documentation and some of the code.

Thanks,

Al

> So I guess we'd have to wait till one pops up. But I hope the 'ver 2'
> method from ipmiutil works fine.
> We used ipmitool in our monitoring before and it was accurate but
> slow, that's why I rewrote it all to use freeipmi.
> 
> 
> Thanks!
> 
> 
> Best,
> Tom Hetmer
> 
> 
> CDN77 Operations
> address@hidden / +44 (0) 20 3514 2399 / www.cdn77.com
> 
> ----- Original message -----
> > From: "Albert Chu" <address@hidden>
> > To: "Tom Hetmer" <address@hidden>, address@hidden
> > .org 
> > Date: 12/03/18 21:06
> > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > 
> > Hi Tom,
> > 
> > Thanks for the pointer to ipmiutil's code.  I assume you found this
> > comment:
> > 
> > ---
> >       /* ver 2 method: 2A 80 = P1_DIMMB1 */
> >           /* SuperMicro says:
> >            *  pair: %c (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, (='B')
> >            *  dimm: %c (data2 & 0xf) + 0x27,
> >            *  cpu:  %x (data3 & 0x03) + 1);
> >            */
> > ---
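
For reference, here is a minimal Python sketch of the "ver 2" formulas
exactly as they appear in the quoted ipmiutil comment. The function name
and the "P<cpu>_DIMM<pair><dimm>" output layout are assumptions made for
illustration (the layout is inferred from the "2A 80 = P1_DIMMB1" example
in the comment); this is not code from ipmiutil or freeipmi, and it has
only been checked against that one example.

```python
def decode_supermicro_dimm_v2(data2: int, data3: int) -> str:
    """Decode OEM event data bytes using the quoted 'ver 2' formulas.

    data2 and data3 are the raw OEM Event Data2/Data3 bytes (0..255).
    The name and return format are hypothetical, chosen for this sketch.
    """
    # cpu: (data3 & 0x03) + 1
    cpu = (data3 & 0x03) + 1
    # pair: (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, printed as a character
    pair = chr((data2 >> 4) + 0x40 + (data3 & 0x03) * 3)
    # dimm: (data2 & 0xf) + 0x27, printed as a character
    dimm = chr((data2 & 0x0F) + 0x27)
    return f"P{cpu}_DIMM{pair}{dimm}"


if __name__ == "__main__":
    # The example from the comment: 2A 80 = P1_DIMMB1
    print(decode_supermicro_dimm_v2(0x2A, 0x80))
```

Whether these formulas hold for every Supermicro board is not established
in this thread, so results for other boards should be compared against
the BMC web interface.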
> > 
> > I can definitely add it to my todo list.
> > 
> > Would you mind writing up an issue on github here?
> > 
> > https://github.com/chu11/freeipmi-mirror
> > 
> > Al
> > 
> > On Mon, 2018-12-03 at 17:55 +0100, Tom Hetmer wrote:
> > > Hi, 
> > > 
> > > it'd be good if freeipmi supported decoding the supermicro ECC
> > > errors.
> > > 
> > > 
> > > Manufacturer: Supermicro
> > > Product Name: X10DRH LN4
> > > e.g.
> > > freeipmi
> > > 1,Dec-01-2018,06:37:53,Sensor #0,Memory,Critical,Uncorrectable memory error ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h
> > > 
> > > web interface
> > > 1 | 12/01/2018 | 06:37:53 | Memory | Uncorrectable ECC (@DIMMG1(CPU2)) | Asserted
> > > 
> > > 
> > > something like this worked for me (stolen from ipmiutil)
> > > 
> > > 
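> > > // $data2 and $data3 hold the OEM event data bytes as hex
> > > // strings, e.g. "3A" and "81".
> > > $cpu = (hexdec($data3) & 0x03) + 1;
> > > 
> > > $NPAIRS = 26;
> > > $rgpairs = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> > > 
> > > // Only the low byte (data3) survives the masks below.
> > > $bdata = hexdec($data2.$data3);
> > > // High nibble of data3, shifted to a 0-based letter index.
> > > $pair = (($bdata & 0xF0) >> 4) - 2;
> > > 
> > > // Clamp so the index always stays inside $rgpairs.
> > > if ($pair < 0) $pair = 0;
> > > if ($pair >= $NPAIRS) $pair = $NPAIRS - 1;
> > > 
> > > $pair = $rgpairs[$pair];
> > > 
> > > // Low nibble of data3.
> > > $dimm = $bdata & 0x0F;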
> > > 
> > > 
> > > $dimm may be incorrect: the original code subtracts 9, but that
> > > gave the wrong result on this board, so I changed it to get the
> > > right one. We'll see if it keeps producing the right values.
> > > 
> > > Best,
> > > Tom Hetmer
> > > 
> > > 
> > > CDN77 Operations
> > > address@hidden / +44 (0) 20 3514 2399 / www.cdn77.com
> > > 
> > > _______________________________________________
> > > Freeipmi-users mailing list
> > > address@hidden
> > > https://lists.gnu.org/mailman/listinfo/freeipmi-users
> > 
> > -- 
> > Albert Chu
> > address@hidden
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



