HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers
Thu, 8 Dec 2011 15:36:58 +0000
I am posting the following information with permission from HP support, in the
hope that it may be useful for future GRUB developer reference.
Please note that I do not subscribe to the GRUB mailing list, so cc: me
directly if any reply is required.
When GRUB is used to chain-load from one device to another, the HP BIOS used in
current DL120/DL360 G7 servers reports "Illegal Opcode" and shows a red
crash-dump screen. This failure did not occur on the previous G6 generation of
the same models, which used an AMI/Phoenix BIOS.
HP support case 4635415916 was opened for additional clarification in reference
to HP customer advisory number c02695572.
Root cause analysis:
HP Level-3 engineering identified the root cause as follows:
HP Level-3 engineering have found that the HP BIOS on the DL120 G7 is not
causing the red screen. GRUB installs its own INT13 handler in the interrupt
vector table, so it intercepts all subsequent INT13 calls. Some time after it
does that, GRUB performs a memory copy that overwrites the memory where GRUB
stores its INT13 handler code. As a result, on the next INT13 call in GRUB, the
interrupt handler is no longer there, so the processor simply starts to execute
whatever data overwrote the INT13 handler code.
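As an illustration (a Python sketch, not code from GRUB or HP), the hook and the
later overwrite can be modelled on a simulated 1 MiB real-mode address space. It
assumes the standard real-mode interrupt vector table layout: the entry for
vector n sits at linear address n * 4 and holds a 16-bit offset followed by a
16-bit segment, so the INT13 entry is at 0x4C. The handler address 0x9000:0000
and the one-byte IRET stand-in for the handler body are hypothetical.

```python
import struct

IVT_ENTRY_SIZE = 4  # each entry: 16-bit offset, then 16-bit segment (little-endian)


def linear(segment: int, offset: int) -> int:
    """Convert a real-mode segment:offset pair to a linear address."""
    return (segment << 4) + offset


def get_vector(mem: bytearray, vector: int) -> tuple[int, int]:
    """Return the (segment, offset) stored in the IVT entry for `vector`."""
    offset, segment = struct.unpack_from("<HH", mem, vector * IVT_ENTRY_SIZE)
    return segment, offset


def hook_vector(mem: bytearray, vector: int, segment: int, offset: int) -> tuple[int, int]:
    """Point `vector` at a new handler; return the old (segment, offset) for chaining."""
    previous = get_vector(mem, vector)
    struct.pack_into("<HH", mem, vector * IVT_ENTRY_SIZE, offset, segment)
    return previous


def handler_intact(mem: bytearray, vector: int, expected: bytes) -> bool:
    """True if the bytes at the vector's target still match the installed handler."""
    start = linear(*get_vector(mem, vector))
    return bytes(mem[start:start + len(expected)]) == expected


if __name__ == "__main__":
    mem = bytearray(1 << 20)            # 1 MiB real-mode address space
    stub = b"\xcf"                      # IRET, standing in for a real handler body
    seg, off = 0x9000, 0x0000           # hypothetical handler location
    hook_vector(mem, 0x13, seg, off)    # install the INT13 hook
    start = linear(seg, off)
    mem[start:start + len(stub)] = stub
    print(handler_intact(mem, 0x13, stub))   # True
    mem[0x90000:0x90100] = bytes(0x100)      # a later copy lands on the same area
    print(handler_intact(mem, 0x13, stub))   # False
```

Note that the IVT entry itself is never touched by the copy: the vector still
points at the old address, which is exactly why the next INT13 call jumps into
whatever bytes now occupy it.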
Here is how the red screen happens: when the processor executes an illegal
instruction (as when it tries to execute whatever now occupies the overwritten
INT13 handler), it raises an exception, which the BIOS handles by printing the
red screen with the register dump and the message. So our BIOS just prints the
red screen; the cause of the red screen is GRUB.
The specific sequence that leads to this is as follows:
1) GRUB installs its own INT13 handler.
2) Near the end of the chain-loading process, GRUB loads an image of the Linux
kernel into memory, which wipes out its INT13 handler.
3) Right before GRUB transfers control to the kernel to boot, it calls a
function to turn off the floppy drive.
4) The floppy code then makes an INT13 call to the handler that has been
overwritten by the kernel, which results in the red screen.
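The four steps reduce to one condition: step 4 fails only if the address range
the kernel image is copied into overlaps the range holding GRUB's INT13
handler. The Python sketch below models just that ordering; the `Region` type,
the `simulate_boot` function and all addresses are hypothetical, and the
outcome strings restate the analysis rather than emulate a CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A half-open range of linear addresses, [start, end)."""
    start: int
    end: int

    def overlaps(self, other: "Region") -> bool:
        return self.start < other.end and other.start < self.end


def simulate_boot(handler: Region, kernel_load: Region) -> list[str]:
    """Replay the four-step sequence and report what the final INT13 call hits."""
    log = ["1) INT13 hook installed"]
    # Step 2: the kernel image copy destroys the handler only if the ranges overlap.
    handler_alive = not kernel_load.overlaps(handler)
    log.append("2) kernel image copied into memory")
    log.append("3) floppy-drive shutdown routine called")
    if handler_alive:
        log.append("4) INT13 reaches the hook: boot continues")
    else:
        log.append("4) INT13 jumps into overwritten bytes: illegal opcode, red screen")
    return log


if __name__ == "__main__":
    hook = Region(0x90000, 0x90400)           # hypothetical handler location
    for load in (Region(0x10000, 0x80000), Region(0x80000, 0x98000)):
        print(simulate_boot(hook, load)[-1])
```

Using half-open ranges means a load region that ends exactly where the handler
begins is correctly treated as safe.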
The problem seems to be that GRUB made assumptions about the memory layout of
our system that are not accurate. HP systems that use HP-developed BIOSes
instead of outsourced (AMI) BIOSes use more of the memory area called the EBDA
(Extended BIOS Data Area) than a typical system does. As a result, GRUB assumes
there is memory it can safely use instead of properly calculating an area of
safe memory to use. That is probably why GRUB worked on the other systems but
fails on G7.
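The analysis says GRUB should calculate a safe memory area rather than assume
one. One common way to bound usable conventional memory on a PC is to read two
BIOS Data Area words: the EBDA segment at linear address 0x40E and the base
memory size in KiB at 0x413 (the value INT 12h reports). The Python sketch
below takes the lower of the two limits. It is not GRUB's actual logic, it
assumes the firmware fills in both fields (and treats a zero EBDA segment as
"no EBDA pointer"), and the example values are hypothetical. A real loader
would typically also consult the firmware memory map (INT 15h, EAX=E820h).

```python
import struct

BDA_EBDA_SEGMENT = 0x40E  # word: segment of the Extended BIOS Data Area
BDA_BASE_MEM_KB = 0x413   # word: conventional memory size in KiB (as INT 12h reports)


def safe_low_memory_top(mem: bytes) -> int:
    """Exclusive upper bound of conventional memory not claimed by the EBDA."""
    (ebda_seg,) = struct.unpack_from("<H", mem, BDA_EBDA_SEGMENT)
    (base_kb,) = struct.unpack_from("<H", mem, BDA_BASE_MEM_KB)
    top = base_kb * 1024
    if ebda_seg:  # assumption: zero means no EBDA pointer was published
        top = min(top, ebda_seg << 4)
    return top


def fits_below_ebda(mem: bytes, start: int, length: int) -> bool:
    """True if [start, start + length) ends at or below the safe top."""
    return start + length <= safe_low_memory_top(mem)


if __name__ == "__main__":
    bda = bytearray(0x500)                    # just the IVT + BIOS Data Area
    struct.pack_into("<H", bda, BDA_EBDA_SEGMENT, 0x9E00)   # hypothetical 8 KiB EBDA
    struct.pack_into("<H", bda, BDA_BASE_MEM_KB, 632)
    print(hex(safe_low_memory_top(bda)))          # 0x9e000
    print(fits_below_ebda(bda, 0x9F000, 0x400))   # False: would land in the EBDA
```

Taking the minimum of both fields is deliberately defensive: if a BIOS enlarges
its EBDA but the base-memory word still reports a higher limit, the EBDA
pointer wins.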
_end quoted text_
Iain Barker - Platform Engineering, Acme Packet.