Re: grub mishandles corrupt/missing primary GPT

From: Chris Murphy
Subject: Re: grub mishandles corrupt/missing primary GPT
Date: Wed, 23 Oct 2013 21:07:21 -0600

Thanks for the response.

On Oct 23, 2013, at 7:49 PM, Vladimir 'φ-coder/phcoder' Serbinenko 
<address@hidden> wrote:

> On 24.10.2013 03:38, Chris Murphy wrote:
>> Gist is, starting with a disk with valid PMBR, primary GPT, and backup
>> GPT, if I zero LBA 2, I can no longer boot from the disk. I get a grub
>> rescue prompt.
>> Instead, if I merely corrupt a portion of the first partitiontypeguid to
>> mimic corruption, I can still boot, whereas this primary GPT fails
>> checksums with both gdisk and parted. 
>> This tells me that GRUB isn't checking for the validity of the primary
>> GPT. And GRUB doesn't ever use the backup GPT.
>> Expected behavior is GRUB should check if the MBR is a PMBR (1st and
>> only entry is type 0xEE)
> There are so called "hybrid" disks which we have to treat as GPT

Hybrid MBRs are technically a violation of the UEFI spec, but I think they can
be accommodated by treating the disk as GPT whenever the first entry in the MBR
is type 0xEE. I don't know of a hybrid MBR implementation where an entry other
than the first is 0xEE.

But if there is no 0xEE entry at all, this is identical to a formerly GPT disk 
repartitioned as MBR by a utility that doesn't know anything about GPT, and 
thus doesn't erase the stale GPT data - and therefore must be treated as MBR.
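
A minimal sketch of that rule, in Python for illustration only. The function
names are hypothetical, and the "ambiguous" result for an 0xEE entry that is
not first is an assumption, since that case isn't settled above:

```python
import struct
from typing import List

SECTOR_SIZE = 512
PROTECTIVE_TYPE = 0xEE  # MBR partition type that marks a GPT disk


def mbr_partition_types(lba0: bytes) -> List[int]:
    """Return the type byte of each of the four MBR partition entries."""
    if len(lba0) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    # The boot signature is stored as bytes 0x55 0xAA (0xAA55 little-endian).
    (signature,) = struct.unpack_from("<H", lba0, 510)
    if signature != 0xAA55:
        raise ValueError("no MBR boot signature")
    # Entries are 16 bytes each, starting at offset 446; the type is byte 4.
    return [lba0[446 + 16 * i + 4] for i in range(4)]


def classify_label(lba0: bytes) -> str:
    """Classify a disk as 'gpt', 'mbr' or 'ambiguous' from LBA 0 alone."""
    types = mbr_partition_types(lba0)
    if types[0] == PROTECTIVE_TYPE:
        return "gpt"        # strict PMBR, or hybrid MBR with 0xEE first
    if PROTECTIVE_TYPE in types:
        return "ambiguous"  # assumption: 0xEE present but not first
    return "mbr"            # any GPT structures on the disk are stale
```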

>> and if not then consider the disk MBR. If it is
>> PMBR, check validity of the primary GPT header+table, if valid use it.
>> If invalid, check validity of backup GPT header+table, if valid use it.
>> If invalid, fail.
> partmap module is size-critical and CRC32 verification is pretty big.

So perhaps this check is hard to fit because it's GPT on BIOS, with a
limited-space BIOS boot partition. However, I think on UEFI computers booting
should still work with one valid GPT, rather than not boot at all. There's a
lot more space for this code there.
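
For reference, a rough sketch of what the validity check involves, written in
Python rather than GRUB's C. The field offsets follow the GPT header layout in
the UEFI spec; the function names are hypothetical, and this is not how GRUB,
gdisk or parted implement it:

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
MIN_HEADER_SIZE = 92  # size of the defined GPT header fields


def header_is_valid(header_sector: bytes, expected_lba: int) -> bool:
    """Check signature, MyLBA and header CRC32 of one GPT header sector."""
    if len(header_sector) < MIN_HEADER_SIZE:
        return False
    if header_sector[:8] != GPT_SIGNATURE:
        return False
    # HeaderSize is at offset 12, HeaderCRC32 at offset 16.
    header_size, stored_crc = struct.unpack_from("<II", header_sector, 12)
    if not MIN_HEADER_SIZE <= header_size <= len(header_sector):
        return False
    # MyLBA (offset 24) must match the LBA the sector was read from.
    (my_lba,) = struct.unpack_from("<Q", header_sector, 24)
    if my_lba != expected_lba:
        return False
    # The CRC covers HeaderSize bytes with the CRC field itself zeroed.
    raw = bytearray(header_sector[:header_size])
    raw[16:20] = b"\x00\x00\x00\x00"
    return (zlib.crc32(bytes(raw)) & 0xFFFFFFFF) == stored_crc


def table_is_valid(header_sector: bytes, table: bytes) -> bool:
    """Check the partition entry array against the CRC32 in the header."""
    # NumberOfPartitionEntries, SizeOfPartitionEntry, array CRC32 at offset 80.
    count, entry_size, table_crc = struct.unpack_from("<III", header_sector, 80)
    needed = count * entry_size
    if len(table) < needed:
        return False
    return (zlib.crc32(table[:needed]) & 0xFFFFFFFF) == table_crc
```

A single changed byte in a partition type GUID leaves the header check passing
and makes the table check fail, which is the case gdisk and parted catch.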

> There are 3 problems with backup header:
> 1) Backup header would be preserved even when primary is deliberately
> reformatted and if we use it then we'll use it even on disks where we
> should use newly-created MBR

Both primary and backup GPTs are preserved in this case, since the primary has
its header in LBA 1 and its table starting at LBA 2, and only LBA 0 is
overwritten with the new MBR.

UEFI spec says if the MBR signature of 0xaa55 is intact, and there isn't an 
0xEE entry, and the partition entries are rational (physically on disk and 
don't overlap), then the two GPTs are considered stale and the disk is MBR.
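
The rule as paraphrased above could be sketched roughly like this (Python,
illustrative only). The function names are hypothetical, and treating an MBR
with no entries at all as "not conclusively stale" is an assumption:

```python
import struct
from typing import List, Tuple


def mbr_entries(lba0: bytes) -> List[Tuple[int, int, int]]:
    """Return (type, start_lba, sector_count) for each non-empty MBR entry."""
    entries = []
    for i in range(4):
        base = 446 + 16 * i
        ptype = lba0[base + 4]
        # Starting LBA and sector count are 32-bit little-endian fields.
        start, count = struct.unpack_from("<II", lba0, base + 8)
        if ptype != 0:
            entries.append((ptype, start, count))
    return entries


def gpt_is_stale(lba0: bytes, disk_sectors: int) -> bool:
    """True if the MBR is intact, has no 0xEE entry and its entries are sane."""
    (signature,) = struct.unpack_from("<H", lba0, 510)
    if signature != 0xAA55:
        return False
    entries = mbr_entries(lba0)
    if not entries:
        return False  # assumption: an empty MBR proves nothing either way
    if any(ptype == 0xEE for ptype, _, _ in entries):
        return False
    spans = sorted((start, start + count) for _, start, count in entries)
    # Every partition must be non-empty, past LBA 0 and within the disk.
    if any(start < 1 or end <= start or end > disk_sectors
           for start, end in spans):
        return False
    # Neighbouring partitions (sorted by start) must not overlap.
    return all(prev_end <= next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))
```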

> 2) The disk size isn't always known (loopback over network device,
> ieee1275 disks and CD-ROMs, possibly others)

The primary header contains the location of the backup GPT. If the header is 
sufficiently corrupt, and the backup GPT can't be located, then that's the same 
as an invalid backup GPT, and in that case fail.

My point is we shouldn't fail when there is a valid, locatable backup GPT. The
whole point of having a second GPT is defeated by the current behavior.
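
The fallback order I have in mind could look something like this sketch
(Python, illustrative only). `read_lba` and `is_valid` are hypothetical
callables supplied by the caller, and trying the last LBA when the disk size
happens to be known is an addition beyond what is described above:

```python
import struct
from typing import Callable, List, Optional, Tuple


def pick_gpt_header(
    read_lba: Callable[[int], Optional[bytes]],
    is_valid: Callable[[bytes, int], bool],
    disk_sectors: Optional[int] = None,
) -> Optional[Tuple[str, int]]:
    """Return ('primary', 1), ('backup', lba), or None if neither is usable."""
    primary = read_lba(1)
    if primary is not None and is_valid(primary, 1):
        return ("primary", 1)

    candidates: List[int] = []
    # AlternateLBA (offset 32) in a damaged primary header is only a hint,
    # so the sector it points at still has to pass validation below.
    if primary is not None and len(primary) >= 40:
        (alternate_lba,) = struct.unpack_from("<Q", primary, 32)
        if alternate_lba > 1:
            candidates.append(alternate_lba)
    # If the disk size is known, the backup header belongs in the last LBA.
    if disk_sectors is not None and disk_sectors - 1 not in candidates:
        candidates.append(disk_sectors - 1)

    for lba in candidates:
        sector = read_lba(lba)
        if sector is not None and is_valid(sector, lba):
            return ("backup", lba)
    return None  # no locatable valid GPT: same as an invalid backup, so fail
```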

> 3) There are some weird scenarios with USB enclosures "forgetting" last
> disk sectors which leads to partition having two different back-headers.
> Consider following scenario:
> One formats with enclosure, then puts disk natively and moves backup
> headers to real end of disk and later modifies partition table. Then
> puts disk in enclosure again and then backup has older table.

I don't think we can work around this kind of hardware vendor sabotage. If the
backup looks like a valid GPT but is actually stale and contains incorrect
information, then using it makes boot fail. Better to try than not to try at
all.

> Do you have ways to handle this?
> Why primary would be corrupted in first place?

It's certainly uncommon. A Google search for: corrupt "primary gpt" turns up
only 1900 results. But it is possible.

And this isn't the only mishandling I'm finding, so GRUB isn't unique in this.
In fact, just now, by changing only a single byte in the primary GPT table (I
changed the E to an F in the BIOS boot partition type UUID), the kernel
suddenly has no idea what disklabel the disk has, and fails to mount rootfs. So
I need to track that down too, but it seems like it knows the primary GPT table
is corrupt, but then fails to use the backup GPT for some reason.

An argument against GRUB doing all of this work: maybe the bootloader should be
able to blindly trust the primary GPT table with no validity checks, and
instead rely on (presently non-existent) checks by the underlying OS to fix
this problem? Something like an fsck_gpt, seeing as nothing other than a
partition tool is in a good position to both check and fix such GPTs.

The UEFI spec says "Software should ask a user for confirmation before
restoring the primary GPT" and yet it also requires that the unspecified
software fix the primary GPT if it is corrupt. The spec actually uses the word
"must". So, per usual, the spec has rather lofty demands.

Chris Murphy
