Re: grub mishandles corrupt/missing primary GPT
Wed, 23 Oct 2013 21:07:21 -0600
Thanks for the response:
On Oct 23, 2013, at 7:49 PM, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 24.10.2013 03:38, Chris Murphy wrote:
>> Gist is, starting with a disk with valid PMBR, primary GPT, and backup
>> GPT, if I zero LBA 2, I can no longer boot from the disk. I get a grub
>> rescue prompt.
>> Instead, if I merely corrupt a portion of the first partitiontypeguid to
>> mimic corruption, I can still boot, whereas this primary GPT fails
>> checksums with both gdisk and parted.
>> This tells me that GRUB isn't checking for the validity of the primary
>> GPT. And GRUB doesn't ever use the backup GPT.
>> Expected behavior is GRUB should check if the MBR is a PMBR (1st and
>> only entry is type 0xEE)
> There are so called "hybrid" disks which we have to treat as GPT
While technically a violation of the UEFI spec, I think this can be worked
around by considering the disk GPT if the first entry in the MBR is type 0xEE.
I don't know of a hybrid MBR implementation where an entry other than the first
is type 0xEE.
But if there is no 0xEE entry at all, this is identical to a formerly GPT disk
repartitioned as MBR by a utility that doesn't know anything about GPT, and
thus doesn't erase the stale GPT data - and therefore must be treated as MBR.
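
To make the rule above concrete, here is a rough Python sketch of the MBR
classification I have in mind. It is not GRUB code; the function name
classify_mbr and the return strings are made up for illustration. It only
looks at the boot signature and the four partition type bytes in LBA 0.

```python
SECTOR = 512
TABLE_OFFSET = 446   # first of four 16-byte MBR partition entries
TYPE_OFFSET = 4      # partition type byte within each entry


def classify_mbr(lba0: bytes) -> str:
    """Classify a 512-byte LBA 0 image as 'gpt', 'mbr', or 'none'."""
    # The 0xAA55 signature is stored little-endian: 0x55 then 0xAA.
    if len(lba0) < SECTOR or lba0[510:512] != b"\x55\xaa":
        return "none"
    types = [lba0[TABLE_OFFSET + 16 * i + TYPE_OFFSET] for i in range(4)]
    # First entry 0xEE: treat as GPT. This covers a strict PMBR and also
    # hybrid MBRs, which carry extra entries after the 0xEE one.
    if types[0] == 0xEE:
        return "gpt"
    # No 0xEE entry: any GPT data left on disk is considered stale.
    # Assumption: 0xEE in a slot other than the first also lands here;
    # the text above does not decide that case.
    return "mbr"
```

A strict PMBR check would additionally require entries 2 through 4 to be empty;
the sketch deliberately skips that so hybrid disks are still treated as GPT.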
>> and if not then consider the disk MBR. If it is
>> PMBR, check validity of the primary GPT header+table, if valid use it.
>> If invalid, check validity of backup GPT header+table, if valid use it.
>> If invalid, fail.
> partmap module is size-critical and CRC32 verification is pretty big.
So perhaps this test is difficult because it's GPT on BIOS, with a limited
space BIOS boot partition. However, I think on UEFI computers this should still
work with one valid GPT, rather than not boot at all. There's a lot more space
for this there.
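
For reference, the primary-then-backup check I'm describing is roughly the
following. This is a Python sketch, not a proposal for the partmap module
itself; gpt_copy_valid and pick_gpt are hypothetical names, and the caller is
assumed to have already read each header sector and its entry array. The field
offsets are the standard GPT header layout (header size at 12, header CRC32 at
16, MyLBA at 24, entry count/size/CRC32 at 80/84/88).

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
MIN_HEADER_SIZE = 92


def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def gpt_copy_valid(header: bytes, table: bytes, lba: int) -> bool:
    """Check one GPT copy: signature, header CRC, MyLBA, and table CRC."""
    if len(header) < MIN_HEADER_SIZE or header[:8] != GPT_SIGNATURE:
        return False
    size, stored_crc = struct.unpack_from("<II", header, 12)
    if size < MIN_HEADER_SIZE or size > len(header):
        return False
    # The header CRC is computed with its own field zeroed.
    zeroed = header[:16] + b"\x00" * 4 + header[20:size]
    if crc32(zeroed) != stored_crc:
        return False
    (my_lba,) = struct.unpack_from("<Q", header, 24)
    if my_lba != lba:
        return False
    count, entry_size, table_crc = struct.unpack_from("<III", header, 80)
    table_len = count * entry_size
    if len(table) < table_len:
        return False
    return crc32(table[:table_len]) == table_crc


def pick_gpt(primary, backup):
    """Each argument is (header, table, lba), or None if it could not be read.

    Returns 'primary', 'backup', or None when neither copy validates.
    """
    for name, copy in (("primary", primary), ("backup", backup)):
        if copy is not None and gpt_copy_valid(*copy):
            return name
    return None
```

With this, both of my test cases (zeroed LBA 2, or one altered byte in a
partition type GUID) fail the table CRC on the primary and fall through to the
backup rather than to a rescue prompt.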
> There are 3 problems with backup header:
> 1) Backup header would be preserved even when primary is deliberately
> reformatted and if we use it then we'll use it even on disks where we
> should use newly-created MBR
Both primary and backup GPTs are preserved in this case, since the primary
header is in LBA 1 and its entry array typically starts at LBA 2, and only
LBA 0 is overwritten with the new MBR.
UEFI spec says if the MBR signature of 0xaa55 is intact, and there isn't an
0xEE entry, and the partition entries are rational (physically on disk and
don't overlap), then the two GPTs are considered stale and the disk is MBR.
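
As a sketch of that stale-GPT rule as I've paraphrased it (not the spec's
wording), in Python: gpt_is_stale and mbr_entries are hypothetical names, and
disk_sectors is assumed to be known to the caller, which as noted below is not
always the case.

```python
import struct


def mbr_entries(lba0: bytes):
    """Yield (type, start_lba, sector_count) for each non-empty MBR entry."""
    for i in range(4):
        off = 446 + 16 * i
        ptype = lba0[off + 4]
        start, count = struct.unpack_from("<II", lba0, off + 8)
        if ptype != 0:
            yield ptype, start, count


def gpt_is_stale(lba0: bytes, disk_sectors: int) -> bool:
    """True if LBA 0 looks like a real MBR, so any GPT on disk is stale."""
    if len(lba0) < 512 or lba0[510:512] != b"\x55\xaa":
        return False
    entries = list(mbr_entries(lba0))
    # Assumption: an empty partition table tells us nothing either way.
    if not entries or any(t == 0xEE for t, _, _ in entries):
        return False
    spans = sorted((start, start + count) for _, start, count in entries)
    # "Rational" part 1: every partition is non-empty and lies on the disk.
    if any(s < 1 or e <= s or e > disk_sectors for s, e in spans):
        return False
    # "Rational" part 2: no two partitions overlap.
    return all(a_end <= b_start
               for (_, a_end), (b_start, _) in zip(spans, spans[1:]))
```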
> 2) The disk size isn't always known (loopback over network device,
> ieee1275 disks and CD-ROMs, possibly others)
The primary header contains the location of the backup GPT. If the header is
sufficiently corrupt, and the backup GPT can't be located, then that's the same
as an invalid backup GPT, and in that case fail.
My point is we shouldn't fail when there is a valid locatable backup GPT. The
whole point of having a second GPT is obviated with the current behavior.
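
On locating the backup without knowing the disk size: the primary header
stores the backup's position in its AlternateLBA field (8 bytes, little-endian,
at offset 32). A rough Python sketch, with backup_lba_hint being a made-up
name:

```python
import struct


def backup_lba_hint(primary_header: bytes):
    """Return the AlternateLBA recorded in the primary header, or None."""
    if len(primary_header) < 92 or primary_header[:8] != b"EFI PART":
        return None  # header too damaged to trust any of its fields
    (alternate_lba,) = struct.unpack_from("<Q", primary_header, 32)
    # LBA 0 is the MBR and LBA 1 is the primary header itself, so a
    # backup located there cannot be right.
    return alternate_lba if alternate_lba > 1 else None
```

The value is only a hint: whatever is read from that LBA still has to pass its
own signature and CRC checks before it is used. In my zeroed-LBA-2 test the
header in LBA 1 is untouched, so the field is still readable there.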
> 3) There are some weird scenarios with USB enclosures "forgetting" last
> disk sectors which leads to partition having two different back-headers.
> Consider following scenario:
> One formats with enclosure, then puts disk natively and moves backup
> headers to real end of disk and later modifies partition table. Then
> puts disk in enclosure again and then backup has older table.
I don't think we can work around this kind of hardware vendor sabotage. If a
backup looks like a valid GPT but is actually stale, and it's used and contains
incorrect information, then boot fails. Better to try than not try at all.
> Do you have ways to handle this?
> Why primary would be corrupted in first place?
It's certainly uncommon. A Google search for corrupt "primary gpt" turns up
only 1900 results. But it is possible.
And this isn't the only mishandling I'm finding, so it's not like GRUB is
unique. In fact just now by changing only a single byte in the primary GPT
table (I changed the E to an F in the BIOS boot partition type UUID), the
kernel suddenly has no idea what disklabel the disk is, and fails to mount
rootfs. So I need to track that down too, but it seems like it knows the
primary GPT table is corrupt, but then fails to use the backup GPT for some
reason.
An argument against GRUB doing all of this work: maybe the bootloader should be
able to blindly trust the primary GPT table with no validity checks? And
instead rely on (presently non-existent) checks by the underlying OS to fix
this problem? Something like an fsck_gpt, seeing as nothing else is in a good
position to both check and fix such GPTs other than a partition tool.
The UEFI spec says "Software should ask a user for confirmation before
restoring the primary GPT" and yet it also requires that the unspecified
software fix the primary GPT if corrupt. The spec actually uses the word
"must". So per
usual, the spec has rather lofty demands.