Re: grub mishandles corrupt/missing primary GPT
From: Lennart Sorensen
Subject: Re: grub mishandles corrupt/missing primary GPT
Date: Thu, 24 Oct 2013 09:39:02 -0400
User-agent: Mutt/1.5.20 (2009-06-14)
On Wed, Oct 23, 2013 at 09:07:21PM -0600, Chris Murphy wrote:
> While technically a violation of the UEFI spec, I think this can be worked
> around by considering the disk GPT if the first entry in the MBR is type
> 0xEE. I don't know of a hybrid MBR implementation where an entry other than
> the first is 0xEE.
Well, everyone other than Microsoft seems to understand how useful hybrid
setups can be, and hence supports them.
> But if there is no 0xEE entry at all, this is identical to a formerly GPT
> disk repartitioned as MBR by a utility that doesn't know anything about GPT,
> and thus doesn't erase the stale GPT data - and therefore must be treated as
> MBR.
That is true. That does not mean there must ONLY be a 0xEE entry.
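
To make the distinction concrete, here is a minimal sketch in Python (not
GRUB code) of classifying LBA 0 by looking for a 0xEE entry in any of the
four slots rather than only the first. The names classify_mbr and make_mbr
are made up for illustration, and the sketch skips the "rational partition
entries" check mentioned in the quoted text.

```python
import struct

MBR_SIGNATURE = 0xAA55          # stored little-endian at byte offset 510
PARTITION_TABLE_OFFSET = 446    # four 16-byte partition entries start here
GPT_PROTECTIVE = 0xEE           # partition type used by a protective MBR


def classify_mbr(sector0: bytes) -> str:
    """Return 'invalid', 'mbr', 'protective' or 'hybrid' for the LBA 0 sector."""
    if len(sector0) < 512:
        return "invalid"
    (signature,) = struct.unpack_from("<H", sector0, 510)
    if signature != MBR_SIGNATURE:
        return "invalid"
    # The type byte sits at offset 4 inside each 16-byte entry.
    types = [sector0[PARTITION_TABLE_OFFSET + 16 * i + 4] for i in range(4)]
    used = [t for t in types if t != 0x00]
    if GPT_PROTECTIVE not in used:
        return "mbr"            # no 0xEE anywhere: any GPT on disk is stale
    if used == [GPT_PROTECTIVE]:
        return "protective"     # 0xEE is the only entry
    return "hybrid"             # 0xEE plus other entries, in any slot


def make_mbr(types):
    """Hypothetical test helper: a 512-byte sector with the given type bytes."""
    sector = bytearray(512)
    for i, t in enumerate(types):
        sector[PARTITION_TABLE_OFFSET + 16 * i + 4] = t
    struct.pack_into("<H", sector, 510, MBR_SIGNATURE)
    return bytes(sector)
```

With this, make_mbr([0x83, 0xEE, 0, 0]) is classified as "hybrid" even though
0xEE is not the first entry, while a table with no 0xEE entry at all is
classified as "mbr".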
> So perhaps this test is difficult because it's GPT on BIOS, with a limited
> space BIOS boot partition. However, I think on UEFI computers this should
> still work with one valid GPT, rather than not boot at all. There's a lot
> more space for this there.
Certainly if using the BIOS boot partition, there really isn't much of
a space excuse anymore, unless you run into limitations on how much RAM
you can use in early boot.
> Both primary and backup GPTs are preserved in this case since the primary is
> in LBA 1 and 2, and only LBA 0 is overwritten with the new MBR.
>
> UEFI spec says if the MBR signature of 0xaa55 is intact, and there isn't an
> 0xEE entry, and the partition entries are rational (physically on disk and
> don't overlap), then the two GPTs are considered stale and the disk is MBR.
>
> The primary header contains the location of the backup GPT. If the header is
> sufficiently corrupt, and the backup GPT can't be located, then that's the
> same as an invalid backup GPT, and in that case fail.
>
> My point is we shouldn't fail when there is a valid locatable backup GPT. The
> whole point of having a second GPT is obviated with the current behavior.
Sometimes backups are designed in and never used. I don't recall ever
seeing any indication Microsoft ever used the second copy of the FAT
for anything other than filesystem repair tools.
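
For reference, a rough Python sketch (again, not GRUB code) of the fallback
being asked for: validate the primary header at LBA 1, and if that fails,
try the backup header at the last LBA of the disk, which is where the spec
places it. The 512-byte sector size, the read_lba callback and the helper
names are assumptions for illustration; a real implementation would also
have to check the partition entry array CRC.

```python
import struct
import zlib

SECTOR = 512  # assumed logical sector size


def parse_gpt_header(sector: bytes, expected_lba: int):
    """Return selected header fields if the header is valid, else None."""
    if len(sector) < 92 or sector[:8] != b"EFI PART":
        return None
    (header_size,) = struct.unpack_from("<I", sector, 12)
    if not 92 <= header_size <= len(sector):
        return None
    (stored_crc,) = struct.unpack_from("<I", sector, 16)
    # The header CRC is computed with the CRC field itself set to zero.
    zeroed = sector[:16] + b"\0\0\0\0" + sector[20:header_size]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != stored_crc:
        return None
    my_lba, alternate_lba = struct.unpack_from("<QQ", sector, 24)
    if my_lba != expected_lba:
        return None
    return {"my_lba": my_lba, "alternate_lba": alternate_lba}


def find_usable_gpt(read_lba, last_lba):
    """Try the primary header at LBA 1, then the backup at the last LBA."""
    primary = parse_gpt_header(read_lba(1), 1)
    if primary is not None:
        return "primary", primary
    backup = parse_gpt_header(read_lba(last_lba), last_lba)
    if backup is not None:
        return "backup", backup
    return None


def build_header(my_lba, alternate_lba):
    """Hypothetical test helper: a minimal valid header padded to one sector."""
    hdr = bytearray(92)
    hdr[0:8] = b"EFI PART"
    struct.pack_into("<II", hdr, 8, 0x00010000, 92)  # revision, header size
    struct.pack_into("<QQ", hdr, 24, my_lba, alternate_lba)
    struct.pack_into("<I", hdr, 16, zlib.crc32(bytes(hdr)) & 0xFFFFFFFF)
    return bytes(hdr).ljust(SECTOR, b"\0")
```

Because the backup is looked up at the last LBA rather than through the
primary header's AlternateLBA field, a completely zeroed primary header does
not prevent the backup from being found.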
> I don't think we can work around this kind of hardware vendor sabotage. If it
> looks like a valid GPT, but is actually stale, if it's used and contains
> incorrect information, then boot fails. Better to try than not try at all.
>
> It's certainly uncommon. A Google search: corrupt "primary gpt" only turns up
> 1900 results. But it is possible.
>
> And this isn't the only mishandling I'm finding, so it's not like GRUB is
> unique. In fact just now by changing only a single byte in the primary GPT
> table (I changed the E to an F in the BIOS boot partition type UUID), the
> kernel suddenly has no idea what disklabel the disk is, and fails to mount
> rootfs. So I need to track that down too, but it seems like it knows the
> primary GPT table is corrupt, but then fails to use the backup GPT for some
> reason.
>
> An argument against GRUB doing all of this work: maybe the bootloader should
> be able to blindly trust the primary GPT table with no validity checks? And
> instead rely on (presently non-existent) checks by the underlying OS to fix
> this problem? Something like an fsck_gpt, seeing as nothing else is in a good
> position to both check and fix such GPTs other than a partition tool.
Perhaps. Certainly simpler.
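
As an aside on the single-byte experiment quoted above: the GPT header
stores a CRC32 of the partition entry array, so any one-byte change in an
entry is detectable by whoever bothers to check. A small Python sketch of
that check follows; the helper names and the 128-entry, 128-byte layout are
assumptions for illustration, not taken from any particular tool.

```python
import uuid
import zlib

ENTRY_SIZE = 128  # assumed size of one partition entry
# BIOS boot partition type GUID; GPT stores GUIDs in mixed-endian form.
BIOS_BOOT_TYPE = uuid.UUID("21686148-6449-6E6F-744E-656564454649")


def entry_array_crc(entries: bytes) -> int:
    """CRC32 over the whole partition entry array, as stored in the header."""
    return zlib.crc32(entries) & 0xFFFFFFFF


def make_entry_array(type_guid: uuid.UUID, count: int = 128) -> bytes:
    """Hypothetical helper: one used entry followed by empty ones."""
    first = type_guid.bytes_le.ljust(ENTRY_SIZE, b"\0")
    return first + bytes(ENTRY_SIZE * (count - 1))


def array_matches_header(entries: bytes, stored_crc: int) -> bool:
    """True if the entry array still matches the CRC recorded for it."""
    return entry_array_crc(entries) == stored_crc


good = make_entry_array(BIOS_BOOT_TYPE)
stored_crc = entry_array_crc(good)   # what the header would have recorded

damaged = bytearray(good)
damaged[0] ^= 0x01                   # flip one bit in the type GUID
```

Here array_matches_header(good, stored_crc) holds, while the damaged copy
fails the check, which is the point at which a consumer could switch to the
backup entry array instead of giving up.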
I do wonder how Windows handles booting with a corrupt primary GPT.
Would you happen to know? (A quick Google search didn't find an answer
to the question, unfortunately.)
> The UEFI spec says "Software should ask a user for confirmation before
> restoring the primary GPT" and yet it also requires the unspecified software
> fix the primary GPT if corrupt. The spec actually uses the word "must". So
> per usual, the spec has rather lofty demands.
So it must fix it after asking the user for confirmation?
--
Len Sorensen