[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_
From: |
Michael S. Tsirkin |
Subject: |
Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data |
Date: |
Tue, 31 Jan 2023 16:27:29 -0500 |
On Tue, Jan 31, 2023 at 08:39:08PM +0100, Jason A. Donenfeld wrote:
> On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> > From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> >
> > The setup_data links are appended to the compressed kernel image. Since
> > the kernel image is typically loaded at 0x100000, setup_data lives at
> > `0x100000 + compressed_size`, which does not get relocated during the
> > kernel's boot process.
> >
> > The kernel typically decompresses the image starting at address
> > 0x1000000 (note: there's one more zero there than the compressed image
> > above). This usually is fine for most kernels.
> >
> > However, if the compressed image is actually quite large, then
> > setup_data will live at a `0x100000 + compressed_size` that extends into
> > the decompressed zone at 0x1000000. In other words, if compressed_size
> > is larger than `0x1000000 - 0x100000`, then the decompression step will
> > clobber setup_data, resulting in crashes.
> >
> > Visually, what happens now is that QEMU appends setup_data to the kernel
> > image:
> >
> > kernel image setup_data
> > |--------------------------||----------------|
> > 0x100000 0x100000+l1 0x100000+l1+l2
> >
> > The problem is that this decompresses to 0x1000000 (one more zero). So
> > if l1 is > (0x1000000-0x100000), then this winds up looking like:
> >
> > kernel image setup_data
> > |--------------------------||----------------|
> > 0x100000 0x100000+l1 0x100000+l1+l2
> >
> > d e c o m p r e s s e d k e r n e l
> >
> > |-------------------------------------------------------------|
> > 0x1000000
> > 0x1000000+l3
> >
> > The decompressed kernel seemingly overwriting the compressed kernel
> > image isn't a problem, because that gets relocated to a higher address
> > early on in the boot process, at the end of startup_64. setup_data,
> > however, stays in the same place, since those links are self referential
> > and nothing fixes them up. So the decompressed kernel clobbers it.
> >
> > Fix this by appending setup_data to the cmdline blob rather than the
> > kernel image blob, which remains at a lower address that won't get
> > clobbered.
> >
> > This could have been done by overwriting the initrd blob instead, but
> > that poses big difficulties, such as no longer being able to use memory
> > mapped files for initrd, hurting performance, and, more importantly, the
> > initrd address calculation is hard coded in qboot, and it always grows
> > down rather than up, which means lots of brittle semantics would have to
> > be changed around, incurring more complexity. In contrast, using cmdline
> > is simple and doesn't interfere with anything.
> >
> > The microvm machine has a gross hack where it fiddles with fw_cfg data
> > after the fact. So this hack is updated to account for this appending,
> > by reserving some bytes.
> >
> > Fixup-by: Michael S. Tsirkin <mst@redhat.com>
> > Cc: x86@kernel.org
> > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Eric Biggers <ebiggers@kernel.org>
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
> > Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Tested-by: Eric Biggers <ebiggers@google.com>
> > Tested-by: Mathias Krause <minipli@grsecurity.net>
>
> This one should wind up in the stable point release too. Dunno what the
> procedure for that is.
>
> Jason
If you want that you need to include
Cc: qemu-stable@nongnu.org
Fixes: <hash> ("subject")
you can still reply to the original mail with this.
--
MST
- [PULL 02/56] hw/i386/acpi-build: Remove unused attributes, (continued)
- [PULL 02/56] hw/i386/acpi-build: Remove unused attributes, Michael S. Tsirkin, 2023/01/30
- [PULL 03/56] hw/isa/isa-bus: Turn isa_build_aml() into qbus_build_aml(), Michael S. Tsirkin, 2023/01/30
- [PULL 05/56] hw/acpi/acpi_dev_interface: Remove unused parameter from AcpiDeviceIfClass::madt_cpu, Michael S. Tsirkin, 2023/01/30
- [PULL 04/56] hw/acpi/piix4: No need to #include "hw/southbridge/piix.h", Michael S. Tsirkin, 2023/01/30
- [PULL 06/56] vhost-user: Correct a reference of TARGET_AARCH64, Michael S. Tsirkin, 2023/01/30
- [PULL 07/56] hw/pci-host: Use register definitions from PCI standard, Michael S. Tsirkin, 2023/01/30
- [PULL 08/56] virtio-rng-pci: fix migration compat for vectors, Michael S. Tsirkin, 2023/01/30
- [PULL 09/56] intel-iommu: Document iova_tree, Michael S. Tsirkin, 2023/01/30
- [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data, Michael S. Tsirkin, 2023/01/30
- [PULL 13/56] tests: acpi: whitelist DSDT blobs for tests that use pci-bridges, Michael S. Tsirkin, 2023/01/30
- [PULL 12/56] tests: acpi: cleanup arguments to make them more readable, Michael S. Tsirkin, 2023/01/30
- [PULL 14/56] tests: acpi: extend pcihp with nested bridges, Michael S. Tsirkin, 2023/01/30
- [PULL 15/56] tests: acpi: update expected blobs, Michael S. Tsirkin, 2023/01/30
- [PULL 17/56] pci_bridge: remove whitespace, Michael S. Tsirkin, 2023/01/30
- [PULL 16/56] tests: acpi: cleanup use_uefi argument usage, Michael S. Tsirkin, 2023/01/30
- [PULL 20/56] pcihp: piix4: do not call acpi_pcihp_reset() when ACPI PCI hotplug is disabled, Michael S. Tsirkin, 2023/01/30