[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM vi
From: |
Shameerali Kolothum Thodi |
Subject: |
RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support) |
Date: |
Tue, 24 Sep 2019 16:38:36 +0000 |
> -----Original Message-----
> From: Laszlo Ersek [mailto:address@hidden]
> Sent: 24 September 2019 16:53
> To: Shameerali Kolothum Thodi <address@hidden>;
> Igor Mammedov <address@hidden>
> Cc: Auger Eric <address@hidden>; address@hidden;
> address@hidden; address@hidden; address@hidden;
> xuwei (O) <address@hidden>; Linuxarm <address@hidden>; Ard
> Biesheuvel <address@hidden>; Leif Lindholm (Linaro address)
> <address@hidden>
> Subject: Re: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4]
> ARM virt: ACPI memory hotplug support)
>
> On 09/20/19 19:04, Shameerali Kolothum Thodi wrote:
> > Hi Laszlo/Igor,
> >
> > I spend some time to debug this further as I was rebasing the nvdimm
> > hot-add support patches on top of the ongoing pc-dimm hot add ones.
> >
> > Just to refresh the memory:
> >
> > https://patchwork.kernel.org/cover/10783589/
> >
> > " It is observed that hot adding nvdimm will results in guest reboot
> > failure. EDK2 fails to build the ACPI tables on reboot. Please find
> > below EDK2 log on Guest reboot after nvdimm hot-add,
> >
> > ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> > OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > "
> >
> > Please find below,
> >
> >> -----Original Message-----
> >> From: Laszlo Ersek [mailto:address@hidden]
> >> Sent: 05 March 2019 12:15
> >> To: Igor Mammedov <address@hidden>
> >> Cc: Shameerali Kolothum Thodi <address@hidden>;
> >> Auger Eric <address@hidden>; address@hidden;
> >> address@hidden; address@hidden;
> address@hidden;
> >> xuwei (O) <address@hidden>; Linuxarm <address@hidden>; Ard
> >> Biesheuvel <address@hidden>; Leif Lindholm (Linaro
> >> address) <address@hidden>
> >> Subject: Re: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
> >>
> >> On 03/01/19 18:39, Igor Mammedov wrote:
> >>> On Fri, 1 Mar 2019 14:49:45 +0100
> >>> Laszlo Ersek <address@hidden> wrote:
> >>>
> >>>> On 02/28/19 15:02, Shameerali Kolothum Thodi wrote:
> >>>>
> >>>>> Ah..I missed the fact that, firmware indeed sees an update in the
> >>>>> blob len here (rounded or not) after reboot. So don't think x86
> >>>>> has the same issue and padding is not the right solution as Igor
> >>>>> explained in his reply.
> >>>>>
> >>>>> I will try to debug this further. Any pointers welcome.
> >>>>
> >>>> How about this.
> >>>>
> >>>> (1) The firmware looks up the fw_cfg file called "etc/table-loader"
> >>>> in the fw_cfg file directory (identified by constant selector key
> >>>> 0x0019, FW_CFG_FILE_DIR).
> >>>>
> >>>> (2) The directory entry, once found, tells the firmware two things
> >>>> simultaneously. The selector key, and the size of the blob.
> >>>>
> >>>> (3) The firmware selects the selector key from step (2).
> >>>>
> >>>> (4) QEMU regenerates the ACPI payload (as a select callback).
> >>>>
> >>>> (5) The firmware reads the number of bytes from the fw_cfg blob
> >>>> that it learned in step (2).
> >>>>
> >>>> Here's the problem. As long as QEMU used to perform step (4) only
> >>>> for the purpose of refreshing PCI resources in the ACPI payload,
> >>>> step (4) wouldn't *resize* the blob.
> >>>>
> >>>> However, if step (4) enlarges the blob, then the byte count that
> >>>> step (5) uses -- from step (2) -- for reading, is obsolete.
> >>
> >>> I've thought that was a problem with IO based fw_cfg, as reading
> >>> size/content were separates steps and that it was solved by DMA
> >>> based fw_cfg file read.
> >>
> >> The DMA backend is not relevant for this question, for two reasons:
> >>
> >> (a) The question whether the fw_cfg transfer takes places with port
> >> IO vs. DMA is hidden from the fw_cfg client code; that code goes
> >> through an abstract library API.
> >>
> >> (b) While the DMA method indeed lets the firmware specify the details
> >> of the transfer with one action, the issue is with the number of
> >> bytes that the firmware requests (that is, not with *how* the
> >> firmware requests the transfer). The firmware has to know the size of
> >> the transfer before it can initiate the transfer (regardless of port
> >> IO vs. DMA).
> >>
> >>
> >> My question is: assume the firmware item in question is selected, and
> >> the QEMU-side select callback runs (regenerating the ACPI payload).
> >> Does this action update the blob size in the fw_cfg file directory as
> >> well?
> >
> > I think it doesn't update the blob size on select callback which is
> > the root cause of this issue. And the reason looks like,
> > qemu_ram_resize() function returns without invoking the callback to
> > update the blob size.
> >
> > On boot up, Qemu builds the table and exposes it to guest,
> > virt_acpi_setup()
> > acpi_add_rom_blob()
> > rom_add_blob()
> > rom_set_mr() --> mr is allocated here and ram_block
> used_length = HOST_PAGE_ALIGN(blob size);
> > fw_cfg_add_file_callback()
> > fw_cfg_add_bytes_callback() --> This uses the blob size
> passed into it.
> >
> > On select callback path,
> >
> > virt_acpi_build_update()
> > acpi_ram_update()
> > memory_region_ram_resize()
> > qemu_ram_resize() -->. Here the newsize gets aligned to HOST_PAGE
> and callback is only called used_length != newsize.
> >
> > https://github.com/qemu/qemu/blob/master/exec.c#L2180
> >
> > Debug logs:
> > Initial boot:
> > ##QEMU_DEBUG## rom_add_blob: file etc/acpi/tables size 0x64f7
> > ##QEMU_DEBUG## fw_cfg_add_bytes_callback: key 0x21 len 0x64f7
> > ........
> > ........
> > ###UEFI#### InstallQemuFwCfgTables: "etc/table-loader" has FwCfgItem
> 0x27 size 0xD00
> > ##QEMU_DEBUG## virt_acpi_build_update:
> > ##QEMU_DEBUG## acpi_ram_update: size 0x64f7
> > ##QEMU_DEBUG## qemu_ram_resize: idstr /rom@etc/acpi/tables
> used_length 0x7000 newsize 0x7000 --> No callback.
> > .....
> > ######UEFI###### ProcessCmdAllocate:
> QemuFwCfgFindFile("etc/acpi/tables"): size 0x64F7 --> UEFI get the actual
> size,
> which is fine for now.
> >
> > Hot-add nvdimms and reboot.
> >
> > root@ubuntu:/# reboot
> > .......
> > ........
> > ###UEFI#### InstallQemuFwCfgTables: "etc/table-loader" has FwCfgItem
> 0x27 size 0xD00
> > ##QEMU_DEBUG## virt_acpi_build_update:
> > ##QEMU_DEBUG## acpi_ram_update: size 0x6667 --> Size changed.
> > ##QEMU_DEBUG## qemu_ram_resize: idstr /rom@etc/acpi/tables
> used_length 0x7000 newsize 0x7000 --> newsize is still aligned to 0x7000 and
> no callback to update.
> > ......
> > ######UEFI###### ProcessCmdAllocate:
> QemuFwCfgFindFile("etc/acpi/tables"): size 0x64F7 -->UEFI still sees the old
> value and fails.
> >
> > This can be fixed by,
> >
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index
> f3bd45675b..79da3fd35d 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -854,6 +854,9 @@ void virt_acpi_build(VirtMachineState *vms,
> AcpiBuildTables *tables)
> > build_rsdp(tables->rsdp, tables->linker, &rsdp_data);
> > }
> >
> > + g_array_set_size(tables_blob,
> > +
> TARGET_PAGE_ALIGN(acpi_data_len(tables_blob)));
> > +
> > /* Cleanup memory that's no longer used. */
> > g_array_free(table_offsets, true);
> > }
> >
> > But I am not sure this is the best to way fix this issue (Or I am
> > missing something here).
> >
> > Please let me know.
>
> The above QEMU patch, for virt_acpi_build(), may be necessary, but I
> don't think it is sufficient.
>
> For the firmware to see the updated (enlarged) blob, two things are
> required:
>
> (a) QEMU to update the blob size in the fw_cfg directory entry.
>
> Note: to the firmware, it is totally irrelevant if QEMU updates some
> *other* value or field that reflects the fresh blob size. The only thing
> the firmware can see is the entry in the FW_CFG_FILE_DIR blob.
>
> To illustrate the field I'm referring to, see:
>
> s->files->f[index].size = cpu_to_be32(len);
>
> in fw_cfg_add_file_callback().
>
> See also:
>
> s->files->f[i].size = cpu_to_be32(len);
>
> in fw_cfg_modify_file().
>
> That "size" field is what the firmware can see.
Agree. And what I am trying to illustrate above is why this update is not
happening.
The Qemu path on select key from firmware is(I think),
fw_cfg_select()
e->select_cb() -->Callback registered in virt_acpi_setup() -->
acpi_add_rom_blob() ..... -> fw_cfg_add_file_callback()
virt_acpi_build_update()
acpi_ram_update()
memory_region_ram_resize()
qemu_ram_resize()
block->resized() --> Callback registered in virt_acpi_setup() ->
acpi_add_rom_blob()-> rom_add_blob() -> rom_set_mr()
fw_cfg_resized()
fw_cfg_modify_file()
s->files->f[i].size = cpu_to_be32(len); Updates the size for firmware
But as I was trying to explain above, the callback function fw_cfg_resized() is
not
invoked in qemu_ram_resize()because while adding the mr region(rom_set_mr()),
it uses the aligned size. And as a result if the updated/modified blob size is
not greater
than the aligned size the qemu_ram_resize() returns immediately.
int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
{
assert(block);
newsize = HOST_PAGE_ALIGN(newsize);
if (block->used_length == newsize) {
return 0;
}
...
if (block->resized) {
block->resized(block->idstr, newsize, block->host); ---> registered
callback, fw_cfg_resized()
}
return 0;
}
> Note: *all* relevant fw_cfg files must have their "size" fields updated
> in the directory blob (FW_CFG_FILE_DIR). I.e. the requirement applies to
> both the linker-loader script, and to all blobs that are referenced (by
> name) by the commands in the linker-loader script.
>
>
> (b) The firmware to re-read the size from the directory, after selecting
> the key for the sake of ACPI regeneration.
Hmm...I am not sure this is required. In my testing with the above fix I
mentioned,
I can see that firmware is reading the correct(modified) size on select.
I will double check this though.
Thanks,
Shameer
> I wrote:
>
> >> If it does, then I can work around the problem in the firmware. I can
> >> add a re-lookup to the code after the item selection, in order to get
> >> the fresh blob size from the fw_cfg file directory. Then we can use
> >> that size for the actual transfer.
> >>
> >> This won't help old firmware on new QEMU, but at least new firmware
> >> on old QEMU will not be hurt (the re-fetching of the fw_cfg file
> >> directory will come with a small performance penalty, but
> >> functionally it will be a no-op).
>
> Thus, the firmware patch in question would be:
>
> | diff --git a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> | index bc1a891dbaf1..07f70ffe158a 100644
> | --- a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> | +++ b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
> | @@ -975,6 +975,24 @@ InstallQemuFwCfgTables (
> | ORDERED_COLLECTION *SeenPointers;
> | ORDERED_COLLECTION_ENTRY *SeenPointerEntry, *SeenPointerEntry2;
> |
> | + Status = QemuFwCfgFindFile ("etc/table-loader", &FwCfgItem,
> &FwCfgSize);
> | + if (EFI_ERROR (Status)) {
> | + return Status;
> | + }
> | +
> | + //
> | + // By selecting "FwCfgItem", ask QEMU to regenerate the ACPI payload,
> with
> | + // all PCI devices decoding their resources. Note: further selections
> | + // of the same key will not repeat the patching.
> | + //
> | + EnablePciDecoding (&OriginalPciAttributes, &OriginalPciAttributesCount);
> | + QemuFwCfgSelectItem (FwCfgItem);
> | + RestorePciDecoding (OriginalPciAttributes, OriginalPciAttributesCount);
> | +
> | + //
> | + // The size of the script may have changed, possibly due to platform
> devices
> | + // having been hot-plugged before platform reset. Re-read the size.
> | + //
> | Status = QemuFwCfgFindFile ("etc/table-loader", &FwCfgItem,
> &FwCfgSize);
> | if (EFI_ERROR (Status)) {
> | return Status;
> | @@ -989,10 +1007,8 @@ InstallQemuFwCfgTables (
> | if (LoaderStart == NULL) {
> | return EFI_OUT_OF_RESOURCES;
> | }
> | - EnablePciDecoding (&OriginalPciAttributes, &OriginalPciAttributesCount);
> | QemuFwCfgSelectItem (FwCfgItem);
> | QemuFwCfgReadBytes (FwCfgSize, LoaderStart);
> | - RestorePciDecoding (OriginalPciAttributes, OriginalPciAttributesCount);
> | LoaderEnd = LoaderStart + FwCfgSize / sizeof *LoaderEntry;
> |
> | AllocationsRestrictedTo32Bit = NULL;
>
> But, again, this only makes sense if QEMU implements (a).
>
> Thanks
> Laszlo