Re: [PULL 61/73] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
From: Jonathan Cameron
Subject: Re: [PULL 61/73] hw/pci/aer: Implement PCI_ERR_UNCOR_MASK register
Date: Wed, 3 May 2023 10:31:59 +0100

On Wed, 3 May 2023 00:08:55 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, May 02, 2023 at 09:32:34PM -0300, Leonardo Brás wrote:
> > Hello Michael, Juan, Peter,
> >
> > On Wed, 2023-04-26 at 09:19 +0200, Juan Quintela wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Tue, Apr 25, 2023 at 08:42:17PM -0400, Peter Xu wrote:
> > > > > Hi, Michael, Jonathan,
> > > > >
> > > > > On Tue, Mar 07, 2023 at 08:13:53PM -0500, Michael S. Tsirkin wrote:
> > > > > This breaks the simplest migration from QEMU 8.0->7.2 binaries on all
> > > > > machine types I think as long as the cap is present, e.g. the default
> > > > > e1000e provided by the default q35 machine can already hit it with all
> > > > > default cmdline:
> > > > >
> > > > > ./qemu-system-x86_64 -M pc-q35-7.2 [-incoming XXX]
> > > > >
> > > > > 7.2 binary will have empty wmask for PCI_ERR_UNCOR_MASK, meanwhile I think
> > > > > it can also see a non-zero value, then the migration will fail at:
> > > > >
> > > > > vmstate_load 0000:00:02.0/e1000e, e1000e
> > > > >
> > > > > qemu-7.2: get_pci_config_device: Bad config data: i=0x10a read: 40
> > > > > device: 0 cmask: ff wmask: 0 w1cmask:0
> > > > > qemu-7.2: Failed to load PCIDevice:config
> > > > > qemu-7.2: Failed to load e1000e:parent_obj
> > > > >
> > > > > qemu-7.2: error while loading state for instance 0x0 of device
> > > > > '0000:00:02.0/e1000e'
> > > > > qemu-7.2: load of migration failed: Invalid argument
> > > > >
> > > > > We probably at least want to have the default value to be still zero, and
> > > > > we'd need to make sure it'll not be modified by the guest, iiuc.
> > > > >
> > > > > Below oneliner works for me and makes the migration work again:
> > > > >
> > > > > ===8<===
> > > > > diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
> > > > > index 103667c368..563a37b79c 100644
> > > > > --- a/hw/pci/pcie_aer.c
> > > > > +++ b/hw/pci/pcie_aer.c
> > > > > @@ -113,7 +113,7 @@ int pcie_aer_init(PCIDevice *dev, uint8_t cap_ver, uint16_t offset,
> > > > > pci_set_long(dev->w1cmask + offset + PCI_ERR_UNCOR_STATUS,
> > > > > PCI_ERR_UNC_SUPPORTED);
> > > > > pci_set_long(dev->config + offset + PCI_ERR_UNCOR_MASK,
> > > > > - PCI_ERR_UNC_MASK_DEFAULT);
> > > > > + 0/*PCI_ERR_UNC_MASK_DEFAULT*/);
> > > > > pci_set_long(dev->wmask + offset + PCI_ERR_UNCOR_MASK,
> > > > > PCI_ERR_UNC_SUPPORTED);
> > > > > ===8<===
> > > > >
> > > > > Anyone could have a look on a solid solution from PCI side?
> > > > >
> > > > > Copy Juan and Leonardo.
> > > > >
> > > > > Thanks,
> > > >
> > > > My bad, I forgot about this 🤦.
> > > > So we need a property and tweak it with compat machinery depending on
> > > > machine type. Jonathan, can you work on this pls?
> > > > Or I can revert for now to relieve the time pressure,
> > > > redo the patch at your leisure.
> > >
> > > I agree with Michael here, the best option is adding a new property.
> > >
> > > Later, Juan.
> > >
> >
> > I sent a patch implementing the suggested fix:
> > https://lore.kernel.org/qemu-devel/20230503002701.854329-1-leobras@redhat.com/T/#u
> >
> > Please let me know of anything to improve.
> >
> > Best regards,
> > Leo
>
> Weird, didn't get it for some reason. Pulled it from lore now, thanks!
>
Thanks all. Sorry for the lack of reply: a crazy week at a conference meant I
wasn't keeping up with email, and I'm still working through the backlog.
Obviously I forgot about migration across versions. Sorry!
The fix Leo posted looks good to me. Given that, in theory, a previously loaded
driver or firmware could have set these mask bits all to 0, driver code should
not assume they hold the defaults from the PCI spec. It might make a difference
to testing that injects errors very early (e.g. before drivers load), but meh,
that's a corner case no one has hit previously, so I doubt anyone ever will.
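
For anyone reading this in the archive later: as far as I understand it, the
"Bad config data" failure Peter quoted comes down to a per-byte comparison on
the destination. A rough standalone sketch of that rule is below. It is modelled
on my reading of get_pci_config_device() in hw/pci/pci.c, not copied from it,
and the helper name config_byte_compatible() is made up for illustration. The
byte values used afterwards are the ones from the log above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model only (hypothetical helper, not a QEMU function).
 *
 * An incoming config byte is accepted when every bit that differs from
 * the destination's own value is either not checked at all (cmask bit
 * clear) or is something the guest could have changed itself (wmask or
 * w1cmask bit set).
 */
static bool config_byte_compatible(uint8_t incoming, uint8_t local,
                                   uint8_t cmask, uint8_t wmask,
                                   uint8_t w1cmask)
{
    /* Bits that differ between the source and destination values. */
    uint8_t diff = incoming ^ local;

    /* Differing bits that are checked but not guest-writable. */
    uint8_t bad = diff & cmask & (uint8_t)~wmask & (uint8_t)~w1cmask;

    return bad == 0;
}
```

With the values from the log (read 0x40, device 0, cmask 0xff, wmask 0,
w1cmask 0) this returns false, which matches the 7.2 destination rejecting the
stream. If both sides start from 0, or if the destination's wmask covered bit
0x40, it returns true. That is why keeping the old default and an empty wmask
for the older machine types is enough to make the migration work again.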
Thanks to all involved.
Jonathan