Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table


From: Gavin Shan
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
Date: Thu, 26 Mar 2015 12:30:43 +1100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Mar 26, 2015 at 12:10:52PM +1100, David Gibson wrote:
>On Thu, Mar 26, 2015 at 11:53:48AM +1100, Gavin Shan wrote:
>> On Tue, Mar 24, 2015 at 06:53:29AM -0600, Alex Williamson wrote:
>> >On Tue, 2015-03-24 at 17:54 +1100, David Gibson wrote:
>> >> On Tue, Mar 24, 2015 at 05:24:55PM +1100, Gavin Shan wrote:
>> >> > On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
>> >> > >On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
>> >> > >> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
>> >> > >> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
>> >> > >> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
>> >> > >> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
>> >> > >> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> >> > >> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> >> > >> >> >> should be cleared accordingly. Otherwise, we will run into another
>> >> > >> >> >> (recursive) EEH error and the PCI devices contained in the PE have
>> >> > >> >> >> to be offlined exceptionally.
>> >> > >> >> >> 
>> >> > >> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> >> > >> >> >> table could be restored properly after EEH PE reset.
>> >> > >> >> >> 
>> >> > >> >> >> Signed-off-by: Gavin Shan <address@hidden>
>> >> > >> >> >> ---
>> >> > >> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>> >> > >> >> >>     error message for this function. Dropped vfio_put_group()
>> >> > >> >> >>     on NULL group
>> >> > >> >> >> ---
>> >> > >> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
>> >> > >> >> >>  hw/vfio/common.c       |  7 +++++++
>> >> > >> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>> >> > >> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>> >> > >> >> >>  include/hw/vfio/vfio.h |  2 ++
>> >> > >> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
>> >> > >> >> >>  create mode 100644 hw/vfio/pci-stub.c
>> >> > >> >> >> 
>> >> > >> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> >> > >> >> >> index e31f30e..1b8a065 100644
>> >> > >> >> >> --- a/hw/vfio/Makefile.objs
>> >> > >> >> >> +++ b/hw/vfio/Makefile.objs
>> >> > >> >> >> @@ -1,4 +1,8 @@
>> >> > >> >> >>  ifeq ($(CONFIG_LINUX), y)
>> >> > >> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> >> > >> >> >> -obj-$(CONFIG_PCI) += pci.o
>> >> > >> >> >> +ifeq ($(CONFIG_PCI), y)
>> >> > >> >> >> +obj-y += pci.o
>> >> > >> >> >> +else
>> >> > >> >> >> +obj-y += pci-stub.o
>> >> > >> >> >> +endif
>> >> > >> >> >>  endif
>> >> > >> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> >> > >> >> >> index 148eb53..ed07814 100644
>> >> > >> >> >> --- a/hw/vfio/common.c
>> >> > >> >> >> +++ b/hw/vfio/common.c
>> >> > >> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>> >> > >> >> >>      switch (req) {
>> >> > >> >> >>      case VFIO_CHECK_EXTENSION:
>> >> > >> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> >> > >> >> >> +        break;
>> >> > >> >> >>      case VFIO_EEH_PE_OP:
>> >> > >> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>> >> > >> >> >
>> >> > >> >> >I really dislike the idea of having an arbitrarily complex side effect
>> >> > >> >> >from a function whose name suggests it's just a trivial wrapper
>> >> > >> >> >around the ioctl().
>> >> > >> >> >
>> >> > >> >> 
>> >> > >> >> Ok. I guess you would like to put the complexity in the callers of
>> >> > >> >> vfio_container_ioctl().
>> >> > >> >
>> >> > >> >Well.. maybe.  I'd also be happy if helper functions were implemented
>> >> > >> >which both called the ioctl() and did the other necessary pieces.
>> >> > >> >They should just be called something that indicates their full
>> >> > >> >function, not a name which suggests they're just an ioctl wrapper.
>> >> > >> >
>> >> > >> 
>> >> > >> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
>> >> > >> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
>> >> > >> at giving a good function name :)
>> >> > >
>> >> > >Well, I don't think your wrapper should be multiplexed.  The multiplex
>> >> > >works for the simple ioctl() wrapper, because there really is nothing
>> >> > >that varies apart from the exact ioctl number called.
>> >> > >
>> >> > >But now that you have different operations here, I think you want
>> >> > >wrappers for each one - each one will call the ioctl(), then do the
>> >> > >specific extra steps necessary for that operation.  So
>> >> > >vfio_container_event() will go away as well, split into various other
>> >> > >functions.
>> >> > >
>> >> > 
>> >> > It wouldn't be a good idea, if I understand your proposal correctly. Currently,
>> >> > the global function vfio_container_ioctl() can be called from the sPAPR platform
>> >> > for any ioctl command handled in the kernel source file vfio_iommu_spapr_tce.c,
>> >> > which means the function isn't called for EEH only. Other sPAPR TCE container
>> >> > ioctl commands are also routed through this function. There would be a lot of
>> >> > functions if we had one global function per ioctl command, which would just
>> >> > increase the cost of maintaining the code.
>> >> 
>> >> I don't really follow your objection.  I'm only suggesting separate
>> >> wrappers for things which require extra actions currently implemented
>> >> in vfio_container_event().  Things which only need the plain ioctl()
>> >> can still use the simple vfio_container_ioctl() wrapper.
>> >
>> >vfio_container_ioctl() also filters to a limited set of ioctls, it
>> >clearly does not allow any ioctl.
>> >
>> 
>> Ok. I think you guys expect something like the following. Note that the following
>> vfio_container_eeh_ioctl() will accept a limited set of EEH operations, similar
>> to what vfio_container_ioctl() does for the ioctl commands:
>> 
>> If you agree to the changes, I'll put another patch on top of this one
>> to replace vfio_container_ioctl() in spapr_pci_vfio.c with vfio_container_eeh_ioctl()
>> for EEH cases.
>> 
[snip ...]
>
>No, extra operation specific logic inside the ioctl wrapper is exactly
>what I want to avoid.  Instead I want to see
>vfio_container_eeh_ioctl() remain as it is now - doing nothing but
>verifying the ioctl() number, then passing the arguments on to
>ioctl().
>

I think you were talking about vfio_container_ioctl() :)
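
For reference, a minimal sketch of what "doing nothing but verifying
the ioctl() number, then passing the arguments on" could look like.
This is not the QEMU code: every demo_* name and request value below
is hypothetical, demo_backend() stands in for the real ioctl() on the
container fd so the sketch builds anywhere, and returning -EINVAL for
a rejected request is my own assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request numbers; the real ones come from <linux/vfio.h>. */
enum demo_req {
    DEMO_CHECK_EXTENSION    = 1,
    DEMO_SPAPR_TCE_GET_INFO = 2,
    DEMO_EEH_PE_OP          = 3,
    DEMO_UNRELATED          = 99,
};

/* Counts how often a request actually reached the backend. */
static int demo_backend_calls;

/* Stand-in for ioctl(container_fd, req, param). */
static int demo_backend(int req, void *param)
{
    (void)req;
    (void)param;
    demo_backend_calls++;
    return 0;
}

/*
 * Thin wrapper: check the request against a fixed allow-list and
 * forward it unchanged. No operation-specific side effects live here.
 */
static int demo_container_ioctl(int req, void *param)
{
    switch (req) {
    case DEMO_CHECK_EXTENSION:
    case DEMO_SPAPR_TCE_GET_INFO:
    case DEMO_EEH_PE_OP:
        break;
    default:
        /* Anything outside the allow-list never reaches the backend. */
        return -EINVAL;
    }

    return demo_backend(req, param);
}
```

Calling demo_container_ioctl(DEMO_EEH_PE_OP, NULL) forwards the
request, while demo_container_ioctl(DEMO_UNRELATED, NULL) is rejected
before the backend is touched.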

>What I'm expecting is then to add new functions, along the lines of:
>
>int vfio_eeh_pe_reset(...)
>{
>    VFIOGroup *group;
>    VFIODevice *vbasedev;
>    VFIOPCIDevice *vdev;
>
>    /*
>     * The MSIx table will be cleaned out by reset. We need to
>     * disable it so that it can be reenabled properly. Also,
>     * the cached MSIx table should be cleared as it no longer
>     * reflects the contents in hardware.
>     */
>    group = vfio_get_group(groupid, as);
>    if (!group) {
>        error_report("vfio: group %d not found", groupid);
>        return -1;
>    }
>
>    QLIST_FOREACH(vbasedev, &group->device_list, next) {
>        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>        if (msix_enabled(&vdev->pdev)) {
>            vfio_disable_msix(vdev);
>        }
>
>        msix_reset(&vdev->pdev);
>    }
>
>    vfio_put_group(group);
>
>    return vfio_eeh_container_ioctl(as, groupid,
>                                    VFIO_EEH_PE_RESET_FUNDAMENTAL, op);
>}
>
>If this function can build the op structure itself from sensible
>arguments, then that's even better.
>

Thanks, David. I assume that Alex doesn't object to this, so I'll
change the code according to your suggestion. I'll send the next
revision soon.
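
To make sure I have the shape right, here is a rough, self-contained
model of the split: an operation-specific helper drops the stale
cached MSIx state for every device in the group and then goes through
the plain ioctl wrapper, choosing the op itself. All demo_* types,
names and values are hypothetical stand-ins (QEMU's real types are
VFIOGroup and VFIOPCIDevice), and the wrapper here is only a stub that
records the call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MSIX_ENTRIES             4
#define DEMO_EEH_PE_RESET_FUNDAMENTAL 7   /* hypothetical op value */

/* Hypothetical per-device state: an enable flag plus a cached table. */
struct demo_dev {
    bool msix_enabled;
    unsigned cached_msix[DEMO_MSIX_ENTRIES];
    struct demo_dev *next;
};

struct demo_group {
    struct demo_dev *devices;
};

static int demo_ioctl_calls;
static int demo_last_op;

/* Stub for the thin ioctl wrapper: it only records what it was asked. */
static int demo_container_ioctl(struct demo_group *group, int op)
{
    (void)group;
    demo_ioctl_calls++;
    demo_last_op = op;
    return 0;
}

/*
 * Operation-specific helper. The name says what it does: it resets
 * the PE, including the bookkeeping that the reset makes necessary.
 */
static int demo_eeh_pe_reset(struct demo_group *group)
{
    struct demo_dev *dev;

    if (!group) {
        return -1;
    }

    /* The hardware table is wiped by the reset, so drop our copy too. */
    for (dev = group->devices; dev; dev = dev->next) {
        dev->msix_enabled = false;
        memset(dev->cached_msix, 0, sizeof(dev->cached_msix));
    }

    /* The helper builds the op itself; callers pass no raw ioctl data. */
    return demo_container_ioctl(group, DEMO_EEH_PE_RESET_FUNDAMENTAL);
}
```

The caller would then only see demo_eeh_pe_reset(group), and the
cleanup stays out of the generic wrapper.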

Thanks,
Gavin

>-- 
>David Gibson                   | I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au | minimalist, thank you.  NOT _the_ _other_
>                               | _way_ _around_!
>http://www.ozlabs.org/~dgibson




