qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume notification


From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume notification handler for aer resume
Date: Tue, 12 Jul 2016 09:45:43 -0600

On Tue, 12 Jul 2016 09:42:21 +0800
Zhou Jie <address@hidden> wrote:

> Hi Alex,
> 
> >>> The variable clearly isn't visible to the user, so the user can know
> >>> whether the kernel supports this feature, but not whether the feature
> >>> is currently active.  Perhaps there's no way to avoid races completely,
> >>> but don't you expect that if we define that certain operations are
> >>> blocked after an error notification that a user may want some way to
> >>> poll for whether the block is active after a reset rather than simply
> >>> calling a blocked interface to probe for it?  
> >> Yes, I will use access blocked function, not the variable.  
> >
> > I don't understand what this means.  
> I will use workable state support flag
> to let user know whether the kenerl support block feature.
> And make configure space writing and ioctl function blocked.

And what of my suggestion that a user may desire to poll the state of
the device?
 
> > Is that acceptable?  This is part of the problem I have with silently
> > disabling interrupt delivery via the command register across reset.  It
> > seems more non-deterministic than properly disabling interrupts and
> > requiring the user to reinitialize them after error.  
> What is "properly disabling interrupts"?

We've discussed this, it's tearing down the eventfds and the irq
handlers using the same mechanism as used in vfio_pci_disable().

> User don't know what vfio driver has done.
> What's the difference of "disabling interrupt delivery via
> the command register" and "doing a teardown of interrupts"?
> The interrupts will still lost silently.

A user does know what the vfio driver has done if you define the
behavior that on an AER error reported event, as signaled to the user
via the error notification interrupt, vfio-pci will teardown device
interrupts to an uninitialized state.  The difference between the
command register approach you suggest and the teardown I suggest is
that the command register is simply masking interrupt deliver while the
teardown approach returns the device to an uninitialized interrupt
state.  Take a look at the device state when a bus reset occurs, what
state is saved and restored and what is left at a default PCI value.
The command register is saved and restored, so any manipulation we do
of it is racing the host kernel AER handling and bus reset.  What about
MSI and MSI-X?  Looks to me like those are left at the PCI default
initialization state, so now after an AER error we have irq handlers
and eventfds configured, while in fact the device has been
de-programmed.  To handle that we're expecting users to teardown the
interrupt state and re-establish it?  Again, why not just teardown the
interrupt state ourselves?  I dont' see the value in simply masking the
command register, especially when it doesn't handle the no-DisINTx case.

> >>> Or does it make more sense to
> >>> simply disable the interrupts as done in vfio_pci_disable() and define
> >>> that the user needs to re-establish interrupts before continuing after
> >>> an error event?  Thanks,  
> >> If user invoked the vfio_pci_disable by ioctl function.  
> >
> > I'm in no way suggesting that a user invoke vfio_pci_disable(), I'm just
> > trying to point out that vfio_pci_disable() already does a teardown of
> > interrupts, similar to what seems to be required here.
> >  
> >> Yes, user should re-establish interrupts before
> >> continuing after an error event.  
> >
> > So if we define that users should re-establish interrupts after an
> > error event, then what's the point of only doing command register
> > masking of the interrupts and requiring the user to both tear-down the
> > interrupts and re-establish them?  Thanks,  
> Take "Intel(R) 10GbE PCI Express Virtual Function" as an example.
> In drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> static const struct pci_error_handlers ixgbevf_err_handler = {
>          .error_detected = ixgbevf_io_error_detected,
>          .slot_reset = ixgbevf_io_slot_reset,
>          .resume = ixgbevf_io_resume,
> };
> User tear-down the interrupts in ixgbevf_io_error_detected function.
> And up the interrupts in ixgbevf_io_resume.
> 
> Guest OS driver will do both tear-down the interrupts
> and re-establish them.
> Because it don't know what host vfio driver has done.
> 
> I disable the interrupts to pretend them interfere with device reset.

We cannot depend on the behavior of any given driver and the fact that
the guest driver may teardown interrupts anyway is not a justification
that vfio shouldn't be doing this to make the device state presented to
the user consistent.  Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]