qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 15/19] vfio-user: handle device interrupts


From: Jag Raman
Subject: Re: [PATCH v6 15/19] vfio-user: handle device interrupts
Date: Tue, 29 Mar 2022 19:06:46 +0000


> On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote:
>> 
>> 
>>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> 
>>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
>>>> Forward remote device's interrupts to the guest
>>>> 
>>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>>> ---
>>>> include/hw/pci/pci.h              |   6 ++
>>>> include/hw/remote/vfio-user-obj.h |   6 ++
>>>> hw/pci/msi.c                      |  13 +++-
>>>> hw/pci/msix.c                     |  12 +++-
>>>> hw/remote/machine.c               |  11 +--
>>>> hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
>>>> stubs/vfio-user-obj.c             |   6 ++
>>>> MAINTAINERS                       |   1 +
>>>> hw/remote/trace-events            |   1 +
>>>> stubs/meson.build                 |   1 +
>>>> 10 files changed, 158 insertions(+), 6 deletions(-)
>>>> create mode 100644 include/hw/remote/vfio-user-obj.h
>>>> create mode 100644 stubs/vfio-user-obj.c
>>>> 
>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>> index c3f3c90473..d42d526a48 100644
>>>> --- a/include/hw/pci/pci.h
>>>> +++ b/include/hw/pci/pci.h
>>>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
>>>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
>>>>                                pcibus_t addr, pcibus_t size, int type);
>>>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
>>>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
>>>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
>>>> 
>>>> typedef struct PCIIORegion {
>>>>    pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
>>>> @@ -323,6 +325,10 @@ struct PCIDevice {
>>>>    /* Space to store MSIX table & pending bit array */
>>>>    uint8_t *msix_table;
>>>>    uint8_t *msix_pba;
>>>> +
>>>> +    PCIMSINotify *msi_notify;
>>>> +    PCIMSIxNotify *msix_notify;
>>>> +
>>>>    /* MemoryRegion container for msix exclusive BAR setup */
>>>>    MemoryRegion msix_exclusive_bar;
>>>>    /* Memory Regions for MSIX table and pending bit entries. */
>>>> diff --git a/include/hw/remote/vfio-user-obj.h 
>>>> b/include/hw/remote/vfio-user-obj.h
>>>> new file mode 100644
>>>> index 0000000000..87ab78b875
>>>> --- /dev/null
>>>> +++ b/include/hw/remote/vfio-user-obj.h
>>>> @@ -0,0 +1,6 @@
>>>> +#ifndef VFIO_USER_OBJ_H
>>>> +#define VFIO_USER_OBJ_H
>>>> +
>>>> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
>>>> +
>>>> +#endif
>>>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
>>>> index 47d2b0f33c..93f5e400cc 100644
>>>> --- a/hw/pci/msi.c
>>>> +++ b/hw/pci/msi.c
>>>> @@ -51,6 +51,8 @@
>>>> */
>>>> bool msi_nonbroken;
>>>> 
>>>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
>>>> +
>>>> /* If we get rid of cap allocator, we won't need this. */
>>>> static inline uint8_t msi_cap_sizeof(uint16_t flags)
>>>> {
>>>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>>>    dev->msi_cap = config_offset;
>>>>    dev->cap_present |= QEMU_PCI_CAP_MSI;
>>>> 
>>>> +    dev->msi_notify = pci_msi_notify;
>>> 
>>> Are you sure it's correct to skip the msi_is_masked() logic? I think the
>>> callback function should only override the behavior of
>>> msi_send_message(), not the entire msi_notify() function.
>>> 
>>> The same applies to MSI-X.
>> 
>> Hi Stefan,
>> 
>> We noticed that the client is handling the masking and unmasking of MSIx
>> interrupts.
>> 
>> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
>> vfio_msix_vector_release() handles masking operations. The server triggers
>> an MSIx interrupt by signaling the eventfd associated with the vector. If 
>> the vector
>> is unmasked, the interrupt bypasses the client/QEMU and takes this
>> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands 
>> on the
>> client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt() 
>> suppresses the
>> interrupt if the vector is masked. The use and release functions switch the 
>> server’s
>> eventfd between VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
>> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
>> VFIO_DEVICE_SET_IRQS message.
>> 
>> Concerning MSI, the server should check if the vector is unmasked before
>> triggering. The server is not doing it presently, will update it. For some 
>> reason,
>> I had assumed that MSI handling is similar to MSIx in terms of masking - 
>> sorry
>> about that. The masking and unmasking information for MSI is in the config 
>> space
>> registers, so the server should have this information.
>> 
>> You had previously suggested using callbacks for msi_get_message &
>> msi_send_message, considering the masking issue. Given MSIx masking
>> (including the MSIx table BAR) is handled at the client, the masking 
>> information
>> doesn’t reach the server - so msix_notify will never invoke the
>> msi_send_message callback - all the vectors remain masked at the server
>> end (msix_init() -> msix_mask_all()).
> 
> I was expecting vfio-user devices to be involved in MSI-X masking so
> they can implement the Pending bit semantics described in the spec:
> 
>  If a masked vector has its Pending bit Set, and the associated
>  underlying interrupt events are somehow satisfied (usually by software
>  though the exact manner is Function-specific), the Function must Clear
>  the Pending bit, to avoid sending a spurious interrupt message later
>  when software unmasks the vector.
> 
> Does QEMU VFIO support this?

QEMU VFIO doesn’t seem to support it - I couldn’t find a place where
an assigned/passthru PCI device clears the pending bits in QEMU.

> 
> What happens when a hw/net/e1000e_core.c vfio-user device uses
> msix_clr_pending() and related APIs?
> 
> Also, having the vfio-user daemon write to the eventfd while the vector
> is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X
> masking for poll mode operation and going via eventfd is suboptimal:
> 
>  Software is permitted to mask one or more vectors indefinitely, and
>  service their associated interrupt events strictly based on polling
>  their Pending bits. A Function must Set and Clear its Pending bits as
>  necessary to support this “pure polling” mode of operation.
> 
> Maybe the answer is these issues don't matter in practice because MSI-X
> masking is not used much?

From what I can tell, “pure polling” is used by ivshmem and virtio-pci devices 
in QEMU.

e1000e doesn’t use “pure polling”, but it does clear pending interrupts.

John Johnson, John Levon & Thanos,

    Any thoughts?

Thank you!
--
Jag

> 
> Stefan


reply via email to

[Prev in Thread] Current Thread [Next in Thread]