[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v6 15/19] vfio-user: handle device interrupts
From: |
Jag Raman |
Subject: |
Re: [PATCH v6 15/19] vfio-user: handle device interrupts |
Date: |
Tue, 29 Mar 2022 19:06:46 +0000 |
> On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote:
>>
>>
>>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>
>>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
>>>> Forward remote device's interrupts to the guest
>>>>
>>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>>> ---
>>>> include/hw/pci/pci.h | 6 ++
>>>> include/hw/remote/vfio-user-obj.h | 6 ++
>>>> hw/pci/msi.c | 13 +++-
>>>> hw/pci/msix.c | 12 +++-
>>>> hw/remote/machine.c | 11 +--
>>>> hw/remote/vfio-user-obj.c | 107 ++++++++++++++++++++++++++++++
>>>> stubs/vfio-user-obj.c | 6 ++
>>>> MAINTAINERS | 1 +
>>>> hw/remote/trace-events | 1 +
>>>> stubs/meson.build | 1 +
>>>> 10 files changed, 158 insertions(+), 6 deletions(-)
>>>> create mode 100644 include/hw/remote/vfio-user-obj.h
>>>> create mode 100644 stubs/vfio-user-obj.c
>>>>
>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>> index c3f3c90473..d42d526a48 100644
>>>> --- a/include/hw/pci/pci.h
>>>> +++ b/include/hw/pci/pci.h
>>>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
>>>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
>>>> pcibus_t addr, pcibus_t size, int type);
>>>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
>>>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
>>>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
>>>>
>>>> typedef struct PCIIORegion {
>>>> pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
>>>> @@ -323,6 +325,10 @@ struct PCIDevice {
>>>> /* Space to store MSIX table & pending bit array */
>>>> uint8_t *msix_table;
>>>> uint8_t *msix_pba;
>>>> +
>>>> + PCIMSINotify *msi_notify;
>>>> + PCIMSIxNotify *msix_notify;
>>>> +
>>>> /* MemoryRegion container for msix exclusive BAR setup */
>>>> MemoryRegion msix_exclusive_bar;
>>>> /* Memory Regions for MSIX table and pending bit entries. */
>>>> diff --git a/include/hw/remote/vfio-user-obj.h
>>>> b/include/hw/remote/vfio-user-obj.h
>>>> new file mode 100644
>>>> index 0000000000..87ab78b875
>>>> --- /dev/null
>>>> +++ b/include/hw/remote/vfio-user-obj.h
>>>> @@ -0,0 +1,6 @@
>>>> +#ifndef VFIO_USER_OBJ_H
>>>> +#define VFIO_USER_OBJ_H
>>>> +
>>>> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
>>>> +
>>>> +#endif
>>>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
>>>> index 47d2b0f33c..93f5e400cc 100644
>>>> --- a/hw/pci/msi.c
>>>> +++ b/hw/pci/msi.c
>>>> @@ -51,6 +51,8 @@
>>>> */
>>>> bool msi_nonbroken;
>>>>
>>>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
>>>> +
>>>> /* If we get rid of cap allocator, we won't need this. */
>>>> static inline uint8_t msi_cap_sizeof(uint16_t flags)
>>>> {
>>>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>>> dev->msi_cap = config_offset;
>>>> dev->cap_present |= QEMU_PCI_CAP_MSI;
>>>>
>>>> + dev->msi_notify = pci_msi_notify;
>>>
>>> Are you sure it's correct to skip the msi_is_masked() logic? I think the
>>> callback function should only override the behavior of
>>> msi_send_message(), not the entire msi_notify() function.
>>>
>>> The same applies to MSI-X.
>>
>> Hi Stefan,
>>
>> We noticed that the client is handling the masking and unmasking of MSIx
>> interrupts.
>>
>> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
>> vfio_msix_vector_release() handles masking operations. The server triggers
>> an MSIx interrupt by signaling the eventfd associated with the vector. If
>> the vector
>> is unmasked, the interrupt bypasses the client/QEMU and takes this
>> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands
>> on the
>> client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt()
>> suppresses the
>> interrupt if the vector is masked. The use and release functions switch the
>> server’s
>> eventfd between VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
>> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
>> VFIO_DEVICE_SET_IRQS message.
>>
>> Concerning MSI, the server should check if the vector is unmasked before
>> triggering. The server is not doing it presently, will update it. For some
>> reason,
>> I had assumed that MSI handling is similar to MSIx in terms of masking -
>> sorry
>> about that. The masking and unmasking information for MSI is in the config
>> space
>> registers, so the server should have this information.
>>
>> You had previously suggested using callbacks for msi_get_message &
>> msi_send_message, considering the masking issue. Given MSIx masking
>> (including the MSIx table BAR) is handled at the client, the masking
>> information
>> doesn’t reach the server - so msix_notify will never invoke the
>> msi_send_message callback - all the vectors remain masked at the server
>> end (msix_init() -> msix_mask_all()).
>
> I was expecting vfio-user devices to be involved in MSI-X masking so
> they can implement the Pending bit semantics described in the spec:
>
> If a masked vector has its Pending bit Set, and the associated
> underlying interrupt events are somehow satisfied (usually by software
> though the exact manner is Function-specific), the Function must Clear
> the Pending bit, to avoid sending a spurious interrupt message later
> when software unmasks the vector.
>
> Does QEMU VFIO support this?
QEMU VFIO doesn’t seem to support it - I couldn’t find a place where
an assigned/passthru PCI device clears the pending bits in QEMU.
>
> What happens when a hw/net/e1000e_core.c vfio-user device uses
> msix_clr_pending() and related APIs?
>
> Also, having the vfio-user daemon write to the eventfd while the vector
> is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X
> masking for poll mode operation and going via eventfd is suboptimal:
>
> Software is permitted to mask one or more vectors indefinitely, and
> service their associated interrupt events strictly based on polling
> their Pending bits. A Function must Set and Clear its Pending bits as
> necessary to support this “pure polling” mode of operation.
>
> Maybe the answer is these issues don't matter in practice because MSI-X
> masking is not used much?
From what I can tell, “pure polling” is used by ivshmem and virtio-pci devices
in QEMU.
e1000e doesn’t use “pure polling”, but it does clear pending interrupts.
John Johnson, John Levon & Thanos,
Any thoughts?
Thank you!
--
Jag
>
> Stefan