qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/3] VFIO: Disable INTx interrupt


From: Gavin Shan
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/3] VFIO: Disable INTx interrupt on EEH reset
Date: Fri, 20 Mar 2015 15:01:43 +1100
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Mar 18, 2015 at 03:54:09PM +1100, Gavin Shan wrote:
>On Tue, Mar 17, 2015 at 03:16:46PM -0600, Alex Williamson wrote:
>>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>>> When Linux guest recovers from EEH error on the following Emulex
>>> adapter, the MSIx interrupts are disabled and the INTx emulation
>>> is enabled. One INTx interrupt is injected to the guest by host
>>> because of detected pending INTx interrupts on the adapter. QEMU
>>> disables mmap'ed BAR regions and starts a timer to enable those
>>> regions at later point the INTx interrupt handler. Unfortunately,
>>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>>> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
>>> recovery failure at guest side because of hanged MMIO access.
>>> 
>>>  # lspci | grep Emulex
>>>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>>>               OneConnect 10Gb NIC (be3) (rev 02)
>>>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>>>               OneConnect 10Gb NIC (be3) (rev 02)
>>> 
>>> The patch disables INTx interrupt before doing EEH reset to avoid
>>> the issue.
>>> 
>>> Signed-off-by: Gavin Shan <address@hidden>
>>> ---
>>>  hw/vfio/pci.c | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>> 
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index fca1edc..bfa3d0c 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, 
>>> int32_t groupid,
>>>           * disable it so that it can be reenabled properly. Also,
>>>           * the cached MSIx table should be cleared as it's not
>>>           * reflecting the contents in hardware.
>>> +         *
>>> +         * We might have INTx interrupt whose handler disables the
>>> +         * memory mapped BARs. The INTx pending state can't be
>>> +         * cleared with memory BAR access in slow path. The timer
>>> +         * kicked by the INTx interrupt handler won't enable those
>>> +         * disabled memory mapped BARs, which leads to hanged MMIO
>>> +         * register access and EEH recovery failure. We simply disable
>>> +         * INTx if it has been enabled.
>>>           */
>>
>>This feels like a quick hack for a problem we don't really understand.
>>Why is INTx being fired through QEMU rather than KVM?  Why isn't the
>>INTx re-enabling happening since this is exactly the scenario where it's
>>supposed to work (ie. INTx occurs, BAR mmap disabled, guest accesses
>>BAR, mmap re-enabled, INTx unmasked)?
>>
>
>Indeed. It's a quick hack before finding the root cause about why slow
>path doesn't work when fast path is disabled. I'm still tracing it and
>hopefully I can find something soon. Note that: KVM IRQFD isn't enabled
>on the system I was doing experiments.
>

I'm not sure if there're further replies to this thread since then. My
mailbox is broken for quite a while. Also, I don't know how Ben is missed
from the cc list. Add him who might have more thoughts. Sorry for mail 
flooding to you, Ben :-)

The root cause of the problem would be specific to how MMIO regions are
supported on KVM for book3s_hv, where invalid page table entries (PTEs)
are built to trace the address mapping between guest virtual address
and host physical address. Since they're marked as "invalid", accessing
to the MMIO region will cause page fault in host kernel and then the
PTE table is searched to locate the address mapping, the access is
propogated to QEMU for emulation if the address mapping exits in PTE
table. It's how the "slow path" works.

For the "fast path", KVM creates memory slot and update PTE table with
the mappings upon page fault. Those PTEs are "valid" and seen by CPU
to do address translation automatically. When the memory region for
the "fast path" is disabled from QEMU, the host will purge TLB/PTEs
for the mappings. So when the "fast path" regions are disabled, the
PTE table doesn't have the entries required for "slow path" access
any more. At later point, guest tries access to the region and the
host just doesn't know how to handle the case.

I already have couple of patches to introduce addtional flag to
identify the "fast path" region, which has properties that RAM
region and MMIO region have. Once the region is disabled, host
purges its mappings in PTEs/TLB, and then update PTEs with the
mappings required by "slow path" region. With that, I didn't see
the problem. Both host/QEMU needs changes. I'll remove this patch
from the series and send out a new revision. The patches to address
the "fast path" issue will be sent separately after checking with
Ben or Paul they're worthy to be sent out.

Thanks,
Gavin

>
>>>          QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>              vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>> @@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, 
>>> int32_t groupid,
>>>              }
>>>  
>>>              msix_reset(&vdev->pdev);
>>> +
>>> +            /* Disable INTx */
>>> +            if (vdev->interrupt == VFIO_INT_INTx) {
>>> +                vfio_disable_intx(vdev);
>>> +            }
>>>          }
>>>  
>>>          break;
>>
>>
>>
>>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]