qemu-devel

Re: Qemu interrupt-remap fault support


From: David Woodhouse
Subject: Re: Qemu interrupt-remap fault support
Date: Fri, 13 Jan 2023 17:08:19 +0000
User-agent: Evolution 3.44.4-0ubuntu1

On Fri, 2023-01-13 at 09:51 -0700, Alex Williamson wrote:
> On Fri, 13 Jan 2023 09:08:38 +0000
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > I'm looking at interrupt remapping (because I need to hook into the
> > translation somehow to add PIRQ support for Xen which translates guest
> > MSIs directly to KVM_IRQ_ROUTING_XEN_EVTCHN).
> > 
> > Am I right in understanding that it doesn't report faults on interrupts
> > which can't be translated? It attempts to translate interrupts at the
> > time the table is modified (vtd_int_remap()) or when an APIC access
> > actually triggers an MSI (vtd_mem_ir_write()) but in neither case does
> > it actually raise a fault?
> 
> AIUI, yes.
>  
> > The behaviour we want here is that we only raise a fault when the IRQ
> > actually *happens*. But that's hard in our current model where it looks
> > like we pretranslate *everything* in advance and just let it run.
> > 
> > Here's a proposal for a model which could make it work (using VFIO as
> > the example since that's the more complex part but it works for
> > emulated MSI sources too):
> > 
> > We consume the VFIO eventfd *both* in userspace and the kernel. (Since 
> > https://lore.kernel.org/kvm/20201027143944.648769-1-dwmw2@infradead.org/
> > we can just keep listening on the VFIO eventfd in userspace and the
> > kernel will eat all the events so you never notice. On older kernels we
> > have to manually stop listening in userspace.)
> 
> How do we determine if we're on a new/old kernel for this?

Well, you could always just *assume* an old kernel (or just base it on
some KVM CAP that's known to be newer, maybe). There's no harm in
always detaching the userspace listener when you enable the kernel
delivery.

> > When a translation is valid and should be considered 'cached' in the
> > IOMMU, that's when we actually hook it up to the irqfd. 
> > 
> > We can ditch the iec invalidate callbacks (vtd_iec_notify_all) because
> > all an invalidation needs to do is KVM_IRQFD_FLAG_DEASSIGN for the
> > corresponding GSI.
> > 
> > (
> > You might consider abusing a spare field in the KVM routing table to
> > hold a cookie like the IRTE# so that you know *which* entries to
> > invalidate. I couldn't possibly comment.
> > 
> >         /* 64-bit cookie for IOMMU to use for invalidation choices */
> >         #define ire_ir_cookie(ire) ((ire)->u.adapter.ind_offset)
> > 
> >         /* Flags, to indicate a stale entry that needs retranslating */
> >         #define ire_user_flags(ire) ((ire)->u.adapter.summary_offset)
> >         #define IRE_USER_FLAG_STALE             1
> > )
> > 
> > So when an interrupt happens and it's *untranslated*, that's when it
> > gets raised to userspace to handle, e.g. in vfio_msi_interrupt(). That
> > does the normal thing and attempts to deliver the guest MSI directly.
> > We add a flag "bool delivering_now" to the X86IOMMUClass int_remap
> > function, to allow it to distinguish between preemptive translations
> > and actual delivery and to raise the fault in the latter case.
> > 
> > When the guest frobs a device's MSI table we can do the translation as
> > we do at the moment, of course with the 'delivering_now' argument being
> > false. And *if* the translation succeeds then we can install the IRQFD
> > right away.
> > 
> > This model allows us to generate faults as the hardware would, and also
> > improves the efficiency of invalidation by only invalidating what we
> > need to. I haven't looked hard at how it works with an emulated AMD
> > IOMMU, but I know that the Xen PIRQ support (which is where I came in)
> > slots into it fairly trivially, using the PIRQ# as the 'cookie' for
> > invalidation instead of the IRTE# that the Intel IOMMU uses.
> 
> Seems ok, but it hinges on being able to detect whether kvm consumes
> the eventfd or not since an un-consumed eventfd would then look the
> same as an invalid translation.  Thanks,

There's a difference; we do *know* that the translation is invalid.
When we invalidate it and detach the IRQFD, we can also set a STALE bit
in the routing table (or in our own data structure if we prefer).
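
As a rough illustration, assuming we keep our own shadow table instead
of reusing fields of the KVM routing entries: the struct, the flag name
and invalidate_by_cookie() below are hypothetical, and clearing
irqfd_attached merely stands in for a KVM_IRQFD call with
KVM_IRQFD_FLAG_DEASSIGN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROUTE_FLAG_STALE 1u  /* hypothetical; plays the role of the STALE bit */

/* Hypothetical shadow of one routing entry. */
struct route {
    uint64_t cookie;         /* e.g. the IRTE# or the PIRQ# */
    uint32_t flags;
    bool     irqfd_attached; /* true while the kernel delivers this IRQ */
};

/*
 * Invalidate every live entry tagged with 'cookie': detach its irqfd
 * and mark it STALE so a later userspace event is known to be a real
 * untranslated interrupt. Returns the number of entries invalidated.
 */
static size_t invalidate_by_cookie(struct route *routes, size_t n,
                                   uint64_t cookie)
{
    size_t hits = 0;

    assert(routes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (routes[i].cookie != cookie ||
            (routes[i].flags & ROUTE_FLAG_STALE)) {
            continue;        /* different cookie, or already stale */
        }
        routes[i].irqfd_attached = false;
        routes[i].flags |= ROUTE_FLAG_STALE;
        hits++;
    }
    return hits;
}
```

Entries with other cookies are left alone, which is where the
"only invalidate what we need to" part comes from.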

If we ever get the event in userspace when the STALE bit *isn't* set,
then we know we're on an older kernel, and there is nothing to do
except set a global flag to remember that and stop listening until this
IRQ is invalidated again. From then on the 'old kernel' flag is already
set, so we can stop userspace from listening immediately.


