qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] Notify IRQ sources of level interrupt ack/EOI


From: David Woodhouse
Subject: Re: [RFC] Notify IRQ sources of level interrupt ack/EOI
Date: Thu, 12 Jan 2023 08:05:37 +0000
User-agent: Evolution 3.44.4-0ubuntu1

On Wed, 2023-01-11 at 12:43 -0700, Alex Williamson wrote:
> On Wed, 11 Jan 2023 19:08:44 +0000
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > On Wed, 2023-01-11 at 11:29 -0700, Alex Williamson wrote:
> > > 
> > > Nice.  IIRC, we ended up with the hack solution we have today in vfio
> > > because there was too much resistance to callbacks that were only
> > > necessary for vfio in the past.  Once we had KVM resampling support,
> > > it simply wasn't worth the effort for a higher latency solution to
> > > fight that battle, so we implemented what could best be described as
> > > a universal workaround embedded in vfio.
> > > 
> > > Clearly a callback is preferable, and yes, that's how we operate with
> > > KVM resampling and unmasking INTx, so in theory plumbing this to our
> > > existing eoi callback and removing the region toggling ought to do
> > > the right thing.  Thanks,  
> > 
> > Well, I'm happy for the Xen support be a second use case which means
> > it's no longer "only necessary for VFIO", and keep prodding at it if
> > that's going to be useful...
> 
> Welcome aboard.  I take it from your cover letter than non-x86
> architectures would be on your todo list.  Ideally the ack callback
> would simply be a requirement for any implementation of a new interrupt
> controller, but that's where we get into striking a balance of device
> assignment imposing requirements on arbitrary architectures that may or
> may not care, or even support, device assignment.
> 
> This is the... dare I say, elegance of the region access hack.  It's
> obviously not pretty or performant, but it universally provides an
> approximation of the behavior of an emulated device, ie. some form of
> guest access to the device is required to de-assert the interrupt.
> 
> We probably need some way to detect the interrupt controller support
> for the callback mechanism so we can generate an appropriate user
> warning to encourage development of that support and fall back to our
> current hack for some degree of functionality.  Thanks,

The other thing I'd like to do is figure out the semantics of the
callback. Right now I've gone for "return true to clear the level" and
that makes me cringe a little bit because in the VFIO case, the
callback will re-enable the IRQ in the kernel *before* returning true
and the irqchip actually clearing its s->irr.

Which could *theoretically* race with the next interrupt happening and
setting the IRR before it's cleared. Although I don't think that
actually happens in practice in qemu because the eventfd would be
processed in the same thread? But I'm not sure I like it; it feels
wrong.

One option is for the generic processing to go "if there *is* a
callback, zero the IRR first and then call the callback". Which fixes
the above race, but you do rely on the drivers to *actually* reassert
it. Which might be OK because it only has to propagate up one level up
the IRQ link chain at a time — e.g. to the PCI INTx code, which would
then process the given INT[ABCD] line and call back those drivers which
*have* a callback to resample *their* state, then recalculate the
overall level based on the corresponding irq_count.

I also wasn't sure if I want to allow calling qemu_set_irq() from the
callback itself, instead of having it return a boolean. That might let
the *callback* worry about when to clear/set the level. It doesn't work
well with shared interrupts in some cases, because the race still
exists if callback B zeroes the IRQ level after callback A already did
so *and* reasserted it. But I think shared interrupts are hosed anyway
if e.g. HPET links to the same GSI that a PCI INTx is routed to; they
overwrite each other's state instead of it being a logical OR of the
two? To share interrupts we need *explicit* muxing like the PCI code
has with its irq_count handling.

This last option has the advantage that it maps directly to the
existing VFIO EOI callback, e.g. vfio_intx_eoi(). It's probably where
I'll start.

Attachment: smime.p7s
Description: S/MIME cryptographic signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]