qemu-devel

Re: [PATCH 0/5] vfio/pci: Fix up breakage against split irqchip and INTx


From: Peter Xu
Subject: Re: [PATCH 0/5] vfio/pci: Fix up breakage against split irqchip and INTx
Date: Fri, 28 Feb 2020 10:25:11 -0500

On Fri, Feb 28, 2020 at 11:36:55AM +0100, Paolo Bonzini wrote:
> On 26/02/20 23:50, Peter Xu wrote:
> > VFIO INTx is not working with split irqchip.  On new kernels KVM_IRQFD
> > will directly fail with resamplefd attached, so QEMU will automatically
> > fall back to the INTx slow path.  However, on old kernels it's still
> > broken.
> > 
> > Only recently did I notice that this could also break PXE boot for
> > assigned NICs [1].  My wild guess is that the PXE ROM mostly uses
> > INTx as well, which means we can't bypass it even if we enable MSI
> > for the guest kernel.
> > 
> > This series tries to first fix this issue function-wise, then speed up
> > for the INTx again with resamplefd (mostly following the ideas
> > proposed by Paolo one year ago [2]).  My TCP_RR test shows that:
> > 
> >   - Before this series: this is broken, no number to show
> > 
> >   - After patch 1 (enable slow path): get 63% perf compared to full
> >     kernel irqchip
> 
> Oh, I thought something like patch 1 had already been applied.
> 
> One comment: because you're bypassing IOAPIC when raising the irq, the
> IOAPIC's remote_irr for example will not be set.  Most OSes probably
> don't care, but it's at least worth a comment.

Ouch, I should definitely do that...  How about something like this in
ioapic_eoi_broadcast()?  I also changed kvm_resample_fd_notify() to
return a boolean showing whether any GSI was kicked, so in that case we
can skip the irr and remote-irr checks:

            /*
             * When the IOAPIC is in userspace while the APIC is still
             * in the kernel (i.e., split irqchip), we have a trick to
             * kick the resamplefd logic for registered irqfds from
             * userspace to deactivate the IRQ.  When that happens, the
             * irq has bypassed the userspace IOAPIC (so the irr and
             * remote-irr of the table entry were never set, even
             * though an interrupt arrived), so we don't need to clear
             * the remote-IRR or check irr again because they'll
             * always be zero.
             */
            if (kvm_resample_fd_notify(n)) {
                continue;
            }
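
To make the control flow easier to discuss, here is a minimal standalone
sketch of the idea.  It is not the QEMU code: MockIOAPIC, MockEntry,
mock_resample_fd_notify() and mock_eoi_broadcast() are hypothetical
names made up for illustration, masking is ignored, and the "kick" is
just a counter instead of a write to a real resamplefd.  It only shows
that a pin with a registered resamplefd is handled by the notify call
and skipped, while other level-triggered pins still go through the
usual remote-IRR clearing and irr re-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_NUM_PINS 24

/* Hypothetical, heavily simplified redirection-table entry. */
typedef struct {
    uint8_t vector;
    bool level_triggered;
    bool remote_irr;
} MockEntry;

/* Hypothetical stand-in for the userspace IOAPIC state. */
typedef struct {
    MockEntry entry[MOCK_NUM_PINS];
    uint32_t irr;                               /* pending lines, one bit per pin */
    bool resample_registered[MOCK_NUM_PINS];    /* irqfd with resamplefd on this GSI? */
    unsigned resample_kicks[MOCK_NUM_PINS];     /* how often we kicked the resamplefd */
    unsigned reinjected[MOCK_NUM_PINS];         /* how often the normal path re-serviced */
} MockIOAPIC;

/* Returns true if a resamplefd was registered for this GSI and got kicked. */
static bool mock_resample_fd_notify(MockIOAPIC *s, int gsi)
{
    assert(gsi >= 0 && gsi < MOCK_NUM_PINS);
    if (!s->resample_registered[gsi]) {
        return false;
    }
    s->resample_kicks[gsi]++;
    return true;
}

static void mock_eoi_broadcast(MockIOAPIC *s, uint8_t vector)
{
    for (int n = 0; n < MOCK_NUM_PINS; n++) {
        MockEntry *e = &s->entry[n];

        if (e->vector != vector) {
            continue;
        }
        /* The irq bypassed the userspace IOAPIC: nothing to clear here. */
        if (mock_resample_fd_notify(s, n)) {
            continue;
        }
        if (!e->level_triggered || !e->remote_irr) {
            continue;
        }
        e->remote_irr = false;
        /* Line still asserted: the normal path would service it again. */
        if (s->irr & (1u << n)) {
            s->reinjected[n]++;
        }
    }
}
```

With one pin that has a resamplefd registered and one ordinary
level-triggered pin sharing the same vector, a single EOI should kick
the first and clear remote-IRR only on the second.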

I confess this is still tricky, and after a careful read I noticed
that you had proposed a similar kernel fix for the problem too, which I
overlooked (https://patchwork.kernel.org/patch/10738541/#22609933).
My current thought is to keep this hackery in userspace only, so that
split+resamplefd stays forbidden in the kernel and things remain clean
there.

What's your opinion?

(I should have marked this series as RFC when posting it)

Thanks,

-- 
Peter Xu



