Re: XIVE VFIO kernel resample failure in INTx mode under heavy load

From: Cédric Le Goater
Subject: Re: XIVE VFIO kernel resample failure in INTx mode under heavy load
Date: Tue, 19 Apr 2022 09:35:43 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0

Tested on POWER9 with a passed through XHCI host and "-append pci=nomsi" and 
"-machine pseries,ic-mode=xics,kernel_irqchip=on" (and s/xics/xive/).

ok. This is deactivating the default XIVE (P9+) mode at the platform level,
hence forcing the XICS (P8) mode in a POWER9 guest running on a POWER9 host.
It is also deactivating MSI, forcing INTx usage in the kernel and forcing
the use of the KVM irqchip device to make sure we are not emulating in QEMU.

We are far from the default scenario but this is it !

well, "-machine pseries,ic-mode=xive,kernel_irqchip=on" is the default.

The default is a 'dual' ic-mode, so XICS+XIVE are announced by CAS.
kernel_irqchip is not strictly enforced, so QEMU could fall back to
an emulated irqchip if needed.

"pci=nomsi" is not, but since the actual device is only capable of INTx,
the default settings expose the problem.

When it is XIVE-on-XIVE (host and guest are XIVE),

We call this mode : XIVE native, or exploitation, mode. Anyhow, it is always
XIVE under the hood on a POWER9/POWER10 box.

INTx is emulated in the QEMU's H_INT_ESB handler

LSIs are indeed all handled at the QEMU level with the H_INT_ESB hcall.
If I remember correctly, this is because we wanted a simple way to synthesize
the interrupt trigger upon EOI when the level is still asserted. Doing it
this way is compatible with both kernel_irqchip=off/on modes because the
level is maintained in QEMU.
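
To illustrate the "synthesize the trigger upon EOI" idea described above, here is a minimal toy model in C. It is only a sketch and not QEMU code: the `lsi_model` struct and the `lsi_set_level()` / `lsi_eoi()` helpers are hypothetical names invented for this example. It assumes the emulator tracks the line level itself and re-delivers the interrupt at EOI time if the line is still asserted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-sensitive interrupt (LSI) source. */
struct lsi_model {
    bool asserted;      /* line level, tracked by the emulator */
    bool pending;       /* an interrupt was delivered and not yet EOI'd */
    unsigned delivered; /* number of interrupts delivered to the guest */
};

/* The device raises or lowers its line. */
static void lsi_set_level(struct lsi_model *s, bool level)
{
    s->asserted = level;
    if (level && !s->pending) {
        s->pending = true;
        s->delivered++;
    }
}

/* The guest signals EOI; re-trigger if the line is still high. */
static void lsi_eoi(struct lsi_model *s)
{
    assert(s->pending); /* an EOI without a delivered interrupt is a bug */
    s->pending = false;
    if (s->asserted) {
        s->pending = true;
        s->delivered++;
    }
}
```

Because the level lives in the emulator in this model, the EOI path needs no help from the kernel to decide whether to re-trigger, which would be consistent with it working for both kernel_irqchip settings.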

This is different for the other two XICS KVM devices which maintain the
assertion level in KVM.

and IRQFD_RESAMPLE is just useless in such a case (as it is designed to avoid going to 
userspace for the EOI->INTx unmasking) and there is no pathway to call the eventfd's 
irqfd_resampler_ack() from QEMU. So the VM's XHCI device receives exactly 1 interrupt and 
that is it. "kernel_irqchip=off" fixes it (obviously).
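
The "exactly 1 interrupt" symptom can be reproduced with a toy model of the resample handshake. This is only a sketch under stated assumptions, not kernel or QEMU code: `intx_model`, `intx_host_irq()`, `intx_guest_eoi()` and `intx_run()` are hypothetical names. It assumes the host masks INTx when it forwards the interrupt and only unmasks it when the guest's EOI reaches the resampler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a passed-through INTx line. */
struct intx_model {
    bool line_asserted; /* physical INTx level from the device */
    bool host_masked;   /* host masks INTx after forwarding it */
    bool eoi_notifies;  /* does the guest EOI reach the resampler? */
    unsigned guest_irqs;
};

/* Host side: forward the interrupt once, then mask the line. */
static void intx_host_irq(struct intx_model *m)
{
    if (m->line_asserted && !m->host_masked) {
        m->host_masked = true;
        m->guest_irqs++;
    }
}

/* Guest EOI: unmask and resample only if the ack path exists. */
static void intx_guest_eoi(struct intx_model *m)
{
    if (!m->eoi_notifies)
        return; /* nobody unmasks: the line stays masked forever */
    m->host_masked = false;
    intx_host_irq(m); /* re-fires if the line is still asserted */
}

/* Run `events` device interrupts; return how many the guest saw. */
static unsigned intx_run(bool eoi_notifies, unsigned events)
{
    struct intx_model m = { .eoi_notifies = eoi_notifies };
    for (unsigned i = 0; i < events; i++) {
        m.line_asserted = true;
        intx_host_irq(&m);
        m.line_asserted = false; /* guest driver services the device */
        intx_guest_eoi(&m);
    }
    return m.guest_irqs;
}
```

In this model `intx_run(true, 5)` returns 5, while `intx_run(false, 5)` returns 1: without the ack path the line is masked after the first interrupt and never unmasked.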


When it is XICS-on-XIVE (host is XIVE and guest is XICS),

yes (FYI, we have similar glue in skiboot ...)

then the VM receives 100000 interrupts and then it gets frozen 
(__report_bad_irq() is called). Which happens because (unlike XICS-on-XICS), 
the host XIVE's xive_(rm|vm)_h_eoi() does not call irqfd_resampler_ack(). This 
fixes it:

diff --git a/arch/powerpc/kvm/book3s_xive_template.c 
index b0015e05d99a..9f0d8e5c7f4b 100644
--- a/arch/powerpc/kvm/book3s_xive_template.c
+++ b/arch/powerpc/kvm/book3s_xive_template.c
@@ -595,6 +595,8 @@ X_STATIC int GLUE(X_PFX,h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr)
         xc->hw_cppr = xc->cppr;
         __x_writeb(xc->cppr, __x_tima + TM_QW1_OS + TM_CPPR);

+       kvm_notify_acked_irq(vcpu->kvm, 0, irq);
         return rc;

OK. XICS-on-XIVE is also broken then :/ what about XIVE-on-XIVE ?

Not sure I am following (or you are) :) INTx is broken on P9 in either mode. 
MSI works in both.

Sorry, my question was not clear. The above fixes XICS-on-XIVE but
not XIVE-on-XIVE, and I was asking about that. Disabling resample
seems to be the solution for all.

The host's XICS does call kvm_notify_acked_irq() (I did not test that but the 
code seems to be doing so).

After re-reading what I just wrote, I am leaning towards disabling use of 
KVM_CAP_IRQFD_RESAMPLE as it seems last worked on POWER8 and never since :)

and it would fix XIVE-on-XIVE.

Are you saying that passthru on POWER8 is broken ? fully or only INTx ?

No, the opposite - P8 works fine, kvm_notify_acked_irq() is there.

Did I miss something in the picture (hey Cedric)?

You seem to have all combinations in mind: host OS, KVM, QEMU, guest OS

For the record, here is some documentation we wrote:


It might need some updates.

When I read this, a quote from the Simpsons pops up in my mind: “Dear Mr. 
President, there are too many states nowadays. Please eliminate three. I am NOT 
a crackpot.” :)

Yes. It blew my mind for some time ... :)
