[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [PATCH v2 1/3] q35: set split kernel irqchip as default |
Date: |
Sat, 27 Apr 2019 10:09:51 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 |
On 27/04/19 07:29, Paolo Bonzini wrote:
>
>>> In my testing it looks like KVM advertises supporting the KVM_IRQFD
>>> resample feature, but vfio never gets the unmask notification, so the
>>> device remains with DisINTx set and no further interrupts are
>>> generated. Do we expect KVM's IRQFD with resampler to work in the
>>> split IRQ mode? We can certainly hope that "high performance" devices
>>> use MSI or MSI/X, but this would be quite a performance regression with
>>> split mode if our userspace bypass for INTx goes away. Thanks,
>>
>> arch/x86/kvm/lapic.c:kvm_ioapic_send_eoi() dumps to userspace before
>> kvm_ioapic_update_eoi() can handle the irq_ack_notifier_list via
>> kvm_notify_acked_gsi(),
>
> That wouldn't help because kvm_ioapic_update_eoi would not even be
> able to access vcpu->kvm->arch.vioapic (it's NULL).
>
> The following untested patch would signal the resamplefd in
> kvm_ioapic_send_eoi,
> before requesting the exit to userspace. However I am not sure how QEMU
> sets up the VFIO eventfds: if I understand correctly, when VFIO writes again
> to
> the irq eventfd, the interrupt request would not reach the userspace IOAPIC,
> but
> only the in-kernel LAPIC. That would be incorrect, and if my understanding is
> correct we need to trigger resampling from hw/intc/ioapic.c.
Actually it's worse: because you're bypassing IOAPIC when raising the
irq, the IOAPIC's remote_irr for example will not be set. So split
irqchip currently must disable the intx fast path completely.
I guess we could also reimplement irqfd and resamplefd in the userspace
IOAPIC, and run the listener in a separate thread (using "-object
iothread" on the command line and AioContext in the code).
Paolo
>
> Something like:
>
> 1) VFIO uses kvm_irqchip_add_irqfd_notifier_gsi instead of KVM_IRQFD directly
>
> 2) add a NotifierList in kvm_irqchip_assign_irqfd
>
> 3) if kvm_irqchip_is_split(), register a corresponding Notifier in ioapic.c
> which
> stores the resamplefd as an EventNotifier*
>
> 4) ioapic_eoi_broadcast writes to the resamplefd
>
> BTW, how do non-x86 platforms handle intx resampling?
>
> Paolo
>
> ---- 8< -----
> From: Paolo Bonzini <address@hidden>
> Subject: [PATCH WIP] kvm: lapic: run ack notifiers for userspace IOAPIC routes
>
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index ea1a4e0297da..be337e06e3fd 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -135,4 +135,6 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
> ulong *ioapic_handled_vectors);
> void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
> ulong *ioapic_handled_vectors);
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector);
> +
> #endif
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 3cc3b2d130a0..46ea79a0dd8a 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -435,6 +435,34 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
> srcu_read_unlock(&kvm->irq_srcu, idx);
> }
>
> +void kvm_notify_userspace_ioapic_routes(struct kvm *kvm, int vector)
> +{
> + struct kvm_kernel_irq_routing_entry *entry;
> + struct kvm_irq_routing_table *table;
> + u32 i, nr_ioapic_pins;
> + int idx;
> +
> + idx = srcu_read_lock(&kvm->irq_srcu);
> + table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> + nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
> + kvm->arch.nr_reserved_ioapic_pins);
> +
> + for (i = 0; i < nr_ioapic_pins; i++) {
> + hlist_for_each_entry(entry, &table->map[i], link) {
> + struct kvm_lapic_irq irq;
> +
> + if (entry->type != KVM_IRQ_ROUTING_MSI)
> + continue;
> +
> + kvm_set_msi_irq(kvm, entry, &irq);
> + if (irq.level && irq.vector == vector)
> + kvm_notify_acked_gsi(kvm, i);
> + }
> + }
> +
> + srcu_read_unlock(&kvm->irq_srcu, idx);
> +}
> +
> void kvm_arch_irq_routing_update(struct kvm *kvm)
> {
> kvm_hv_irq_routing_update(kvm);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9f089e2e09d0..8f8e76d77925 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1142,6 +1142,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic
> *apic, int vector)
>
> static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
> {
> + struct kvm_vcpu *vcpu = apic->vcpu;
> int trigger_mode;
>
> /* Eoi the ioapic only if the ioapic doesn't own the vector. */
> @@ -1149,9 +1150,10 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic
> *apic, int vector)
> return;
>
> /* Request a KVM exit to inform the userspace IOAPIC. */
> - if (irqchip_split(apic->vcpu->kvm)) {
> - apic->vcpu->arch.pending_ioapic_eoi = vector;
> - kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
> + if (irqchip_split(vcpu->kvm)) {
> + kvm_notify_userspace_ioapic_routes(vcpu->kvm, vector);
> + vcpu->arch.pending_ioapic_eoi = vector;
> + kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu);
> return;
> }
>
> @@ -1160,7 +1162,7 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic,
> int vector)
> else
> trigger_mode = IOAPIC_EDGE_TRIG;
>
> - kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
> + kvm_ioapic_update_eoi(vcpu, vector, trigger_mode);
> }
>
> static int apic_set_eoi(struct kvm_lapic *apic)
>
>
>> so it looks like KVM really ought to return an
>> error when trying to register a resample IRQFD when irqchip_split(),
>> but do we have better options? EOI handling in QEMU is pretty much
>> non-existent and even if it was in place, it's a big performance
>> regression for vfio INTx handling. Is a split irqchip that retains
>> performant resampling (EOI) support possible? Thanks,