Cédric Le Goater
Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Wed, 17 Jan 2018 15:39:46 +0100
On 01/17/2018 12:10 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2018-01-17 at 10:18 +0100, Cédric Le Goater wrote:
>>>>> Also, have we decided how the process of switching between XICS and
>>>>> XIVE will work vs. CAS?
>>>> That's how it is described in the architecture. The current choice is
>>>> to create both XICS and XIVE objects and choose at CAS which one to
>>>> use. It relies today on the capability of the pseries machine to
>>>> allocate IRQ numbers for both interrupt controller backends. These
>>>> patches have been merged in QEMU.
>>>> A change of interrupt mode results in a reset. The device tree is
>>>> populated accordingly and the ICPs are switched for the model in
>>> For KVM we need to only instantiate one of them though.
>> How would we handle a guest rebooting on a kernel without XIVE support?
> It will do CAS again and we can change the devices.
So, we would destroy the previous QEMU ICS object and create a new one
in the CAS hcall. That would probably work. There might be some issues
in creating and destroying the ICS KVM device, but that can be studied.
It used to be considered ugly to create a QEMU device at reset time. I
wonder if this is still the case, because by the time the machine reaches
CAS, we are really beyond reset.
If this is OK, then the next "issue" is to keep the allocated IRQ
numbers in sync. The IRQ allocator is now merged at the machine level,
so the synchronization is straightforward when both backend QEMU objects
are available. That's the path I took. If both QEMU objects are not
available, then we need to scan the IRQ number space in the current
interrupt mode and allocate the same IRQs in the newly negotiated mode.
Probably OK. I don't see major problems with the current code.
Migration is a problem. We will need both backend QEMU objects to be
available anyhow if we want to migrate. So we are back to the current
solution creating both QEMU objects but we can try to defer some of the
KVM inits and create the KVM device on demand at CAS time.
The next problem is the ICP object, which currently needs the KVM device
fd to connect the vcpus... So we will need to change that as well.
That is probably the biggest problem today. We need a way to disconnect
the vcpu from the KVM device and to see how we can defer the connection.
I need to make sure this is possible. I can check that without XIVE.
>> Are you suggesting creating the XICS or XIVE device during the CAS negotiation
>> process? So, the machine would not have any interrupt controller before
>> CAS. That seems really late to me. grub uses the console, for instance.
> We start with XICS by default.
>> I think it should prepare for both options, start in XIVE legacy mode,
>> which is XICS, then possibly switch to XIVE exploitation mode.
>>>>> And how will that interact with KVM?
>>>> I expect we will do the same, which is to create two KVM devices to
>>>> be able to handle both interrupt controller backends depending on the
>>>> mode negotiated by the guest.
>>> That will be an ungodly mess. I'd rather we only instantiate the right one.
>> It's rather transparent currently in the emulated version. There are two
>> sets of objects in QEMU, switching is done in CAS. KVM support should not
>> change anything in that area.
>> I expect the 'xive-kvm' object to get/set states for migration, just like
>> for XICS and to setup the ESB+TIMA memory regions, which is new.
> But both XICS and XIVE are completely different kernel KVM devices that will
> need to "hook" into the same set of internal hooks for things like interrupts
> being passed through, RTAS calls etc...
> How does KVM know which one to "activate"?
Can't we add an extra IRQ type and use vcpu->arch.irq_type for that ?
I haven't studied all the low level details though.
> I don't think the kernel should have both.
I hear that. From a QEMU perspective, it is much easier to put everything
in place for both interrupt modes and let the guest decide what it wants.
If we choose not to, we will need to find a solution to defer the KVM inits
and to disconnect/reconnect the vcpus. For the latter, we could add a
KVM_DISABLE_CAP ioctl or maybe better add a new capability like
KVM_CAP_IRQ_XIVE to perform the switch.