Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE

From: Cédric Le Goater
Subject: Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Thu, 12 Apr 2018 10:51:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 04/12/2018 07:15 AM, David Gibson wrote:
> On Wed, Jan 17, 2018 at 03:39:46PM +0100, Cédric Le Goater wrote:
>> On 01/17/2018 12:10 PM, Benjamin Herrenschmidt wrote:
>>> On Wed, 2018-01-17 at 10:18 +0100, Cédric Le Goater wrote:
>>>>>>> Also, have we decided how the process of switching between XICS and
>>>>>>> XIVE will work vs. CAS?
>>>>>> That's how it is described in the architecture. The current choice is
>>>>>> to create both XICS and XIVE objects and choose at CAS which one to
>>>>>> use. It relies today on the capability of the pseries machine to 
>>>>>> allocate IRQ numbers for both interrupt controller backends. These
>>>>>> patches have been merged in QEMU.
>>>>>> A change of interrupt mode results in a reset. The device tree is 
>>>>>> populated accordingly and the ICPs are switched for the model in 
>>>>>> use. 
>>>>> For KVM we need to only instantiate one of them though.
>>>> Hmm,
>>>> How would we handle a guest rebooting on a kernel without XIVE support?
>>> It will do CAS again and we can change the devices.
>> So, we would destroy the previous QEMU ICS object and create a new one 
>> in the CAS hcall. That would probably work. There might be some issues 
>> in creating and destroying the ICS KVM device, but that can be studied 
>> without XIVE.
> Adding and removing devices at runtime based on guest requests like
> this will get really hairy in qemu.

I confirm ...

> As I've said before for the first cut, I think we want to select just
> one as a machine option to avoid this confusion.


> Looking further ahead, I think we'll be better off having both the
> XIVE and XICS models always present (at least minimally) in qemu, but
> with only one "active" at any given time.

Under emulation, it is not too complex to support both modes. The XIVE
and XICS objects are both created, and spapr->ov5_cas filters which one
is used.

However, syncing a mode change with KVM is more complex.

> Note that having the inactive one destroy and clean up the
> corresponding KVM devices is fine, as is deallocating as much of its
> runtime state as we can without changing the notional QOM tree.

Yes. I will try to send a patchset organized that way:

 - spapr XIVE emulated mode (both modes supported)
 - XIVE KVM in an exclusive way: the machine will need to be
   restarted from the command line to change interrupt mode.
 - support for changing the interrupt mode under KVM
 - powernv device model (rough)


>> It used to be considered ugly to create a QEMU device at reset time, so 
>> I wonder if this is still the case, because when the machine reaches CAS, 
>> we really are beyond reset.   
>> If this is OK, then the next "issue" is to keep the allocated IRQ
>> numbers in sync. The IRQ allocator is now merged at the machine level,
>> so the synchronization is straightforward when both backend QEMU
>> objects are available. That's the path I took. If both QEMU objects
>> are not available, then we need to scan the IRQ number space in the
>> current interrupt mode and allocate the same IRQs in the newly
>> negotiated mode. Probably OK. I don't see major problems with the
>> current code.
>> Migration is a problem. We will need both backend QEMU objects to be 
>> available anyhow if we want to migrate. So we are back to the current 
>> solution creating both QEMU objects but we can try to defer some of the 
>> KVM inits and create the KVM device on demand at CAS time.
>> The next problem is the ICP object, which currently needs the KVM
>> device fd to connect the vCPUs, so we will need to change that too.
>> That is probably the biggest problem today. We need a way to disconnect
>> the vCPU from the KVM device and see how we can defer the connection.
>> I need to make sure this is possible; I think I can check that without
>> XIVE.
>>>> Are you suggesting to create the XICS or XIVE device in the CAS
>>>> negotiation process? So, the machine would not have any interrupt
>>>> controller before CAS. That seems really late to me. grub uses the
>>>> console, for instance.
>>> We start with XICS by default.
>> yes.
>>>> I think it should prepare for both options, start in XIVE legacy mode, 
>>>> which is XICS, then possibly switch to XIVE exploitation mode.
>>>>>>> And how will that interact with KVM?
>>>>>> I expect we will do the same, which is to create two KVM devices to 
>>>>>> be able to handle both interrupt controller backends depending on the 
>>>>>> mode negotiated by the guest.  
>>>>> That will be an ungodly mess, I'd rather we only instantiate the right
>>>>> one.
>>>> It's rather transparent currently in the emulated version. There are two 
>>>> sets of objects in QEMU, switching is done in CAS. KVM support should not 
>>>> change anything in that area. 
>>>> I expect the 'xive-kvm' object to get/set states for migration, just like 
>>>> for XICS and to setup the ESB+TIMA memory regions, which is new. 
>>> But both XICS and XIVE are completely different kernel KVM devices that
>>> will need to "hook" into the same set of internal hooks for things like
>>> interrupts being passed through, RTAS calls etc...
>>> How does KVM know which one to "activate"?
>> Can't we add an extra IRQ type and use vcpu->arch.irq_type for that?
>> I haven't studied all the low level details though.
>>> I don't think the kernel should have both. 
>> I hear that. From a QEMU perspective, it is much easier to put everything 
>> in place for both interrupt modes and let the guest decide what it wants 
>> to use. 
>> If we choose not to, we will need to find a solution to defer the KVM
>> inits and to disconnect/reconnect the vCPUs. For the latter, we could
>> add a KVM_DISABLE_CAP ioctl or, maybe better, a new capability like
>> KVM_CAP_IRQ_XIVE to perform the switch.
>> C.
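The IRQ number synchronization discussed in the thread (scan the IRQ
number space in the current interrupt mode and allocate the same IRQs in
the newly negotiated mode) could look roughly like this sketch. It
assumes a hypothetical fixed-size IRQ space (NR_IRQS) and hypothetical
per-backend claim tables that also record whether a number is
level-sensitive (LSI). None of these names come from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS 64   /* hypothetical size of the IRQ number space */

/* Hypothetical per-backend view of which IRQ numbers are claimed. */
typedef struct {
    bool claimed[NR_IRQS];
    bool lsi[NR_IRQS];       /* level-sensitive (LSI) vs. message-signaled */
} IrqSpace;

/*
 * When the newly negotiated backend was not kept in sync, walk the IRQ
 * number space of the current mode and claim exactly the same numbers,
 * with the same type, in the new mode. Any claim in the target that is
 * not present in the source is cleared. Returns the number of IRQs copied.
 */
static size_t irq_space_mirror(const IrqSpace *from, IrqSpace *to)
{
    size_t copied = 0;

    assert(from != NULL && to != NULL);
    for (size_t irq = 0; irq < NR_IRQS; irq++) {
        to->claimed[irq] = from->claimed[irq];
        to->lsi[irq] = from->claimed[irq] && from->lsi[irq];
        if (from->claimed[irq]) {
            copied++;
        }
    }
    return copied;
}
```

Stale claims in the target backend are cleared rather than merged, on
the assumption that after a mode change only the numbers that were live
in the old mode should remain valid.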
