Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Thu, 12 Apr 2018 15:15:03 +1000
On Wed, Jan 17, 2018 at 03:39:46PM +0100, Cédric Le Goater wrote:
> On 01/17/2018 12:10 PM, Benjamin Herrenschmidt wrote:
> > On Wed, 2018-01-17 at 10:18 +0100, Cédric Le Goater wrote:
> >>>>> Also, have we decided how the process of switching between XICS and
> >>>>> XIVE will work vs. CAS ?
> >>>> That's how it is described in the architecture. The current choice is
> >>>> to create both XICS and XIVE objects and choose at CAS which one to
> >>>> use. It relies today on the capability of the pseries machine to
> >>>> allocate IRQ numbers for both interrupt controller backends. These
> >>>> patches have been merged in QEMU.
> >>>> A change of interrupt mode results in a reset. The device tree is
> >>>> populated accordingly and the ICPs are switched for the model in
> >>>> use.
> >>> For KVM we need to only instantiate one of them though.
> >> Hmm,
> >> How would we handle a guest rebooting on a kernel without XIVE support ?
> > It will do CAS again and we can change the devices.
> So, we would destroy the previous QEMU ICS object and create a new one
> in the CAS hcall. That would probably work. There might be some issues
> in creating and destroying the ICS KVM device, but that can be studied
> without XIVE.
Adding and removing devices at runtime based on guest requests like
this will get really hairy in qemu.
As I've said before for the first cut, I think we want to select just
one as a machine option to avoid this confusion.
Looking further ahead, I think we'll be better off having both the
XIVE and XICS models always present (at least minimally) in qemu, but
with only one "active" at any given time.
Note that having the inactive one destroy and clean up the
corresponding KVM devices is fine, as is deallocating as much of its
runtime state as we can without changing the notional QOM tree.
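To make the "both present, only one active" idea concrete, here is a minimal sketch. It is not QEMU code: every name in it (Machine, IrqBackend, machine_set_irq_mode, the kvm_device_open flag standing in for a KVM device fd) is hypothetical and only illustrates the shape of the approach, assuming the switch happens at CAS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these are QEMU or KVM APIs. */
typedef enum { IRQ_MODE_XICS, IRQ_MODE_XIVE, IRQ_MODE_COUNT } IrqMode;

typedef struct IrqBackend {
    const char *name;
    bool kvm_device_open;   /* stands in for an open KVM device fd */
} IrqBackend;

typedef struct Machine {
    /* Both models always exist, so the object tree never changes shape. */
    IrqBackend backends[IRQ_MODE_COUNT];
    IrqMode active;
} Machine;

/* Activating a backend is where its KVM device would be created. */
static void backend_activate(IrqBackend *b)
{
    b->kvm_device_open = true;
}

/* Deactivating tears down the KVM device but keeps the object itself. */
static void backend_deactivate(IrqBackend *b)
{
    b->kvm_device_open = false;
}

static void machine_init(Machine *m)
{
    m->backends[IRQ_MODE_XICS].name = "xics";
    m->backends[IRQ_MODE_XICS].kvm_device_open = false;
    m->backends[IRQ_MODE_XIVE].name = "xive";
    m->backends[IRQ_MODE_XIVE].kvm_device_open = false;

    /* Start with XICS by default, as suggested in the thread. */
    m->active = IRQ_MODE_XICS;
    backend_activate(&m->backends[m->active]);
}

/* Called when the guest negotiates a mode, e.g. at CAS. */
static void machine_set_irq_mode(Machine *m, IrqMode mode)
{
    assert(mode < IRQ_MODE_COUNT);
    if (mode == m->active) {
        return;
    }
    backend_deactivate(&m->backends[m->active]);
    m->active = mode;
    backend_activate(&m->backends[m->active]);
}
```

The point of the sketch is that no object is created or destroyed on a mode change; only the cheap "active" state and the KVM-side resources move.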
> It used to be considered ugly to create a QEMU device at reset time, so
> I wonder if this is still the case, because when the machine reaches CAS,
> we really are beyond reset.
> If this is OK, then the next "issue" is to keep in sync the allocated
> IRQ numbers. The IRQ allocator is now merged at the machine level, so
> the synchronization is obvious to do when both backend QEMU objects
> are available. That's the path I took. If both QEMU objects are not
> available, then we need to scan the IRQ number space in the current
> interrupt mode and allocate the same IRQs in the newly negotiated mode.
> Probably OK. I don't see major problems with the current code.
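For illustration only, the "scan the IRQ number space and allocate the same IRQs in the new mode" step described above could look roughly like the following. This is a sketch under assumptions: IrqSource, NR_IRQS, irq_claim and irq_resync are made-up names, and a flat array of flags stands in for whatever the real allocator tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS 16  /* hypothetical, kept tiny for illustration */

/* Hypothetical per-backend view of which IRQ numbers are claimed. */
typedef struct IrqSource {
    bool claimed[NR_IRQS];
} IrqSource;

static void irq_claim(IrqSource *s, size_t irq)
{
    assert(irq < NR_IRQS);
    s->claimed[irq] = true;
}

/*
 * Mirror the claimed IRQ numbers of the current backend into the newly
 * negotiated one. Returns the number of IRQs claimed in the target.
 */
static size_t irq_resync(const IrqSource *from, IrqSource *to)
{
    size_t count = 0;

    for (size_t irq = 0; irq < NR_IRQS; irq++) {
        to->claimed[irq] = from->claimed[irq];
        if (from->claimed[irq]) {
            count++;
        }
    }
    return count;
}
```

With a single allocator at the machine level, as mentioned in the quoted text, this copy would be unnecessary because both backends would consult the same state.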
> Migration is a problem. We will need both backend QEMU objects to be
> available anyhow if we want to migrate. So we are back to the current
> solution creating both QEMU objects but we can try to defer some of the
> KVM inits and create the KVM device on demand at CAS time.
> The next problem is the ICP object that currently needs the KVM device
> fd to connect the vcpus ... So, we will need to change that also.
> That is probably the biggest problem today. We need a way to disconnect
> the vcpu from the KVM device and see how we can defer the connection.
> I need to make sure this is possible, I can check that without XIVE
> I think.
> >> Are you suggesting to create the XICS or XIVE device in the CAS
> >> negotiation process ? So, the machine would not have any interrupt
> >> controller before CAS. That seems really late to me. grub uses the
> >> console for instance.
> > We start with XICS by default.
> >> I think it should prepare for both options, start in XIVE legacy mode,
> >> which is XICS, then possibly switch to XIVE exploitation mode.
> >>>>> And how that will interact with KVM ?
> >>>> I expect we will do the same, which is to create two KVM devices to
> >>>> be able to handle both interrupt controller backends depending on the
> >>>> mode negotiated by the guest.
> >>> That will be an ungodly mess, I'd rather we only instantiate the right
> >>> one.
> >> It's rather transparent currently in the emulated version. There are two
> >> sets of objects in QEMU, switching is done in CAS. KVM support should not
> >> change anything in that area.
> >> I expect the 'xive-kvm' object to get/set states for migration, just like
> >> for XICS and to set up the ESB+TIMA memory regions, which is new.
> > But both XICS and XIVE are completely different kernel KVM devices that
> > will need to "hook" into the same set of internal hooks for things like
> > interrupts being passed through, RTAS calls etc...
> > How does KVM know which one to "activate" ?
> Can't we add an extra IRQ type and use vcpu->arch.irq_type for that ?
> I haven't studied all the low level details though.
> > I don't think the kernel should have both.
> I hear that. From a QEMU perspective, it is much easier to put everything
> in place for both interrupt modes and let the guest decide what it wants
> to use.
> If we choose not to, we will need to find a solution to defer the KVM inits
> and to disconnect/reconnect the vcpus. For the latter, we could add a
> KVM_DISABLE_CAP ioctl or maybe better add a new capability like
> KVM_CAP_IRQ_XIVE to perform the switch.
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!