
Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE

From: Benjamin Herrenschmidt
Subject: Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Fri, 22 Dec 2017 09:53:37 +1100

On Thu, 2017-12-21 at 10:16 +0100, Cédric Le Goater wrote:
> On 12/21/2017 01:12 AM, Benjamin Herrenschmidt wrote:
> > On Wed, 2017-12-20 at 16:09 +1100, David Gibson wrote:
> > > 
> > > As you've suggested yourself, I think we might need to more
> > > explicitly model the different components of the XIVE system.  As part
> > > of that, I think you need to be clearer in this base skeleton about
> > > exactly what component your XIVE object represents.
> > > 
> > > If the answer is "the overall thing" I suspect that's not what you
> > > want - I had one of those for XICs which proved to be a mistake
> > > (eventually replaced by the XICSFabric interface).
> > > 
> > > Changing the model later isn't impossible, but doing so without
> > > breaking migration can be a real pain, so I think it's worth a
> > > reasonable effort to try and get it right initially.
> > 
> > Note: we do need to speed things up a bit, as having exploitation mode
> > in KVM will significantly help with IPI performance among other things.
> > 
> > I'm about ready to do the KVM bits. The one thing we need to discuss
> > and figure a good design for is how we map all those interrupt control
> > pages into qemu.
> > 
> > Each interrupt (either PCIe pass-through or the "generic XIVE IPIs"
> > which are used for guest IPIs and for vio/virtio/emulated interrupts)
> > comes with a "control page" (ESB page) which needs to be mapped into
> > the guest, and the generic IPIs also come with a trigger page which
> > needs to be mapped into the guest for guest IPIs or OpenCAPI
> > interrupts, or just qemu for emulated devices.
> 
> What about the OS TIMA page? Do we trap the accesses in QEMU and
> forward them to KVM, or do we use a similar mechanism?

No, no, we'll have an mmap facility for it in kvm but it worries me
less as there's only one of these and there's little damage qemu can do
having access to it :)

> > Now there can be thousands of these critters. I certainly don't want to
> > create thousands of VMAs in qemu and even less thousands of memory
> > regions in KVM.
> 
> We can provision one mapping per kvmppc_xive_src_block, maybe?

Maybe. Last I looked KVM walk of memory regions was linear though. Mind
you it's not a huge deal if the guest RAM is always in the first

> > So we need some kind of mechanism by which a single large VMA gets
> > mmap'ed into qemu (or maybe a couple of these, but not too many) and
> > the interrupt pages can be assigned to slots in there and demand
> > faulted.
> 
> Frederic has started to put in place a similar mechanism for OpenCAPI.

I know, though he made it rather OpenCAPI specific which is going to be
"interesting" when it comes to virtualizing OpenCAPI...
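
To make the "slots in a single large VMA" idea a bit more concrete, here is a minimal sketch of just the bookkeeping side, written as plain C so it can be compiled on its own. Everything in it is an assumption for illustration: the names (esb_window, esb_window_assign, esb_window_resolve), the 64K page size and the slot count are hypothetical and are not existing QEMU or KVM interfaces. The actual demand faulting would live in a kernel fault handler, which this sketch only models as a lookup from a faulting offset to an interrupt number.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration only: 64K control pages, 4096 slots. */
#define ESB_PAGE_SHIFT 16
#define ESB_NR_SLOTS   4096
#define ESB_SLOT_FREE  UINT32_MAX

/* Hypothetical bookkeeping for one large window: slot index -> IRQ number. */
struct esb_window {
    uint32_t slot_to_irq[ESB_NR_SLOTS];
};

/* Mark every slot as unassigned. */
void esb_window_init(struct esb_window *w)
{
    for (size_t i = 0; i < ESB_NR_SLOTS; i++)
        w->slot_to_irq[i] = ESB_SLOT_FREE;
}

/*
 * Assign an interrupt to the first free slot.
 * Returns the byte offset of that slot inside the window, or -1 if full.
 */
long esb_window_assign(struct esb_window *w, uint32_t irq)
{
    for (size_t i = 0; i < ESB_NR_SLOTS; i++) {
        if (w->slot_to_irq[i] == ESB_SLOT_FREE) {
            w->slot_to_irq[i] = irq;
            return (long)(i << ESB_PAGE_SHIFT);
        }
    }
    return -1;
}

/*
 * Model of what a fault handler would need: translate a faulting offset
 * inside the window into the interrupt whose page should back it.
 * Returns 0 and fills *irq on success, -1 if the slot is unassigned or
 * the offset is outside the window.
 */
int esb_window_resolve(const struct esb_window *w, unsigned long offset,
                       uint32_t *irq)
{
    size_t slot = offset >> ESB_PAGE_SHIFT;

    if (slot >= ESB_NR_SLOTS || w->slot_to_irq[slot] == ESB_SLOT_FREE)
        return -1;
    *irq = w->slot_to_irq[slot];
    return 0;
}
```

The point of the table is that the mapping stays a single region from the VMA point of view; only the slot lookup decides which interrupt page backs a given offset when it is first touched.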

> > For the generic interrupts, this can probably be covered by KVM, adding
> > some arch ioctls for allocating IPIs and mmap'ing that region etc...
> The KVM device has an ioctl handler:
>       struct kvm_device_ops {
>               long (*ioctl)(struct kvm_device *dev, unsigned int ioctl,
>                             unsigned long arg);
>       };
> So a KVM device for the XIVE interrupt controller can implement a couple
> of extra calls for its needs, like getting the VMA addresses, etc.
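
As a rough illustration of the suggestion just quoted, the sketch below models an ioctl callback with the signature shown above dispatching one extra, device-specific command. It is a userspace stand-in, not kernel code: the struct definitions are reduced copies so the example compiles outside the kernel, and the command number XIVE_DEMO_GET_ESB_BASE, the state struct and the function names are hypothetical and do not correspond to any real KVM ABI.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins so the sketch compiles outside the kernel. */
struct kvm_device {
    void *private;              /* device-specific state */
};

struct kvm_device_ops {
    long (*ioctl)(struct kvm_device *dev, unsigned int ioctl,
                  unsigned long arg);
};

/* Hypothetical command number and state; not a real KVM interface. */
#define XIVE_DEMO_GET_ESB_BASE 0x1u

struct xive_demo_state {
    uint64_t esb_base;          /* where the large ESB window would start */
};

/* Dispatch device-specific commands; unknown commands are rejected. */
long xive_demo_ioctl(struct kvm_device *dev, unsigned int ioctl,
                     unsigned long arg)
{
    struct xive_demo_state *s = dev->private;

    switch (ioctl) {
    case XIVE_DEMO_GET_ESB_BASE:
        /* In the kernel this would be a copy_to_user() instead. */
        *(uint64_t *)(uintptr_t)arg = s->esb_base;
        return 0;
    default:
        return -ENOTTY;
    }
}

const struct kvm_device_ops xive_demo_ops = {
    .ioctl = xive_demo_ioctl,
};
```

A caller would go through xive_demo_ops.ioctl() with a pointer to a uint64_t as the argument; any command the device does not know returns -ENOTTY.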
> > For pass-through, it's trickier, we don't want to mmap each irqfd
> > individually for the above reason, so we want to "link" them to KVM. We
> > don't want to allow qemu to take control of any arbitrary interrupt in
> > the system though, so it has to be related to the ownership of the irqfd
> > coming from vfio.
> > 
> > OpenCAPI I suspect will be its own can of worms...
> > 
> > Also, have we decided how the process of switching between XICS and
> > XIVE will work vs. CAS?
> That's how it is described in the architecture. The current choice is
> to create both XICS and XIVE objects and choose at CAS which one to
> use. It relies today on the capability of the pseries machine to 
> allocate IRQ numbers for both interrupt controller backends. These
> patches have been merged in QEMU.
> A change of interrupt mode results in a reset. The device tree is 
> populated accordingly and the ICPs are switched for the model in 
> use. 

For KVM we need to only instantiate one of them though.

> > And how will that interact with KVM?
> I expect we will do the same, which is to create two KVM devices to 
> be able to handle both interrupt controller backends depending on the 
> mode negotiated by the guest.  

That will be an ungodly mess, I'd rather we only instantiate the right

> > I was
> > thinking the kernel would implement a different KVM device type, ie
> > the "emulated XICS" would remain KVM_DEV_TYPE_XICS and XIVE would be
> Yes, it makes sense. The new device will have a lot in common with the
> KVM_DEV_TYPE_XICS using kvm_xive_ops.

