Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE
Cédric Le Goater
Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Thu, 12 Apr 2018 10:28:19 +0200
On 04/12/2018 07:08 AM, David Gibson wrote:
> On Thu, Dec 21, 2017 at 11:12:06AM +1100, Benjamin Herrenschmidt wrote:
>> On Wed, 2017-12-20 at 16:09 +1100, David Gibson wrote:
>>> As you've suggested yourself, I think we might need to more
>>> explicitly model the different components of the XIVE system. As part
>>> of that, I think you need to be clearer in this base skeleton about
>>> exactly what component your XIVE object represents.
>>> If the answer is "the overall thing" I suspect that's not what you
>>> want - I had one of those for XICs which proved to be a mistake
>>> (eventually replaced by the XICSFabric interface).
>>> Changing the model later isn't impossible, but doing so without
>>> breaking migration can be a real pain, so I think it's worth a
>>> reasonable effort to try and get it right initially.
>> Note: we do need to speed things up a bit, as having exploitation mode
>> in KVM will significantly help with IPI performance among other things.
>> I'm about ready to do the KVM bits. The one thing we need to discuss
>> and figure a good design for is how we map all those interrupt control
>> pages into qemu.
>> Each interrupt (either PCIe pass-through or the "generic XIVE IPIs"
>> which are used for guest IPIs and for vio/virtio/emulated interrupts)
>> comes with a "control page" (ESB page) which needs to be mapped into
>> the guest, and the generic IPIs also come with a trigger page which
>> needs to be mapped into the guest for guest IPIs or OpenCAPI
>> interrupts, or just qemu for emulated devices.
>> Now that can be thousands of these critters. I certainly don't want to
>> create thousands of VMAs in qemu and even less thousands of memory
>> regions in KVM.
>> So we need some kind of mechanism by which a single large VMA gets
>> mmap'ed into qemu (or maybe a couple of these, but not too many) and
>> the interrupt pages can be assigned to slots in there and demand
> Ok, I see your point. We'll definitely need to be able to map things
> in as a block, rather than one by one.
So, the approach taken is to use a single mmap(), exposed to the guest
as a single ram_device memory region. Its size is determined by the size
of the IRQ number space, which is hardcoded in QEMU to 4096 (IPIs) +
1024 (virtual device interrupts). We can change that, but the split at
4K is important for XICS compatibility. The KVM XIVE device should adapt
by itself.
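
To make the layout concrete, here is a rough sketch of the "one big
mapping, one slot per interrupt" idea. It is not the actual QEMU or KVM
code: every sketch_* name is hypothetical, one 64K page per interrupt is
an assumption, and an anonymous PROT_NONE mapping stands in for the
mmap() of the KVM device.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* IRQ number space sizes quoted in this mail. */
#define SKETCH_NR_IPIS       4096u
#define SKETCH_NR_VDEV_IRQS  1024u
#define SKETCH_NR_IRQS       (SKETCH_NR_IPIS + SKETCH_NR_VDEV_IRQS)

/* Assumption: each interrupt owns one 64K slot in the region. */
#define SKETCH_ESB_SHIFT     16u

/* Total size of the single region covering the whole IRQ number space. */
static size_t sketch_esb_region_size(void)
{
    return (size_t)SKETCH_NR_IRQS << SKETCH_ESB_SHIFT;
}

/* Byte offset of an interrupt's slot, or (size_t)-1 if out of range. */
static size_t sketch_esb_offset(uint32_t irq)
{
    if (irq >= SKETCH_NR_IRQS) {
        return (size_t)-1;
    }
    return (size_t)irq << SKETCH_ESB_SHIFT;
}

/*
 * Reserve the whole range as ONE mapping (one VMA) instead of thousands.
 * Stand-in only: real code would map a file descriptor provided by the
 * kernel rather than anonymous memory.
 */
static void *sketch_esb_reserve(void)
{
    void *p = mmap(NULL, sketch_esb_region_size(), PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

With these assumed sizes the region is 5120 * 64K = 320M of address
space, and the first virtual device interrupt (number 4096) sits right
after the IPI slots.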
>> For the generic interrupts, this can probably be covered by KVM, adding
>> some arch ioctls for allocating IPIs and mmap'ing that region etc...
>> For pass-through, it's trickier, we don't want to mmap each irqfd
>> individually for the above reason, so we want to "link" them to KVM. We
>> don't want to allow qemu to take control of any arbitrary interrupt in
>> the system though, so it has to be related to the ownership of the irqfd
>> coming from vfio.
>> OpenCAPI I suspect will be its own can of worms...
>> Also, have we decided how the process of switching between XICS and
>> XIVE will work vs. CAS? And how will that interact with KVM? I was
>> thinking the kernel would implement a different KVM device type, ie
>> the "emulated XICS" would remain KVM_DEV_TYPE_XICS and XIVE would be