From: David Gibson
Subject: Re: [Qemu-devel] [PATCH v3 04/35] spapr/xive: introduce a XIVE interrupt controller for sPAPR
Date: Fri, 4 May 2018 13:33:25 +1000
User-agent: Mutt/1.9.3 (2018-01-21)

On Thu, May 03, 2018 at 06:50:09PM +0200, Cédric Le Goater wrote:
> On 05/03/2018 07:22 AM, David Gibson wrote:
> > On Thu, Apr 26, 2018 at 12:43:29PM +0200, Cédric Le Goater wrote:
> >> On 04/26/2018 06:20 AM, David Gibson wrote:
> >>> On Tue, Apr 24, 2018 at 11:46:04AM +0200, Cédric Le Goater wrote:
> >>>> On 04/24/2018 08:51 AM, David Gibson wrote:
> >>>>> On Thu, Apr 19, 2018 at 02:43:00PM +0200, Cédric Le Goater wrote:
> >>>>>> sPAPRXive is a model for the XIVE interrupt controller device of the
> >>>>>> sPAPR machine. It holds the XIVE routing table, the Interrupt
> >>>>>> Virtualization Entry (IVE) table, which associates interrupt
> >>>>>> source numbers with targets.
> >>>>>>
> >>>>>> Also extend the XiveFabric with an accessor to the IVT. This will be
> >>>>>> needed by the routing algorithm.
> >>>>>>
> >>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
> >>>>>> ---
> >>>>>>
> >>>>>>  Maybe we should introduce a XiveRouter model to hold the IVT. To
> >>>>>>  be discussed.
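
To make this concrete, a minimal sketch of the state involved could look
like the following; every type, field and name below is illustrative
only, not what the patch actually defines:

  /* Hypothetical sketch, for discussion only. */
  #include <stddef.h>
  #include <stdint.h>

  typedef struct XiveIVE {
      uint64_t w;          /* valid/masked bits, EQ block:index, EQ data */
  } XiveIVE;

  typedef struct sPAPRXive {
      uint32_t nr_irqs;    /* number of interrupt sources */
      XiveIVE *ivt;        /* one IVE per source number */
  } sPAPRXive;

  /* The XiveFabric accessor mentioned above: resolve a source number
   * to its IVE so the routing algorithm can find the target EQ. */
  static XiveIVE *spapr_xive_get_ive(sPAPRXive *xive, uint32_t lisn)
  {
      return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
  }
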
> >>>>>
> >>>>> Yeah, maybe.  Am I correct in thinking that on pnv there could be more
> >>>>> than one XiveRouter?
> >>>>
> >>>> There is only one, the main IC. 
> >>>
> >>> Ok, that's what I thought originally.  In that case some of the stuff
> >>> in the patches really doesn't make sense to me.
> >>
> >> well, there is one IC per chip on powernv, but we haven't reached that
> >> part yet.
> > 
> > Hmm.  There are some things we can delay dealing with, but I don't think
> > this is one of them.  I think we need to understand how multichip is
> > going to work in order to come up with a sane architecture.  Otherwise
> > I fear we'll end up with something that we either need to horribly
> > bastardize for multichip, or have to rework things dramatically
> > leading to migration nightmares.
> 
> It is all controlled by MMIO, so we should be fine on that part.
> As for the internal tables, they are all configured by firmware, using
> a chip identifier (block). I need to check how the remote XIVEs are
> accessed. I think this is by MMIO.

Right, but for powernv we execute OPAL inside the VM, rather than
emulating its effects.  So we still need to model the actual hardware
interfaces.  OPAL hides the details from the kernel, but not from us
on the other side.

> I haven't looked at multichip XIVE support, but I am not too worried as
> the framework is already in place for the machine.
>  
> >>>>> If we did have a XiveRouter, I'm not sure we'd need the XiveFabric
> >>>>> interface, possibly its methods could just be class methods of
> >>>>> XiveRouter.
> >>>>
> >>>> Yes. We could introduce a XiveRouter to share the ivt table between 
> >>>> the sPAPRXive and the PnvXIVE models, the interrupt controllers of
> >>>> the machines. Methods would provide a way to get the ivt/eq/nvt
> >>>> objects required for routing. I need to add a set_eq() to push the
> >>>> EQ data.
> >>>
> >>> Hrm.  Well, to add some more clarity, let's say the XiveRouter is the
> >>> object which owns the IVT.  
> >>
> >> OK. that would be a model with some state and not an interface.
> > 
> > Yes.  For the papr variant it would have the whole IVT contents as its
> > state.  For powernv, just the registers telling it where to find
> > the IVT in RAM.
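
Concretely, the difference in state could be sketched as follows (types
invented for illustration, reusing the XiveIVE shape from above):

  #include <stdint.h>

  typedef struct XiveIVE XiveIVE;   /* as sketched earlier */

  /* spapr: the whole IVT is device state, migrated with the machine */
  typedef struct sPAPRXiveRouter {
      XiveIVE *ivt;
      uint32_t nr_ives;
  } sPAPRXiveRouter;

  /* powernv: only the register telling the model where firmware
   * placed the IVT in RAM */
  typedef struct PnvXiveRouter {
      uint64_t ivt_bar;
      uint32_t ivt_size;
  } PnvXiveRouter;
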
> > 
> >>> It may or may not do other stuff as well.
> >>
> >> Its only task would be to do the final event routing: get the IVE,
> >> get the EQ, push the EQ DATA into the OS event queue, and notify the CPU.
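
In pseudo-C, that sequence would be roughly the following; the helpers
and encodings are all made up here, each machine would provide its own:

  #include <stdbool.h>
  #include <stdint.h>

  /* Simplified, invented layouts -- not the real XIVE structures. */
  typedef struct XiveIVE {
      bool     valid, masked;
      uint32_t eq_block, eq_index;   /* which EQ the source targets */
      uint32_t eq_data;              /* cookie pushed to the OS queue */
  } XiveIVE;

  typedef struct XiveEQ {
      uint32_t entries[256];         /* the OS event queue, simplified */
      uint32_t tail;
      uint32_t nvt;                  /* notification target */
  } XiveEQ;

  XiveIVE *xive_get_ive(uint32_t lisn);               /* per machine */
  XiveEQ  *xive_get_eq(uint32_t blk, uint32_t idx);   /* per-chip EQDT */
  void     xive_notify_nvt(uint32_t nvt);

  void xive_route(uint32_t lisn)
  {
      XiveIVE *ive = xive_get_ive(lisn);              /* 1. get the IVE */
      if (!ive || !ive->valid || ive->masked) {
          return;
      }
      XiveEQ *eq = xive_get_eq(ive->eq_block,         /* 2. get the EQ */
                               ive->eq_index);
      if (!eq) {
          return;
      }
      eq->entries[eq->tail++ % 256] = ive->eq_data;   /* 3. push EQ DATA */
      xive_notify_nvt(eq->nvt);                       /* 4. notify */
  }
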
> > 
> > That seems like a lot of steps.  Up to push the EQ DATA, certainly.
> > And I guess it'll have to ping an NVT somehow, but I'm not sure it
> > should know about CPUs as such.
> 
> For PowerNV, the concept could be generalized, yes. An NVT can
> contain the interrupt state of a logical server, but the common
> case for QEMU is baremetal without guests, and so we have an NVT
> per CPU.

Hmm.  We eventually want to support a kernel running guests under
qemu/powernv though, right?  So even if we don't allow it right now,
we don't want enabling it later to require major surgery on our
architecture.

> PowerNV will have some limitations, but we can make it better than
> today for sure. It boots.
> 
> We can improve some of the NVT notification process, the way NVTs
> are matched eventually, and maybe support remote engines if the
> NVT is not local. I have not looked at the details.
> 
> > I'm not sure at this stage what should own the EQD table.
> 
> The EQDT is in RAM.

Not for spapr, it's not.  And even when it is in RAM, something needs
to own the register that gives its base address.

> > In the multichip case, is there one EQD table for every IVT?
> 
> There is one EQDT per chip, same for the IVT. They are in RAM, 
> identified with a block ID.
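
In other words, a global EQ identifier would decompose into a block
picking the chip and an index into that chip's table, along the lines
of (the field widths here are an assumption, not the HW spec):

  #include <stdint.h>

  #define XIVE_EQ_BLOCK(eqid)  ((uint32_t)(eqid) >> 28)         /* chip */
  #define XIVE_EQ_INDEX(eqid)  ((uint32_t)(eqid) & 0x0fffffff)  /* entry */
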
> 
> >  I'm guessing
> > not - I figure the EQD table must be effectively global so that any
> > chip's router can send events to any EQ in the whole system.
> >>> Now IIUC, on pnv the IVT lives in main system memory.
> >>
> >> yes. It is allocated by skiboot in RAM and fed to the HW using some 
> >> IC configuration registers. Then, each entry is configured with OPAL 
> >> calls and the HW is updated using cache scrub registers. 
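
For the model, that boils down to latching a BAR-like register on MMIO
writes so the IVEs can then be read from guest RAM; a rough sketch, with
invented register names and offsets:

  #include <stdint.h>

  typedef struct PnvXiveIC {
      uint64_t ivt_bar;       /* where skiboot placed the IVT in RAM */
  } PnvXiveIC;

  enum { IC_REG_IVT_BAR = 0x10, IC_REG_CACHE_SCRUB = 0x18 };

  static void pnv_ic_write(PnvXiveIC *ic, uint64_t offset, uint64_t val)
  {
      switch (offset) {
      case IC_REG_IVT_BAR:
          ic->ivt_bar = val;  /* IVEs are then fetched from guest RAM */
          break;
      case IC_REG_CACHE_SCRUB:
          /* one option is to treat scrubs as no-ops in a first pass */
          break;
      }
  }
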
> > 
> > Right.  At least for the first pass we should be able to treat the
> > cache scrub registers as no-ops and just not cache anything in the
> > qemu implementation.
> 
> The model currently supports the cache scrub registers; we need them
> to update some values. It's not too complex.

Ok.

> >>> Under PAPR, is the IVT in guest memory, or is it outside (updated by
> >>> hypercalls/rtas)?
> >>
> >> Under sPAPR, the IVT is updated by the H_INT_SET_SOURCE_CONFIG hcall
> >> which configures the targeting of an IRQ. It's not in guest memory.
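
Seen from the hypervisor side, the hcall just rewrites the IVE in
QEMU-owned state; a simplified sketch (argument handling, sizes and
error values invented, the real hcall takes flags and does far more
checking):

  #include <stdint.h>

  typedef struct XiveIVE {
      uint32_t server;        /* target the guest asked for */
      uint8_t  priority;
      uint32_t eisn;          /* event data for the queue */
  } XiveIVE;

  static XiveIVE spapr_ivt[4096];   /* hypervisor-owned, not guest RAM */

  static long h_int_set_source_config(uint32_t lisn, uint32_t server,
                                      uint8_t priority, uint32_t eisn)
  {
      if (lisn >= 4096) {
          return -4;                /* an H_PARAMETER-style failure */
      }
      spapr_ivt[lisn] = (XiveIVE){ .server = server,
                                   .priority = priority, .eisn = eisn };
      return 0;                     /* H_SUCCESS */
  }
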
> > 
> > Right.
> > 
> >> Under the hood, the IVT is still configured by OPAL under KVM and
> >> by QEMU when kernel_irqchip=off
> > 
> > Sure.  Even with kernel_irqchip=on there's still logically a guest IVT
> > (or "IVT view" I guess), even if it's actual entries are stored
> > distributed across various places in the host's IVTs.
> 
> yes. The XIVE KVM device caches the info. This is used to dump the 
> state without doing OPAL calls.
> 
> C. 
> 
> 
> >>>> The XiveRouter would also be a XiveFabric (or some other name) to 
> >>>> let the internal sources of the interrupt controller forward events.
> >>>
> >>> The further we go here, the less sure I am that XiveFabric even makes
> >>> sense as a concept.
> >>
> >> See previous email.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
