From: David Gibson
Subject: Re: [Qemu-ppc] [PATCH v3 07/35] spapr/xive: introduce the XIVE Event Queues
Date: Sat, 5 May 2018 14:29:44 +1000
User-agent: Mutt/1.9.3 (2018-01-21)

On Fri, May 04, 2018 at 03:29:02PM +0200, Cédric Le Goater wrote:
> On 05/04/2018 07:19 AM, David Gibson wrote:
> > On Thu, May 03, 2018 at 04:37:29PM +0200, Cédric Le Goater wrote:
> >> On 05/03/2018 08:25 AM, David Gibson wrote:
> >>> On Thu, May 03, 2018 at 08:07:54AM +0200, Cédric Le Goater wrote:
> >>>> On 05/03/2018 07:45 AM, David Gibson wrote:
> >>>>> On Thu, Apr 26, 2018 at 11:48:06AM +0200, Cédric Le Goater wrote:
> >>>>>> On 04/26/2018 09:25 AM, David Gibson wrote:
> >>>>>>> On Thu, Apr 19, 2018 at 02:43:03PM +0200, Cédric Le Goater wrote:
> >>>>>>>> The Event Queue Descriptor (EQD) table is an internal table of the
> >>>>>>>> XIVE routing sub-engine. It specifies on which Event Queue the event
> >>>>>>>> data should be posted when an exception occurs (later on pulled by
> >>>>>>>> the OS) and which Virtual Processor to notify.
> >>>>>>>
> >>>>>>> Uhhh.. I thought the IVT said which queue and vp to notify, and the
> >>>>>>> EQD gave metadata for event queues.
> >>>>>>
> >>>>>> Yes, the above is poorly written. The Event Queue Descriptor contains
> >>>>>> the guest address of the event queue in which the data is written. I
> >>>>>> will rephrase.
> >>>>>>
> >>>>>> The IVT contains IVEs, which indeed define, for an IRQ, which EQ to
> >>>>>> notify and what data to push on the queue.
> >>>>>>  
> >>>>>>>> The Event Queue is a much
> >>>>>>>> more complex structure but we start with a simple model for the sPAPR
> >>>>>>>> machine.
> >>>>>>>>
> >>>>>>>> There is one XiveEQ per priority and these are stored under the XIVE
> >>>>>>>> virtualization presenter (sPAPRXiveNVT). EQs are simply indexed with:
> >>>>>>>>
> >>>>>>>>        (server << 3) | (priority & 0x7)
> >>>>>>>>
> >>>>>>>> This is not in the XIVE architecture but, as the EQ index is never
> >>>>>>>> exposed to the guest, neither in the hcalls nor in the device tree,
> >>>>>>>> we are free to use whatever fits the current model best.
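For illustration only, the indexing above boils down to something like the
following minimal sketch; the helper names are hypothetical and not taken
from the patch:

    #include <stdint.h>

    /* One EQ per (server, priority) pair, stored under the NVT.  The
     * index is purely an internal convention of the model and is never
     * exposed to the guest. */
    static inline uint32_t spapr_xive_eq_idx(uint32_t server, uint8_t priority)
    {
        return (server << 3) | (priority & 0x7);
    }

    /* Reverse mapping, e.g. when dumping EQ state per (cpu, priority). */
    static inline void spapr_xive_eq_pick(uint32_t eq_idx,
                                          uint32_t *server, uint8_t *priority)
    {
        *server   = eq_idx >> 3;
        *priority = eq_idx & 0x7;
    }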
> >>>>>>
> >>>>>> This EQ indexing is important to note because it will also show up
> >>>>>> in KVM to build the IVE from the KVM irq state.
> >>>>>
> >>>>> Ok, are you saying that while this combined EQ index will never appear
> >>>>> in guest <-> host interfaces, 
> >>>>
> >>>> Indeed.
> >>>>
> >>>>> it might show up in qemu <-> KVM interfaces?
> >>>>
> >>>> Not directly, but it is part of the IVE as the IVE_EQ_INDEX field. When
> >>>> dumped, it has to be built in some way that is compatible with the
> >>>> emulated mode in QEMU.
> >>>
> >>> Hrm.  But is the exact IVE contents visible to qemu (for a PAPR
> >>> guest)?  
> >>
> >> The guest only uses hcalls whose arguments are:
> >>
> >>    - cpu numbers,
> >>    - priority numbers from defined ranges,
> >>    - logical interrupt numbers,
> >>    - the physical address of the EQ.
> >>
> >> The parts of the IVE visible to the guest are the 'priority', the 'cpu',
> >> and the 'eisn', which is the effective IRQ number the guest is assigning
> >> to the source. The 'eisn' will be pushed in the EQ.
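As an illustrative decomposition of what that means for the IVE itself (the
struct below is not from the patch, where the IVE is a single packed 64-bit
word; the field names only echo the IVE_EQ_INDEX naming used in this thread):

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only.  Only 'eisn' and, indirectly, the (cpu, priority)
     * routing are guest-visible; the EQ index never leaves QEMU/KVM. */
    typedef struct XiveIVEFields {
        bool     valid;     /* source is configured                        */
        bool     masked;    /* source is masked                            */
        uint32_t eq_index;  /* IVE_EQ_INDEX: internal (server << 3) | prio */
        uint32_t eisn;      /* event data pushed in the EQ for the guest   */
    } XiveIVEFields;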
> > 
> > Ok.
> > 
> >> The IVE EQ index is not visible.
> > 
> > Good.
> > 
> >>> I would have thought the qemu <-> KVM interfaces would have
> >>> abstracted this the same way the guest <-> KVM interfaces do.
> >>> Or is there a reason not to?
> >>
> >> It is practical to dump 64-bit IVEs directly from KVM into the QEMU
> >> internal structures because it fits the emulated mode without doing
> >> any translation... This might be seen as a shortcut. You will tell
> >> me when you reach the KVM part.
> > 
> > Ugh.. exposing to qemu the raw IVEs sounds like a bad idea to me.
> 
> You definitely need them in QEMU in emulation mode. The whole routing
> relies on them.

I'm not exactly sure what you mean by "emulation mode" here.  Above,
I'm talking specifically about a KVM HV, PAPR guest.

> > When we migrate, we're going to have to assign the guest (server,
> > priority) tuples to host EQ indices, and I think it makes more sense
> > to do that in KVM and hide the raw indices from qemu than to have qemu
> > mangle them explicitly on migration.
> 
> We will need some mangling mechanism for the KVM ioctls saving and
> restoring state. This is very similar to XICS. 
>  
> >>>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
> >>>>>>>
> >>>>>>> Is the EQD actually modifiable by a guest?  Or are the settings of the
> >>>>>>> EQs fixed by PAPR?
> >>>>>>
> >>>>>> The guest uses the H_INT_SET_QUEUE_CONFIG hcall to define the address
> >>>>>> of the event queue for a (priority, server) couple.
> >>>>>
> >>>>> Ok, so the EQD can be modified by the guest.  In which case we need to
> >>>>> work out what object owns it, since it'll need to migrate it.
> >>>>
> >>>> Indeed. The EQDs are CPU related as there is one EQD per (cpu, priority)
> >>>> couple. The KVM patchset dumps/restores the eight XiveEQ structs using
> >>>> per-CPU ioctls. The EQ in the OS RAM is marked dirty at that stage.
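A rough, hypothetical sketch of that shape on the QEMU side, only to show the
one-EQ-per-(cpu, priority) relationship: CPUState is QEMU's vCPU handle,
XiveEQ is the patch's EQ state struct, and kvm_xive_get_eq() is a made-up
placeholder, not the real interface of the KVM patchset:

    /* Hypothetical sketch: save the eight EQs of one vCPU. */
    #define XIVE_PRIORITY_COUNT 8

    static int spapr_xive_save_cpu_eqs(CPUState *cs,
                                       XiveEQ eqs[XIVE_PRIORITY_COUNT])
    {
        for (int prio = 0; prio < XIVE_PRIORITY_COUNT; prio++) {
            int ret = kvm_xive_get_eq(cs, prio, &eqs[prio]); /* hypothetical */
            if (ret < 0) {
                return ret;
            }
        }
        return 0;
    }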
> >>>
> >>> To make sure I'm clear: for PAPR there's a strict relationship between
> >>> EQD and CPU (one EQD for each (cpu, priority) tuple).  
> >>
> >> Yes.
> >>
> >>> But for powernv that's not the case, right?  
> >>
> >> It is.
> > 
> > Uh.. I don't think either of us phrased that well, I'm still not sure
> > which way you're answering that.
> 
> There's a strict relationship between EQD and CPU (one EQD for each
> (cpu, priority) tuple) in spapr and in powernv.

For powernv that seems to be contradicted by what you say below.
AFAICT there might be a strict association at the host kernel or even
the OPAL level, but not at the hardware level.

> >>> AIUI the mapping of EQs to cpus was configurable, is that right?
> >>
> >> Each CPU has 8 EQDs. Same for virtual CPUs.
> > 
> > Hmm.. but is that 8 EQD per cpu something built into the hardware, or
> > just a convention of how the host kernel and OPAL operate?
> 
> It's not baked into the HW; the EQD is used by the HW to route the
> notification. The EQD contains the EQ characteristics:
> 
> * functional bits :
>   - valid bit
>   - enqueue bit, to update OS in RAM EQ or not
>   - unconditional notification
>   - backlog
>   - escalation
>   - ...
> * OS EQ fields 
>   - physical address
>   - entry index
>   - toggle bit
> * NVT fields
>   - block/chip
>   - index
> * etc.
> 
> It's a big structure: 8 words (see the sketch below).
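As a rough illustration of that size, the descriptor can be pictured as eight
32-bit words; the comments below only paraphrase the field groups listed
above, not the exact hardware bit assignments:

    #include <stdint.h>

    /* Illustrative layout: "8 words" as stated above. */
    typedef struct XiveEQ {
        uint32_t w0;   /* functional bits: valid, enqueue, notification, ... */
        uint32_t w1;   /* OS EQ: toggle bit and entry index                  */
        uint32_t w2;   /* OS EQ: physical address of the queue page (high)   */
        uint32_t w3;   /* OS EQ: physical address of the queue page (low)    */
        uint32_t w4;   /* escalation / backlog related                       */
        uint32_t w5;   /* escalation / backlog related                       */
        uint32_t w6;   /* NVT: block/chip and index of the target processor  */
        uint32_t w7;   /* event data for some notification types             */
    } XiveEQ;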

Ok.  So yeah, the cpu association of the EQ is there in the NVT
fields, not baked into the hardware.

> The EQD table is allocated by OPAL/skiboot and fed to the HW for
> its use. The powernv OS uses OPAL calls to configure the EQD with its
> needs:
> 
> int64_t opal_xive_set_queue_info(uint64_t vp, uint32_t prio,
>                                  uint64_t qpage,
>                                  uint64_t qsize,
>                                  uint64_t qflags);
> 
> 
> sPAPR uses an hcall:
> 
> static long plpar_int_set_queue_config(unsigned long flags,
>                                        unsigned long target,
>                                        unsigned long priority,
>                                        unsigned long qpage,
>                                        unsigned long qsize)
> 
> 
> but it is translated into an OPAL call in KVM.
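Conceptually, that translation amounts to something like the following
sketch; it is purely illustrative, and xive_target_to_vp() and
h_flags_to_qflags() are made-up placeholders for the real lookups in the
host code:

    /* Hypothetical sketch: route the guest's H_INT_SET_QUEUE_CONFIG
     * arguments to the OPAL call quoted above. */
    static int64_t xive_set_queue_config(uint64_t flags, uint64_t target,
                                         uint64_t priority, uint64_t qpage,
                                         uint64_t qsize)
    {
        uint64_t vp = xive_target_to_vp(target);        /* hypothetical */
        uint64_t qflags = h_flags_to_qflags(flags);     /* hypothetical */

        return opal_xive_set_queue_info(vp, priority, qpage, qsize, qflags);
    }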
> 
> C.
> 
>  
> >  
> >>
> >> I am not sure what you understood before? It is surely something
> >> I wrote; my XIVE understanding is still making progress.
> >>
> >>
> >> C.
> >>
> > 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson


