
Re: [Qemu-devel] [PATCH v3 06/35] spapr/xive: introduce a XIVE interrupt


From: Cédric Le Goater
Subject: Re: [Qemu-devel] [PATCH v3 06/35] spapr/xive: introduce a XIVE interrupt presenter model
Date: Fri, 4 May 2018 16:15:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

On 05/04/2018 06:44 AM, David Gibson wrote:
> On Thu, May 03, 2018 at 05:10:48PM +0200, Cédric Le Goater wrote:
>> On 05/03/2018 07:39 AM, David Gibson wrote:
>>> On Thu, Apr 26, 2018 at 07:15:29PM +0200, Cédric Le Goater wrote:
>>>> On 04/26/2018 11:27 AM, Cédric Le Goater wrote:
>>>>> On 04/26/2018 09:11 AM, David Gibson wrote:
>>>>>> On Thu, Apr 19, 2018 at 02:43:02PM +0200, Cédric Le Goater wrote:
>>> [snip]
>>>>>>> +static void xive_tm_os_write(void *opaque, hwaddr offset,
>>>>>>> +                                   uint64_t value, unsigned size)
>>>>>>> +{
>>>>>>> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>>>>>>> +    XiveNVT *nvt = XIVE_NVT(cpu->intc);
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    if (offset >= TM_SPC_ACK_EBB) {
>>>>>>> +        xive_tm_write_special(nvt, offset, value, size);
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (TM_RING(offset) != TM_QW1_OS) {
>>>>>>
>>>>>> Why have this if you have separate OS and user regions as you appear
>>>>>> to do below?
>>>>>
>>>>> This is another problem we are trying to solve. 
>>>>>
>>>>> The registers a CPU can access depend on the TIMA view it is using. 
>>>>> The OS TIMA view only sees the OS ring registers. The HV view sees all. 
>>>>
>>>> So, I took a deeper look at the specs and I now understand the 
>>>> underlying concepts in a little more detail. You need to make 
>>>> frequent round-trips to this document ...  
>>>>
>>>> These registers are accessible through four aligned pages, each exposing 
>>>> a different view of the registers. First page (page address ending 
>>>> in 0b00) gives access to the entire context and is reserved for the 
>>>> ring 0 security monitor. The second (page address ending in 0b01) 
>>>> is for the hypervisor, ring 1. The third (page address ending in 0b10) 
>>>> is for the operating system, ring 2. The fourth (page address ending 
>>>> in 0b11) is for user level, ring 3.
>>>>
>>>> The sPAPR machine runs at the OS privilege and therefore can only 
>>>> access the OS and the User rings, 2 and 3. The others are for
>>>> hypervisor levels.
>>>
>>> Ok, that much is what I thought.  What I'm less clear on is what each
>>> page looks like compared to the others.  Previously I thought each one
>>> had the same registers, 
>>
>> yes.
>>
>>> just manipulating the corresponding ring.  
>>
>> no. 
>>
>>> Are you saying instead that each ring's page basically has a subset 
>>> of the registers in the next most privileged page?
>>
>> That's the idea. 
> 
> Ah, ok.
> 
>> The registers are defined as follows:
>>
>>      QW-0 User      
>>      QW-1 O/S      
>>      QW-2 Pool   
>>      QW-3 Physical 
>>
>> and the pages :
>>
>> - 0006030203180000 security monitor 
>>   can access all registers 
>>
>> - 0006030203190000 hv
>>   can access all registers minus the secure regs
>>
>> - 00060302031a0000 os
>>   can access some of the OS (QW1) and User (QW0) registers
>>  
>> - 00060302031b0000 user
>>   can access NSR reg of User (QW0) registers
> 
> I can see two reasonable ways of doing this:
> 
> A)
> 
> Have a single set of read/write functions.  These implement all the
> registers but take a "privilege level" parameter which controls which
> will actually work.  Those could then be wired up in one of two ways:
> 
>   A1) Single memory region.  The accessor derives the priv level from
>   the relevant address bits, before masking it down to a single
>   register page.  Then, as above

Yes. That's the goal behind the page ordering :

page address ending in 0b00 : ring 0, security monitor 
page address ending in 0b01 : ring 1, hypervisor 
page address ending in 0b10 : ring 2, operating system  
page address ending in 0b11 : ring 3, user level

I don't know why the registers are ordered the other way around though.

That would be the direction to take for the emulated mode, I think.
It covers both the PowerNV (4 pages) and the sPAPR (2 pages) cases well:
in each case, the machine's interrupt controller decides how many pages
to map. The memory region ops do the rest.
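
To make option A1 concrete, here is a rough sketch of how a single accessor
could derive the view from the address bits and then filter on the ring.
It is not taken from the patch. The 64K page size is inferred from the page
addresses listed above; the 16 bytes per QW, the tima_* names and the
coarse "highest visible QW" table are assumptions for illustration only.
Per-register filtering (secure regs in the HV view, NSR only in the user
view) and the special operation offsets are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64K pages, as suggested by the ...180000/...190000 addresses. */
#define TIMA_PAGE_SHIFT 16
#define TIMA_PAGE_MASK  ((UINT64_C(1) << TIMA_PAGE_SHIFT) - 1)

/* Assumption: four 16-byte register groups (QW-0..QW-3) at the page start. */
#define TIMA_QW_SHIFT   4
#define TIMA_QW_AREA    0x40

/* Page index == ring number: 0 secure, 1 HV, 2 OS, 3 user. */
static unsigned tima_view(uint64_t offset)
{
    return (offset >> TIMA_PAGE_SHIFT) & 0x3;
}

/* Offset of the register inside its page, whatever the view. */
static uint64_t tima_reg(uint64_t offset)
{
    return offset & TIMA_PAGE_MASK;
}

/* QW index: 0 User, 1 O/S, 2 Pool, 3 Physical. */
static unsigned tima_qw(uint64_t reg)
{
    return (reg >> TIMA_QW_SHIFT) & 0x3;
}

/*
 * Coarse filter: highest QW visible from each view. The secure and HV
 * views reach QW-3, the OS view stops at QW-1, the user view at QW-0.
 */
static bool tima_access_allowed(uint64_t offset)
{
    static const unsigned max_qw[4] = { 3, 3, 1, 0 };
    uint64_t reg = tima_reg(offset);

    if (reg >= TIMA_QW_AREA) {
        return false; /* special operations are not modelled in this sketch */
    }
    return tima_qw(reg) <= max_qw[tima_view(offset)];
}
```

With something like this, the sPAPR machine would map only the last two
pages and PowerNV all four, and the same check would apply to both.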

For KVM, we need to populate the VMA with the host TIMA page associated 
with ring 2 (OS) and then ring 3 (USER). 

This option looks better overall. I will see how ugly it gets with the 
implementation.

C.


>   A2) Multiple memory regions with the same accessor functions but
>   different opaque pointer.  The accessor gets the priv level from
>   its opaque pointer, then the address is just within a single ring's
>   page.
>
> B)
> 
> Separate memory regions with separate accessors.  The ring-0 accessor
> implements the ring-0 registers, then calls the ring-1 accessor
> function for everything else.  ring-1 calls ring-2 and so forth.
>
>> On sPAPR, we can remap the os/user pages to some other base address 
>> but we should keep the same page offset.
> 
> Sure.
> 
>>
>>
>>>> I will try to come up with a better implementation of the model and
>>>> make sure the ring numbers are respected. I am not sure we should 
>>>> have only one memory region or four distinct ones with their
>>>> own ops. There are some differences in the load/store of each view.
>>>
>>> Right.  I'm not clear at this point if that's for good reasons, or
>>> just because IBM's hardware designers don't seem to have gotten the
>>> hang of Don't Repeat Yourself.
>>>
>>
> 



