qemu-devel

Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code
Date: Sat, 24 Sep 2011 12:00:44 +0200

On 24.09.2011, at 10:44, Blue Swirl wrote:

> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf <address@hidden> wrote:
>> 
>> On 24.09.2011, at 09:41, Blue Swirl wrote:
>> 
>>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood <address@hidden> wrote:
>>>> On 09/19/2011 06:35 AM, Alexander Graf wrote:
>>>>> 
>>>>> On 17.09.2011, at 19:40, Blue Swirl wrote:
>>>>> 
>>>>>> On Sat, Sep 17, 2011 at 5:15 PM, Alexander Graf <address@hidden> wrote:
>>>>>>> 
>>>>>>> Am 17.09.2011 um 18:58 schrieb Blue Swirl <address@hidden>:
>>>>>>> 
>>>>>>>> On Sparc32, there is no need for a PV device. The CPU is woken up from
>>>>>>>> halted state with an IPI. Maybe you could use this approach?
>>>>>>> 
>>>>>>> The way it's done here is defined by u-boot and now also nailed down in 
>>>>>>> the ePAPR architecture spec. While alternatives might be more 
>>>>>>> appealing, this is how guests work today :).
>>>>>> 
>>>>>> OK. I hoped that there were no implementations yet. The header (btw
>>>>>> missing) should point to the spec.
>>>> 
>>>> The goal with the spin table stuff, suboptimal as it is, was something
>>>> that would work on any powerpc implementation.  Other
>>>> implementation-specific release mechanisms are allowed, and are
>>>> indicated by a property in the cpu node, but only if the loader knows
>>>> that the OS supports it.
>>>> 
>>>>> IIUC the spec that includes these bits is not finalized yet. It is 
>>>>> however in use on all u-boot versions for e500 that I'm aware of and the 
>>>>> method Linux uses to bring up secondary CPUs.
>>>> 
>>>> It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
>>>> just released which clarifies some things such as WIMG.
>>>> 
>>>>> Stuart / Scott, do you have any pointers to documentation where the 
>>>>> spinning is explained?
>>>> 
>>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>> 
>>> Chapter 5.5.2 describes the table. This is actually an interface
>>> between OS and Open Firmware, obviously there can't be a real hardware
>>> device that magically loads r3 etc.
>>> 
>>> The device method would break abstraction layers, it's much like
>>> vmport stuff in x86. Using a hypercall would be a small improvement.
>>> Instead it should be possible to implement a small boot ROM which puts
>>> the secondary CPUs into managed halt state without spinning, then the
>>> boot CPU could send an IPI to a halted CPU to wake it up based on
>>> the spin table, just like real HW would do. On Sparc32 OpenBIOS this
>>> is something like a few lines of ASM on both sides.
>> 
>> That sounds pretty close to what I had implemented in v1. Back then the only 
>> comment was to do it using this method from Scott. Maybe one day we will get 
>> u-boot support. Then u-boot will spin on the CPU itself and when that time 
>> comes, we can check if we can implement a prettier version.
>> 
>> Btw, we can't do the IPI method without exposing something to the guest that 
>> u-boot would usually not expose. There simply is no event. All that happens 
>> is a write to memory to tell the other CPU that it should wake up. So while 
>> sending an IPI to the other CPU is the "clean" way to go, I agree, we can 
>> either be compatible or "clean". And if I get the choice I'm rather 
>> compatible.
> 
> There are also warts in Sparc32 design, for example there is no
> instruction to halt the CPU, instead a device (only available on some
> models) can do it.

Ugh, nice :)

> 
>> So we have the choice between having code inside the guest that spins, maybe 
>> even only checks every x ms, by programming a timer, or we can try to make 
>> an event out of the memory write. V1 was the former, v2 (this one) is the 
>> latter. This version performs a lot better and is easier to understand.
> 
> The abstraction layers should not be broken lightly, I suppose some
> performance or laziness^Wlocal optimization reasons were behind vmport
> design too. The ideal way to solve this could be to detect a spinning
> CPU and optimize that for all architectures, that could be tricky
> though (if a CPU remains in the same TB for extended periods, inspect
> the TB: if it performs a loop with a single load instruction, replace
> the load by a special wait operation for any memory stores to that
> page).

I agree.

However, for now I'd like to have _something_ that we can easily replace later 
on. We don't do savevm or migration yet, so the danger of changing the device 
model from one version to the next is minimal. To the guest kernel, this is 
seamless, as the interface stays exactly the same.

In fact, the whole way we load the kernel today is pretty much wrong. We 
should instead do it the way OpenBIOS does: the firmware always loads first and 
then pulls the kernel from QEMU through a PV interface. At that point, we would 
have to implement an optimization like the one you suggest. Or implement a hypercall :). 
But at least we'd always be running the same guest software stack.

So what I'm suggesting is that we make progress for now and scratch the 
device we're introducing here later on, when we move towards different 
models of initializing the machine. As it stands, I'd much rather have 
working code here and concentrate on the 50 other places that are broken 
than optimize a case that already works well enough just because it could be 
done prettier. Let's iterate over this interface again when we hit the next 
roadblock. By then, we'll have more experience with its shortcomings too.


Alex



