qemu-devel
Re: [Qemu-devel] [RFC 0/5] Slow-path for atomic instruction translation


From: alvise rigo
Subject: Re: [Qemu-devel] [RFC 0/5] Slow-path for atomic instruction translation
Date: Wed, 6 May 2015 18:19:03 +0200

Hi Mark,

Firstly, thank you for your feedback.

On Wed, May 6, 2015 at 5:55 PM, Mark Burton <address@hidden> wrote:
> A massive thank you for doing this work Alvise,
>
> On our side, the patch we suggested is only applicable to ARM, though the
> mechanism would work for any CPU. BUT it doesn’t force atomic instructions
> out through the slow path. This is either a very good thing (it’s much
> faster), or a very bad thing (it doesn’t allow you to treat them in the IO
> space), depending on your point of view.

Indeed, this is certainly a more invasive approach, but that is
deliberate: it gives us control over the non-atomic stores that might
modify the 'linked' memory.

>
> Depending on what the rest of the community thinks, it seems to me we should
> apply both patches, so that e.g. ARM’s existing atomic instructions run much
> faster and above all more ‘accurately’ (with the patch we’ve provided), and
> the same mechanism can be applied to all other architectures, while we can
> somehow swap in this more ‘controllable’ implementation when e.g. the mutex
> is located in IO space.

Yes, this makes sense.

Thank you,
alvise

>
> Cheers
>
> Mark.
>
>> On 6 May 2015, at 17:38, Alvise Rigo <address@hidden> wrote:
>>
>> This patch series provides an infrastructure for atomic
>> instruction implementation in QEMU, paving the way for TCG multi-threading.
>> The adopted design does not rely on host atomic
>> instructions and is intended to propose a 'legacy' solution for
>> translating guest atomic instructions.
>>
>> The underlying idea is to provide new TCG instructions that guarantee
>> atomicity to some memory accesses or in general a way to define memory
>> transactions. More specifically, a new pair of TCG instructions is
>> implemented, qemu_ldlink_i32 and qemu_stcond_i32, that behave as
>> LoadLink and StoreConditional primitives (only the 32-bit variant is
>> implemented). In order to achieve this, a new bitmap is added to the
>> ram_list structure (always unique) which flags all memory pages that
>> cannot be accessed directly through the fast-path, due to previous
>> exclusive operations. This new bitmap is coupled with a new TLB flag
>> which forces the slow-path execution. Any store by another vCPU to the
>> same memory page that takes place between the LoadLink and the
>> StoreConditional will make the subsequent StoreConditional fail.
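
The coupling between the per-page bitmap and the TLB flag described above can be sketched as follows. This is a minimal standalone model, not code from the series: the page size, the toy RAM size, the numeric value given to TLB_EXCL and all helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12u   /* assumption: 4 KiB guest pages */
#define NUM_PAGES 64u   /* assumption: toy RAM of 64 pages, one bit each */
#define TLB_EXCL  0x1u  /* flag name from the series; value is illustrative */

/* One bit per page: 1 = dirty (fast path allowed), 0 = exclusive. */
static uint64_t excl_bitmap = UINT64_MAX;

/* Map a guest address to its page index (hypothetical helper). */
static unsigned page_of(uint64_t addr)
{
    uint64_t page = addr >> PAGE_BITS;
    assert(page < NUM_PAGES);
    return (unsigned)page;
}

static bool page_is_dirty(uint64_t addr)
{
    return (excl_bitmap >> page_of(addr)) & 1u;
}

static void page_set_dirty(uint64_t addr)
{
    excl_bitmap |= (uint64_t)1 << page_of(addr);
}

static void page_clear_dirty(uint64_t addr)
{
    excl_bitmap &= ~((uint64_t)1 << page_of(addr));
}

/* Flags to put in a freshly filled TLB entry covering addr. */
static unsigned tlb_fill_flags(uint64_t addr)
{
    return page_is_dirty(addr) ? 0u : TLB_EXCL;
}

/* In this model an access stays on the fast path only if no flag is set. */
static bool takes_fast_path(unsigned tlb_flags)
{
    return tlb_flags == 0u;
}
```

In this model, calling page_clear_dirty() on an address makes every later tlb_fill_flags() for the same page return TLB_EXCL, so accesses through such entries are diverted to the slow path until page_set_dirty() is called again.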
>>
>> In theory, the provided implementation of TCG LoadLink/StoreConditional
>> can be used to properly handle atomic instructions on any architecture.
>>
>> The new slow-path is implemented such that:
>> - the LoadLink behaves as a normal load slow-path, except for clearing
>>  the dirty flag in the bitmap. The TLB entries created from now on will
>>  force the slow-path. To ensure this, we flush the TLB cache for the
>>  other vCPUs
>> - the StoreConditional behaves as a normal store slow-path, except for
>>  checking the state of the dirty bitmap and returning 0 or 1 depending
>>  on whether the StoreConditional succeeded (0 when no vCPU has touched
>>  the same memory in the meantime).
>>
>> All write accesses that are forced to follow the 'legacy'
>> slow-path will mark the accessed memory page as dirty.
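
The slow-path rules above can be modelled in a few lines of C. This is a single-threaded sketch under stated assumptions, not the code from softmmu_llsc_template.h: the names ldlink, stcond, slow_store and atomic_inc are hypothetical, the dirty state is a plain array rather than the ram_list bitmap, TLB flushing is not modelled, and it is assumed that a successful StoreConditional marks the page dirty again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS      12u                      /* assumption: 4 KiB pages */
#define NUM_PAGES      4u                       /* assumption: tiny toy RAM */
#define WORDS_PER_PAGE (1u << (PAGE_BITS - 2))  /* 32-bit words per page */

static uint32_t ram[NUM_PAGES * WORDS_PER_PAGE];
/* true = dirty (no link pending), false = a LoadLink cleared the flag. */
static bool page_dirty[NUM_PAGES] = { true, true, true, true };

static unsigned page_of(uint32_t addr)
{
    assert((addr >> PAGE_BITS) < NUM_PAGES);
    return addr >> PAGE_BITS;
}

static uint32_t *word_at(uint32_t addr)
{
    assert(addr % 4 == 0);
    (void)page_of(addr);             /* bounds check only */
    return &ram[addr / 4];
}

/* LoadLink: a normal load that also clears the page's dirty flag. */
static uint32_t ldlink(uint32_t addr)
{
    page_dirty[page_of(addr)] = false;
    return *word_at(addr);
}

/* Ordinary store forced onto the slow path: marks the page dirty. */
static void slow_store(uint32_t addr, uint32_t val)
{
    *word_at(addr) = val;
    page_dirty[page_of(addr)] = true;
}

/* StoreConditional: returns 0 on success, 1 on failure. */
static int stcond(uint32_t addr, uint32_t val)
{
    if (page_dirty[page_of(addr)]) {
        return 1;                    /* page written since the LoadLink */
    }
    slow_store(addr, val);           /* assumption: success re-dirties the page */
    return 0;
}

/* Guest-style atomic increment: retry until the StoreConditional succeeds. */
static uint32_t atomic_inc(uint32_t addr)
{
    uint32_t old;
    do {
        old = ldlink(addr);
    } while (stcond(addr, old + 1) != 0);
    return old;
}
```

Because the dirty state is tracked per page, a store to any word of the linked page (not only the linked word) makes the pending stcond() return 1, which matches the page granularity described in the cover letter.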
>>
>> In this series only the ARM ldrex/strex instructions are implemented.
>> The code was tested with bare-metal test cases and with Linux, using
>> upstream QEMU.
>>
>> This work has been sponsored by Huawei Technologies Dusseldorf GmbH.
>>
>> Alvise Rigo (5):
>>  exec: Add new exclusive bitmap to ram_list
>>  Add new TLB_EXCL flag
>>  softmmu: Add helpers for a new slow-path
>>  tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions
>>  target-arm: translate: implement qemu_ldlink and qemu_stcond ops
>>
>> cputlb.c                |  11 ++-
>> include/exec/cpu-all.h  |   1 +
>> include/exec/cpu-defs.h |   2 +
>> include/exec/memory.h   |   3 +-
>> include/exec/ram_addr.h |  19 +++-
>> softmmu_llsc_template.h | 233 ++++++++++++++++++++++++++++++++++++++++++++++++
>> softmmu_template.h      |  52 ++++++++++-
>> target-arm/translate.c  |  94 ++++++++++++++++++-
>> tcg/arm/tcg-target.c    | 105 ++++++++++++++++------
>> tcg/tcg-be-ldst.h       |   2 +
>> tcg/tcg-op.c            |  20 +++++
>> tcg/tcg-op.h            |   3 +
>> tcg/tcg-opc.h           |   4 +
>> tcg/tcg.c               |   2 +
>> tcg/tcg.h               |  20 +++++
>> 15 files changed, 538 insertions(+), 33 deletions(-)
>> create mode 100644 softmmu_llsc_template.h
>>
>> --
>> 2.4.0
>>