
Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.


From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.
Date: Tue, 11 Aug 2015 20:22:19 +0100

Benjamin Herrenschmidt <address@hidden> writes:

> On Tue, 2015-08-11 at 08:54 +0100, Alex Bennée wrote:
>> 
>> > How do you handle the memory model? I.e., ARM and PPC are OO while x86
>> > is (mostly) in order, so emulating ARM/PPC on x86 is fine but emulating
>> > x86 on ARM or PPC will lead to problems unless you generate memory
>> > barriers with every load/store...
>> 
>> This is the next chunk of work. We have Alvise's LL/SC patches which
>> allow us to do proper emulation of ARM's load/store exclusive behaviour
>> and any weakly ordered target will have to use such constructs.
>
> God no! You don't want to use ll/sc for dealing with weak ordering, you
> want to use barriers... ll/sc will allow you to deal with front-end
> things such as atomic inc/dec etc...

Sorry, I wasn't clear: ll/sc is required to properly support the
atomic-like ops of weakly ordered systems. So while it doesn't offer
guarantees on memory ordering, it does ensure you can safely do atomic
operations.
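
To illustrate the distinction, here is a minimal sketch (not code from the
patch series) of a guest atomic increment emulated with a host
compare-and-swap loop. The helper name guest_atomic_inc is hypothetical, and
a CAS loop is only an approximation of real LL/SC semantics (it is subject to
ABA, for one). The relaxed memory order is deliberate: the update is atomic,
but no ordering of the surrounding loads and stores is implied.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: emulate a guest LL/SC-style increment with a host
 * compare-and-swap loop. Relaxed ordering throughout: the read-modify-write
 * is atomic, but it does not order any other memory accesses. */
static uint32_t guest_atomic_inc(_Atomic uint32_t *addr)
{
    /* Plays the role of the "load-exclusive". */
    uint32_t old = atomic_load_explicit(addr, memory_order_relaxed);

    /* Plays the role of the "store-exclusive": retry until no other writer
     * got in between. On failure, old is refreshed with the current value. */
    while (!atomic_compare_exchange_weak_explicit(addr, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* retry with the refreshed value */
    }
    return old + 1;
}
```

Any ordering the guest expects around such an operation would still have to
come from separate barriers.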

>> Currently the plan is to introduce a barrier TCG op which will translate
>> to the strongest backend barrier available.
>
> I would advocate at least two barriers, full barrier and write barrier,
> so at least when emulating ARM or PPC on x86, we don't actually send
> fences on every load/store.

I was considering finer-grained barriers as an optimisation step.
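
As a rough sketch of what finer-grained barriers could look like, the guest
front end could state which orderings it needs and the backend could mask out
the ones the host already guarantees. All names below (the MO_* bits, the
HOST_*_IMPLICIT masks and fence_needed) are hypothetical and only illustrate
the idea. The x86-like mask assumes the usual TSO description in which only a
store followed by a load may be reordered.

```c
/* Hypothetical ordering bits, read as "earlier access _ later access". */
enum mem_order {
    MO_LD_LD = 1 << 0,   /* load  before later load  */
    MO_LD_ST = 1 << 1,   /* load  before later store */
    MO_ST_LD = 1 << 2,   /* store before later load  */
    MO_ST_ST = 1 << 3,   /* store before later store */
    MO_ALL   = MO_LD_LD | MO_LD_ST | MO_ST_LD | MO_ST_ST,
};

/* Orderings a host gives without an explicit fence (assumed values). */
#define HOST_TSO_IMPLICIT  (MO_LD_LD | MO_LD_ST | MO_ST_ST)  /* x86-like */
#define HOST_WEAK_IMPLICIT 0u                                /* ARM/PPC-like */

/* Return the orderings the backend still has to enforce with a real fence.
 * A result of 0 means the guest barrier can be dropped on this host. */
static unsigned fence_needed(unsigned guest_requested, unsigned host_implicit)
{
    return guest_requested & ~host_implicit;
}
```

With these masks, a guest write barrier (MO_ST_ST) costs nothing on the
x86-like host, while a guest full barrier (MO_ALL) reduces to MO_ST_LD, the
one case that needs a real fence there.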

>
> I.e., the x86 memory model is *not* fully ordered, so an ARM or PPC full
> barrier must translate into an x86 fence afaik (or whatever is the x86
> name of its full barrier), but you don't want to translate all ARM/PPC
> weaker barriers into those.
>
>> Even x86 should be using barriers to ensure cross-core visibility which
>> then leaves LS re-ordering on the same core.
>
> Only for store + load, which is afaik the only case where x86
> re-orders.

To be clear, this applies to non-dependent accesses. A load from an
address that the same CPU has just stored to shouldn't overtake that
store.
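
A toy model may help picture this. The sketch below is purely illustrative
(the struct cpu type and the cpu_store, cpu_load and cpu_fence helpers are
made up for this example and model no real hardware in detail). Each CPU has
a FIFO store buffer in front of shared memory. A load from an address with a
pending store is satisfied from the buffer, so it never overtakes its own
store, while a load from any other address can read memory before the
buffered store becomes visible.

```c
#include <stddef.h>

#define SB_DEPTH 8

struct store { int addr; int val; };

/* One CPU with a FIFO store buffer in front of shared memory. */
struct cpu {
    struct store sb[SB_DEPTH];
    size_t n;                        /* pending stores, oldest first */
};

/* Full barrier: drain pending stores to memory in program order. */
static void cpu_fence(struct cpu *c, int *mem)
{
    for (size_t i = 0; i < c->n; i++) {
        mem[c->sb[i].addr] = c->sb[i].val;
    }
    c->n = 0;
}

/* Stores are buffered; memory is updated only when the buffer drains. */
static void cpu_store(struct cpu *c, int *mem, int addr, int val)
{
    if (c->n == SB_DEPTH) {
        cpu_fence(c, mem);           /* buffer full: drain before queuing */
    }
    c->sb[c->n++] = (struct store){ addr, val };
}

/* Loads check the CPU's own buffer first (newest entry wins), then memory. */
static int cpu_load(const struct cpu *c, const int *mem, int addr)
{
    for (size_t i = c->n; i-- > 0; ) {
        if (c->sb[i].addr == addr) {
            return c->sb[i].val;     /* forwarded from own pending store */
        }
    }
    return mem[addr];
}
```

If two such CPUs each store to their own flag and then load the other's flag
without calling cpu_fence in between, both can read the old value, which is
the store + load reordering discussed above.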

I note that ARM, PPC and SPARC RMO are all broadly similar in their
models. Alpha seems to be the funky one but as we don't have a backend
for it I think we are OK.

> But in any case, expose to the target (TCG target) the ordering
> expectations of the source so that we can use whatever facilities might
> be at hand to avoid some of those barriers, for example the SAO mapping
> attribute I mentioned.
>
> I'll try to look at your patch more closely when I get a chance and see
> if I can produce a ppc target but don't hold your breath, I'm a bit
> swamped at the moment.

We haven't done anything on barriers yet. I've mostly been concentrating
on writing up the test cases to demonstrate the failures, but given the
x86 backend it is surprisingly hard to come up with a test case that will
fail. I suspect I need to port my ARM tests to x86 and run them on the ARM
backend.
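
For reference, the general shape of such a test is a message-passing litmus
test. The sketch below is host-level C with pthreads, not one of the actual
guest tests, and all names in it are made up for illustration. It uses
release/acquire ordering, so it is expected to pass everywhere. A guest
version would express the same pattern in guest instructions, and it is the
guest's ordering guarantees that the emulator would have to preserve on a
weakly ordered host.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;          /* plain data, published through flag */
static atomic_int flag;
static int observed;

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the payload store must be visible before the flag store. */
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    /* Acquire: once the flag is seen, the payload must be seen too. */
    while (!atomic_load_explicit(&flag, memory_order_acquire)) {
        /* spin until the writer publishes */
    }
    observed = payload;
    return NULL;
}

/* Run the test repeatedly. Returns how many times stale data was read,
 * or -1 if a thread could not be created. */
static int run_message_passing(int iterations)
{
    int failures = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t w, r;

        payload = 0;
        observed = 0;
        atomic_store(&flag, 0);

        if (pthread_create(&r, NULL, reader, NULL) != 0) {
            return -1;
        }
        if (pthread_create(&w, NULL, writer, NULL) != 0) {
            atomic_store(&flag, 1);  /* release the spinning reader */
            pthread_join(r, NULL);
            return -1;
        }
        pthread_join(w, NULL);
        pthread_join(r, NULL);

        if (observed != 42) {
            failures++;
        }
    }
    return failures;
}
```

A failure count above zero would indicate that the ordering between the
payload and the flag was lost somewhere.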

-- 
Alex Bennée


