Re: [Qemu-devel] [PATCH 1/2] atomics: do not use __atomic primitives for

From: Emilio G. Cota
Subject: Re: [Qemu-devel] [PATCH 1/2] atomics: do not use __atomic primitives for RCU atomics
Date: Mon, 23 May 2016 13:09:12 -0400
On Mon, May 23, 2016 at 09:53:00 -0700, Richard Henderson wrote:
> On 05/21/2016 01:42 PM, Emilio G. Cota wrote:
> >In the process, the atomic_rcu_read/set were converted to implement
> >consume/release semantics, respectively. This is inefficient; for
> >correctness and maximum performance we only need an smp_read_barrier_depends
> >for reads, and an smp_wmb for writes. Fix it by using the original
> >definition of these two primitives for all compilers.
> For what host do you think this is inefficient?
> In particular, what you've done is going to be less efficient for e.g.
> armv8, where the __atomic formulation is going to produce load-acquire and
> store-release instructions.  Whereas the separate barriers are going to
> produce two insns.
> As for the common case of x86_64, what you're doing is going to make no
> difference at all.
> So what are you trying to improve?

I tested this precisely on ARMv8. The goal is to not emit a fence at
all, i.e. to emit a plain load (LDR) instead of a load-acquire (LDAR).

I just realised that under #ifdef __ATOMIC we have:

#define smp_read_barrier_depends() ({ barrier(); \
    __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })

Why? This should be:

#ifdef __alpha__
#define smp_read_barrier_depends()   asm volatile("mb":::"memory")
#else
#define smp_read_barrier_depends()   barrier()
#endif

My patch should have included this additional change to make sense.
Sorry for the confusion.
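
To make the comparison concrete, here is a minimal sketch of the two
read-side formulations being discussed. The names rcu_read_consume,
rcu_read_plain, struct config and shared are hypothetical and only for
illustration; they are not the QEMU macros. The sketch assumes a non-Alpha
host and GNU C extensions (statement expressions, __typeof__). Since GCC
currently treats a consume load as an acquire load, variant A would
typically compile to LDAR on ARMv8, while variant B should compile to a
plain LDR.

```c
#include <stddef.h>

/* Compiler-only barrier: constrains reordering by the compiler, emits no instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Variant A (hypothetical name): consume load through the __atomic builtin. */
#define rcu_read_consume(ptr) __atomic_load_n((ptr), __ATOMIC_CONSUME)

/*
 * Variant B (hypothetical name): volatile load followed by a compiler
 * barrier. This assumes the host preserves data-dependency ordering in
 * hardware, i.e. anything other than Alpha.
 */
#define rcu_read_plain(ptr) ({                                        \
    __typeof__(*(ptr)) _val = *(volatile __typeof__(*(ptr)) *)(ptr);  \
    barrier();                                                        \
    _val;                                                             \
})

/* Hypothetical RCU-protected data, for illustration only. */
struct config { int value; };
static struct config cfg = { 42 };
static struct config *shared = &cfg;

/* Reader using the __atomic formulation. */
int read_value_consume(void)
{
    return rcu_read_consume(&shared)->value;
}

/* Reader using the plain-load formulation. */
int read_value_plain(void)
{
    return rcu_read_plain(&shared)->value;
}
```

Compiling both readers with "gcc -O2 -S" for an aarch64 target and
comparing the generated loads is one way to check which instruction each
variant actually produces with a given compiler version.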


PS. And really equating smp_wmb/rmb to release/acquire as we have under
#ifdef __ATOMIC is hard to justify, other than to please tsan.

