[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atom
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions |
Date: |
Mon, 4 Apr 2016 19:03:48 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 |
On 04/04/2016 18:26, Pranith Kumar wrote:
> Hi Paolo,
>
> On Mon, Apr 4, 2016 at 4:14 AM, Paolo Bonzini <address@hidden> wrote:
>>
>> The issue is that atomic_thread_fence() only affects other atomic
>> operations, while smp_rmb() and smp_wmb() affect normal loads and stores
>> as well.
>
> That is different from what I understand what atomic_thread_fence()
> does. AFAIK, it should order all loads/stores and not only atomic
> operations.
>
> Quoting
> http://www.inf.pucrs.br/~flash/progeng2/cppreference/w/cpp/atomic/atomic_thread_fencehtml.html:
>
> "Establishes memory synchronization ordering of non-atomic and relaxed
> atomic accesses"
You can find some information at
https://bugzilla.redhat.com/show_bug.cgi?id=1142857 and
https://retrace.fedoraproject.org/faf/problems/670281/.
I've looked at private email from that time and I was pointed to this
sentence in GCC's manual, which says the opposite:
"Note that in the C++11 memory model, fences (e.g.,
‘__atomic_thread_fence’) take effect in combination with other
atomic operations on specific memory locations (e.g., atomic loads);
operations on specific memory locations do not necessarily affect
other operations in the same way."
Basically, the __atomic fences are essentially a way to break up a
non-relaxed atomic access into a relaxed atomic access and the
ordering-constraining fence. According to those emails, atomic
load-acquires and store-releases can provide synchronization for plain
loads and stores (not just for relaxed atomic loads and stores).
However, an atomic thread fence cannot do this.
>> In the GCC implementation, atomic operations (even relaxed ones) access
>> memory as if the pointer was volatile. By doing this, GCC can remove
>> the acquire and release fences altogether on TSO architectures. We
>> actually observed a case where the compiler subsequently inverted the
>> order of two writes around a smp_wmb().
>
> That is a GCC bug in that case. GCC should remove smp_wmb() only after
> the optimization passes have run in the codegen phase. Can you please
> point me to the mailing list thread/bug where this was discovered? Or
> is it easily reproducible with reverting that patch? In that case, the
> location of the bug will be helpful to analyse this further.
It depends on the compiler you're using. With some luck, reverting the
patch will cause accesses across smp_wmb() or smp_rmb() to get
reordered. In our case it was in thread-pool.c; Kevin Wolf did the
analysis.
>> In principle it could do the same on architectures that are sequentially
>> consistent; even if none exists in practice, keeping the barriers for
>> smp_mb() is consistent with the other barriers.
>
> I understand one barrier(), but having two on both sides seems
> unnecessary. I would prefer we clean up and have just one. Although, I
> think it is just an annoyance since performance wise there is no
> difference. Two consecutive barrier()'s should behave just like one.
Yes, this is okay to change.
Paolo