[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW fun

From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW function
Date: Fri, 18 May 2012 18:18:56 +1000

On Fri, 2012-05-18 at 08:53 +0200, Paolo Bonzini wrote:

> It depends on what semantics you attach to dma_mb.  In my opinion,
> having a separate barrier for DMA is wrong, because you want the same
> semantics on all architectures.
> The x86 requirements are roughly as follows:
> 1) it never needs explicit rmb and wmb (as long as you don't use
> non-temporal stores etc.);
> 2) synchronized operations have an implicit mb before and after (unlike
> LL/SC on PowerPC).
> 3) everywhere else, you need an mb.
> So, on x86 you have more or less an implicit wmb after each write and an
> implicit rmb before a read.  This explains why these kind of bugs are
> very hard to see on x86 (or often impossible to see).

So what you mean is that on x86, a read can pass a write (and
vice-versa) ? Interesting.... I didn't know that.

>   Adding these in
> cpu_physical_memory_rw has the advantage that x86 performance is not
> affected, but it would not cover the case of a device model doing a DMA
> read after a DMA write.  Then the device model would need to issue a
> smp_mb manually, on all architectures.  I think this is too brittle.

Ok. Agreed.

I'll do a new patch on monday (I'm off for the week-end) that does that,
using the existing smp_mb in cpu_physical_memory_rw().

I'm still tempted to add barriers in map and unmap as well in the case
where they don't bounce to provide consistent semantics here, ie, all
accesses done between the map and unmap are ordered vs all previous and
subsequent accesses. Ok with that ?

I will not add barriers to the various ld*/st* variants.

> > If not (I'm trying to figure out why exactly does x86 have a barrier in
> > the first place and when it's in order), then I might add a new barrier
> > type in qemu-barriers.h, something like dma_mb(), and define it as a nop
> > on x86, a lwsync or sync (still thinking about it) on ppc, and
> > __sync_synchronize() on unknown archs.
> I don't think it is correct to think of this in terms of low-level
> operations such as sync/lwsync.  Rather, I think what we want is
> sequentially consistent accesses; it's heavyweight, but you cannot go
> wrong.  http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html says this
> is how you do it:
> x86 -> Load Seq_Cst:  mov                or mfence; mov
>        Store Seq Cst: mov; mfence        or mov
> ARM -> Load Seq Cst:  ldr; dmb           or dmb; ldr; dmb
>        Store Seq Cst:         dmb; str; dmb      or dmb; str
> PPC -> Load Seq Cst:  sync; ld; cmp; bc; isync
>        Store Seq Cst:         sync; st
>        where cmp; bc; isync can be replaced by sync.

Hrm, the cmp/bc/isync can be -very- expensive, we use a variant of that
using twi to enforce complete execution of reads in our readX()
accessors in the kernel but I don't think I want to do that in qemu.

The full sync should provide all the synchronization we need, the read
trick is really only meant in the kernel to enforce timings (ie, a read
followed by a delay, allows to make sure that the delay only starts
"counting" after the read has completed, this is useful when talking to
real HW which might have specific timing requirements).
> and says "As far as the memory model is concerned, the ARM processor is
> broadly similar to PowerPC, differing mainly in having a DMB barrier
> (analogous to the PowerPC sync in its programmer-observable behaviour
> for normal memory) and no analogue of the PowerPC lwsync".
> So one of the two ARM mappings, with smp_mb instead of dmb, is what we
> want in cpu_physical_memory_rw.  Device models that want to do better
> can just use cpu_physical_memory_map, or we can add a
> cpu_physical_memory_rw_direct for them.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]