[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW fun

From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW function
Date: Sat, 19 May 2012 08:26:32 +1000

On Fri, 2012-05-18 at 10:57 +0200, Paolo Bonzini wrote:

> > I'm still tempted to add barriers in map and unmap as well in the case
> > where they don't bounce to provide consistent semantics here, ie, all
> > accesses done between the map and unmap are ordered vs all previous and
> > subsequent accesses. Ok with that ? [...]
> > I will not add barriers to the various ld*/st* variants.
> In theory you would need a memory barrier before the first ld/st and one
> after the last... considering virtio uses map/unmap, what about leaving
> map/unmap and ld*_phys/st*_phys as the high performance unsafe API?
> Then you can add barriers around ld*_pci_dma/st*_pci_dma.

So no, my idea is to make anybody using ld_* and st_*  (non _dma)
responsible for their own barriers. The _dma are implemented in term of
cpu_physical_memory_rw so should inherit the barriers.

As for map/unmap, there's an inconsistency since when it falls back to
bounce buffering, it will get implicit barriers. My idea was to put a
barrier before always, see blow.

> >> x86 -> Load Seq_Cst:        mov                or mfence; mov
> >>        Store Seq Cst: mov; mfence        or mov
> >>
> >> ARM -> Load Seq Cst:  ldr; dmb           or dmb; ldr; dmb
> >>        Store Seq Cst: dmb; str; dmb      or dmb; str
> >>
> >> PPC -> Load Seq Cst:  sync; ld; cmp; bc; isync
> >>        Store Seq Cst: sync; st
> >>
> >>        where cmp; bc; isync can be replaced by sync.
> > 
> > Hrm, the cmp/bc/isync can be -very- expensive, we use a variant of that
> > using twi to enforce complete execution of reads in our readX()
> > accessors in the kernel but I don't think I want to do that in qemu.
> Ah, ok, thanks for explaining what cmp;bc;isync really is. :)
> > The full sync should provide all the synchronization we need
> You mean "sync; ld; sync" for load and "sync; st" for store?  That would
> do, yes.

No, just sync,ld

That should be enough. Only if the device needs additional
synchronization against the guest accessing directly map'ed device
memory should it need more and that's something the device can deal with
explicitly if ever.

IE. If I put a barrier "before" in cpu_physical_memory_rw I ensure
ordering vs all previous accesses. Anything using the low level ld/st
accessors is responsible for their own barriers. virtio for example
since they intentionally bypass the dma/iommu stuff.

As for map/unmap, the idea is to add a barrier in map() as well in the
non-bounce case (maybe not unmap, keep the "before" semantic). This
keeps the semantic of map/unmap as a "whole" being ordered which makes
sense. IE. They are "high performance" in that there is no barrier
between individual accesses within the map/unmap sequence itself which
is good, the device is responsible for that if needed, but ordering the
whole block vs. previous accesses makes sense.

That means that most users (like block devices) don't actually need to
bother, ie they use map/unmap for AIO, the barrier in map provides
synchronization with previous descriptor accesses and the barrier in
cpu_physial_memory_rw orders the transfer vs. subsequent descriptor
updates. (Assuming the transfer contains actual CPU stores which it can,
if it ends up being real DMA under the hood then it's already ordered by
the host kernel driver).

Anyway, I'll post a patch on monday my time and we can discuss further.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]