
From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [PATCH 13/13] iommu: Add a memory barrier to DMA RW function
Date: Wed, 16 May 2012 14:35:38 +1000

> > >
> > >+    /* HACK: full memory barrier here */
> > >+    __sync_synchronize();
> > 
> > I thought you were going to limit this to the TCE iommu?
> So, it wasn't my intention to send this one with the rest, but I
> forgot to explain that to Ben before he resent the series.  As the comment
> suggests, this is a hack we have been using internally to see if
> certain ordering problems were what we thought they were.  If that
> turned out to be the case (and it now looks like it is), we need to
> work out where to correctly place this barrier.  As Ben says, this
> should probably really be in the PCI accessors, and we should use the
> finer grained primitives from qemu-barrier.h rather than the brute
> force __sync_synchronize().

Well, I knew you didn't intend to send them but I still think that's the
right patch for now :-)

So we -could- put it in the PCI accessors ... but that would mean fixing
all drivers to actually use them. For example, ide/ahci and usb/ohci
don't, and they aren't the only ones.

In the end, I don't think there's anything we care about that wouldn't
benefit from ensuring that the DMAs it does appear in the order they
were issued by the guest kernel. Most buses provide that guarantee to
some extent, and while some buses do have the ability to explicitly
request relaxed ordering, I don't think that's the case with anything we
care about emulating at this stage (and we could always add separate
accessors, or flags alongside the direction argument, for that later).

So by putting the barrier right in the dma_* accessor we kill all the
birds with one stone without having to audit all drivers for use of the
right accessors and all bus types.

Also, while the goal of using more targeted barriers might be worthwhile
in the long run, it's not totally trivial: we do want to order stores
vs. subsequent loads in all cases, and loads vs. loads, and we don't
want to have to keep track of what the previous access was. So at this
stage it's simply easier to just use a full barrier.

So my suggestion is to see if that patch introduces a measurable
performance regression anywhere we care about (i.e. on x86), and if not,
just go for it. It will solve a very real problem, and we can ponder
ways to do it better as a second step if that turns out to be worthwhile.

Anthony, how do you usually benchmark these things? Any chance you can
run a few tests to see if there's any visible loss?

