qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Supporting emulation of IOMMUs


From: Eduard - Gabriel Munteanu
Subject: Re: [Qemu-devel] Supporting emulation of IOMMUs
Date: Thu, 21 Apr 2011 21:47:31 +0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Apr 21, 2011 at 05:03:47PM +1000, David Gibson wrote:
> A few months ago, Eduard - Gabriel Munteanu posted a series of patches
> implementing support for emulating the AMD PCI IOMMU
> (http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html).
> 
> In fact, this series implemented a general DMA/IOMMU layer which can
> be used by any device model, and one translation backend for this
> implementing the AMD specific PCI IOMMU.
> 
> These patches don't seem to have gone anywhere for the last few
> months, however, and so far I've been unable to contact the author
> (trying again with this mail).
> 
> I have an interest in this code, because the pSeries machine will also
> need IOMMU emulation support.  At present we only support virtual
> devices, through the PAPR interface, and we have support for the
> hypervisor-controller IOMMU translation in the PAPR VIO code.
> However, we want to add PCI device support and this will also need
> IOMMU translation.
> 
> The series seems to have the right basic approach, so if the author is
> indeed MIA, I was planning to pick up the patches and resubmit them
> (with support for the pSeries IOMMU added).

Hi,

Not really MIA, but I've been a bit busy lately, so I'm sorry if I
couldn't answer your mail in a timely fashion.

I'll try making another merge attempt tonight/tomorrow.

> Before I do that, I was hoping to get some consensus that this is the
> right way to go.  For reference, I have an updated version of the
> first patch (which adds the core IOMMU layer) below.

Some developers expressed a few concerns during my last merge attempt,
I'm going to go through them and see if they have been solved.

> From: Eduard - Gabriel Munteanu <address@hidden>
> Date: Fri, 4 Feb 2011 01:32:55 +0200
> Subject: [PATCH] Generic DMA memory access interface
> 
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
> 
> David Gibson: I have made several bugfixes and cleanups to Eduard's
> original patch.
> 
>  * dma_memory_rw() was incorrectly using (uninitialized) plen instead
>    of len in the fallback to no-IOMMU case.
> 
>  * the dma_memory_map() tracking was storing the guest physical
>    address of each mapping, but not the qemu user virtual address.
>    However in unmap() it was then attempting to lookup by virtual
>    using a completely bogus cast.

Thanks. Map invalidation didn't get much testing, maybe figuring out a
way to trigger it from a guest would be nice, say a testcase.

>  * The dma_memory_rw() function is moved from dma_rw.h to dma_rw.c, it
>    was a bit too much code for an inline.
> 
>  * IOMMU support is now available on all target platforms, not just
>    i386, but is configurable (--enable-iommu/--disable-iommu).  Stubs
>    are used so that individual drivers can use the new dma interface
>    and it will turn into old-style cpu physical accesses at no cost on
>    IOMMU-less builds.

My impression was people were in favor of having the IOMMU code always
built in (and go through the direct cpu_physical_* when not configured
by the guest). And perhaps poison the old interfaces once everything
goes through the new DMA layer. I'm okay any way, though.

> Signed-off-by: Eduard - Gabriel Munteanu <address@hidden>
> Signed-off-by: David Gibson <address@hidden>
> ---
>  Makefile.target |    1 +
>  configure       |   12 ++++
>  hw/dma_rw.c     |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.h     |  156 
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 316 insertions(+), 0 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index 95f5eda..c3d36c6 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -198,6 +198,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p.o
>  obj-y += rwhandler.o
>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
> +obj-$(CONFIG_IOMMU) += dma_rw.o
>  LIBS+=-lz
>  
>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
> diff --git a/configure b/configure
> index da2da04..fa6f4d5 100755
> --- a/configure
> +++ b/configure
> @@ -130,6 +130,7 @@ xen=""
>  linux_aio=""
>  attr=""
>  vhost_net=""
> +iommu="no"
>  xfs=""
>  
>  gprof="no"
> @@ -723,6 +724,10 @@ for opt do
>    ;;
>    --enable-vhost-net) vhost_net="yes"
>    ;;
> +  --enable-iommu) iommu="yes"
> +  ;;
> +  --disable-iommu) iommu="no"
> +  ;;
>    --disable-opengl) opengl="no"
>    ;;
>    --enable-opengl) opengl="yes"
> @@ -934,6 +939,8 @@ echo "  --enable-docs            enable documentation 
> build"
>  echo "  --disable-docs           disable documentation build"
>  echo "  --disable-vhost-net      disable vhost-net acceleration support"
>  echo "  --enable-vhost-net       enable vhost-net acceleration support"
> +echo "  --disable-iommu          disable IOMMU emulation support"
> +echo "  --enable-vhost-net       enable IOMMU emulation support"
>  echo "  --enable-trace-backend=B Set trace backend"
>  echo "                           Available backends:" 
> $("$source_path"/scripts/tracetool --list-backends)
>  echo "  --with-trace-file=NAME   Full PATH,NAME of file to store traces"
> @@ -2608,6 +2615,7 @@ echo "madvise           $madvise"
>  echo "posix_madvise     $posix_madvise"
>  echo "uuid support      $uuid"
>  echo "vhost-net support $vhost_net"
> +echo "IOMMU support     $iommu"
>  echo "Trace backend     $trace_backend"
>  echo "Trace output file $trace_file-<pid>"
>  echo "spice support     $spice"
> @@ -3412,6 +3420,10 @@ if test "$target_softmmu" = "yes" -a \( \
>    echo "CONFIG_NEED_MMU=y" >> $config_target_mak
>  fi
>  
> +if test "$iommu" = "yes" ; then
> +  echo "CONFIG_IOMMU=y" >> $config_target_mak
> +fi
> +
>  if test "$gprof" = "yes" ; then
>    echo "TARGET_GPROF=yes" >> $config_target_mak
>    if test "$target_linux_user" = "yes" ; then
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 0000000..627835c
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,147 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
> +                   void *buf, dma_addr_t len, int is_write)
> +{
> +    dma_addr_t paddr, plen;
> +    int err;
> +
> +    /*
> +     * Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does.
> +     */
> +    if (NO_IOMMU(dev)) {
> +        cpu_physical_memory_rw(addr, buf, len, is_write);
> +        return;
> +    }
> +
> +    while (len) {
> +        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +typedef struct DMAMemoryMap DMAMemoryMap;
> +struct DMAMemoryMap {
> +    dma_addr_t              addr;
> +    size_t                  len;
> +    void *                  buf;
> +    DMAInvalidateMapFunc    *invalidate;
> +    void                    *invalidate_opaque;
> +
> +    QLIST_ENTRY(DMAMemoryMap) list;
> +};
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr, dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev, DMAInvalidateMapFunc *cb, void *opaque,
> +                     dma_addr_t addr, dma_addr_t *len, int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +    void *buf;
> +
> +    if (NO_IOMMU(dev)) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len) {
> +        *len = plen;
> +    }
> +
> +    buf = cpu_physical_memory_map(paddr, len, is_write);
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb) {
> +        DMAMemoryMap *map;
> +
> +        map = qemu_malloc(sizeof(DMAMemoryMap));
> +        map->addr = addr;
> +        map->len = *len;
> +        map->buf = buf;
> +        map->invalidate = cb;
> +        map->invalidate_opaque = opaque;
> +        
> +        QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +    }
> +
> +    return buf;
> +}
> +
> +void dma_memory_unmap(DMADevice *dev, void *buffer, dma_addr_t len,
> +                      int is_write, dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    if (!NO_IOMMU(dev)) {
> +        DMAMemoryMap *map;
> +
> +        QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +            if ((map->buf == buffer) && (map->len == len)) {
> +                QLIST_REMOVE(map, list);
> +                free(map);
> +            }
> +        }
> +    }
> +}
> +
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..58a3d0f
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,156 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +
> +typedef int DMATranslateFunc(DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef void DMAInvalidateMapFunc(void *);
> +
> +#ifndef CONFIG_IOMMU
> +struct DMAMmu {
> +};
> +
> +struct DMADevice {
> +};
> +
> +#define NO_IOMMU(_dev) (1)
> +
> +static inline void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
> +                                 void *buf, dma_addr_t len, int is_write)
> +{
> +    cpu_physical_memory_rw(addr, buf, len, is_write);
> +}
> +
> +static inline void *dma_memory_map(DMADevice *dev,
> +                                   DMAInvalidateMapFunc *cb, void *opaque,
> +                                   dma_addr_t addr, dma_addr_t *len,
> +                                   int is_write)
> +{
> +    return cpu_physical_memory_map(addr, len, is_write);
> +}
> +
> +static inline void dma_memory_unmap(DMADevice *dev,
> +                                    void *buffer, dma_addr_t len,
> +                                    int is_write, dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +}
> +
> +#define DEFINE_DMA_LD(suffix, size)                                       \
> +static inline uint##size##_t                                              \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
> +{                                                                         \
> +    return ld##suffix##_phys(addr);                                       \
> +}
> +
> +#define DEFINE_DMA_ST(suffix, size)                                       \
> +static inline void                                                        \
> +dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
> +{                                                                         \
> +    st##suffix##_phys(addr, val);                                         \
> +}
> +
> +#else
> +struct DMAMmu {
> +    DeviceState *iommu;
> +    DMATranslateFunc *translate;
> +    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
> +};
> +
> +struct DMADevice {
> +    DMAMmu *mmu;
> +};
> +
> +#define NO_IOMMU(_dev) (!(_dev) || !(_dev)->mmu)
> +
> +void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
> +                   void *buf, dma_addr_t len, int is_write);
> +
> +void *dma_memory_map(DMADevice *dev,
> +                     DMAInvalidateMapFunc *cb, void *opaque,
> +                     dma_addr_t addr, dma_addr_t *len,
> +                     int is_write);
> +void dma_memory_unmap(DMADevice *dev,
> +                      void *buffer, dma_addr_t len,
> +                      int is_write, dma_addr_t access_len);
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr, dma_addr_t len);
> +
> +/* warning: like the corresponding ldX_phys / stX_phys functions, these
> + * DMA accessors can only handle aligned accesses */
> +
> +#define DEFINE_DMA_LD(suffix, size)                                       \
> +static inline uint##size##_t                                              \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                           \
> +{                                                                         \
> +    int err;                                                              \
> +    dma_addr_t paddr, plen;                                               \
> +                                                                          \
> +    if (NO_IOMMU(dev)) {                                                  \
> +        return ld##suffix##_phys(addr);                                   \
> +    }                                                                     \
> +                                                                          \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 0);               \
> +    if (err || (plen < size / 8))                                         \
> +        return 0;                                                         \
> +                                                                          \
> +    return ld##suffix##_phys(paddr);                                      \
> +}
> +
> +#define DEFINE_DMA_ST(suffix, size)                                       \
> +static inline void                                                        \
> +dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)       \
> +{                                                                         \
> +    int err;                                                              \
> +    target_phys_addr_t paddr, plen;                                       \
> +                                                                          \
> +    if (NO_IOMMU(dev)) {                                                  \
> +        st##suffix##_phys(addr, val);                                     \
> +        return;                                                           \
> +    }                                                                     \
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, 1);               \
> +    if (err || (plen < size / 8))                                         \
> +        return;                                                           \
> +                                                                          \
> +    st##suffix##_phys(paddr, val);                                        \
> +}
> +#endif /* CONFIG_IOMMU */
> +
> +DEFINE_DMA_LD(ub, 8)
> +DEFINE_DMA_LD(uw, 16)
> +DEFINE_DMA_LD(l, 32)
> +DEFINE_DMA_LD(q, 64)
> +
> +DEFINE_DMA_ST(b, 8)
> +DEFINE_DMA_ST(w, 16)
> +DEFINE_DMA_ST(l, 32)
> +DEFINE_DMA_ST(q, 64)
> +
> +static inline void dma_memory_read(DMADevice *dev,
> +                                   dma_addr_t addr,
> +                                   void *buf,
> +                                   dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, buf, len, 0);
> +}
> +
> +static inline void dma_memory_write(DMADevice *dev,
> +                                    dma_addr_t addr,
> +                                    const void *buf,
> +                                    dma_addr_t len)
> +{
> +    dma_memory_rw(dev, addr, (void *) buf, len, 1);
> +}
> +
> +#endif /* DMA_RW_H */
> -- 
> 1.7.1
> 
> 
> -- 
> David Gibson                  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au        | minimalist, thank you.  NOT _the_ 
> _other_
>                               | _way_ _around_!
> http://www.ozlabs.org/~dgibson



reply via email to

[Prev in Thread] Current Thread [Next in Thread]