Re: [Qemu-devel] [RFC] QOMification of AXI streams

From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [RFC] QOMification of AXI streams
Date: Wed, 13 Jun 2012 10:37:41 +1000

On Tue, 2012-06-12 at 12:46 +0300, Avi Kivity wrote:
> > I think that transformation function lives in the bus layer
> > MemoryRegion.  It's a bit tricky though because you need some sort of
> > notion of "who is asking".  So you need:
> > 
> > dma_memory_write(MemoryRegion *parent, DeviceState *caller,
> >                  const void *data, size_t size);
> It is not the parent here, but rather the root of the memory hierarchy
> as viewed from the device (the enigmatically named 'pcibm' above).  The
> pci memory region simply doesn't have the information about where system
> memory lives, because it is a sibling region.

Right and it has to be hierarchical, you can have CPU -> PCI transform
followed by PCI -> AXI (or whatever stupid bus they use on the Broadcom
wireless cards), etc...

There can be any number of transforms. There's also the need at each
level to handle sibling decoding, i.e. it's the same BARs used for
downstream and upstream access that will decode an access at any given

So it's not a separate hierarchy, it's the same hierarchy walked both
ways with potentially different transforms depending on what direction
it's walked.
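
A minimal sketch of that upstream walk, under stated assumptions: BusNode,
node_decodes() and dma_resolve() are hypothetical names, not QEMU types or
functions, and every transform is reduced to a plain offset. At each level
the siblings are checked first; only if none decodes the address is that
level's upstream transform applied and the walk continued towards the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node in the bus hierarchy; not a QEMU type. */
typedef struct BusNode BusNode;
struct BusNode {
    const char *name;
    BusNode *parent;      /* NULL at the root */
    BusNode **children;   /* devices and bridges directly below this node */
    size_t nr_children;
    uint64_t win_base;    /* window this node positively decodes, as seen */
    uint64_t win_size;    /* on its parent's bus (0 means no window) */
    int64_t up_offset;    /* transform applied when an access leaves upstream */
};

/* True if 'addr' falls inside the node's decoded window. */
static bool node_decodes(const BusNode *n, uint64_t addr)
{
    return n->win_size != 0 && addr - n->win_base < n->win_size;
}

/*
 * Resolve a DMA address issued by 'initiator'. Returns the node that claims
 * the access (the root if no sibling does) and stores the address as seen
 * at that level in *out_addr.
 */
static const BusNode *dma_resolve(const BusNode *initiator, uint64_t addr,
                                  uint64_t *out_addr)
{
    assert(initiator != NULL && out_addr != NULL);
    const BusNode *from = initiator;
    const BusNode *bus = initiator->parent;

    while (bus != NULL) {
        /* Sibling decoding: a peer on this bus may claim the access. */
        for (size_t i = 0; i < bus->nr_children; i++) {
            const BusNode *sib = bus->children[i];
            if (sib != from && node_decodes(sib, addr)) {
                *out_addr = addr;
                return sib;
            }
        }
        /* Nobody here decoded it: transform and move one level up. */
        addr += (uint64_t)bus->up_offset;
        if (bus->parent == NULL) {
            *out_addr = addr;
            return bus;
        }
        from = bus;
        bus = bus->parent;
    }
    *out_addr = addr;
    return initiator;   /* initiator had no parent at all */
}
```

The sketch stops at the first sibling that decodes the address; a real
implementation would then have to walk back down through that sibling if it
is a bridge, using the downstream transforms instead.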

> Note that the address transformations are not necessarily symmetric (for
> example, iommus transform device->system transactions, but not
> cpu->device transactions).  Each initiator has a separate DAG to follow.

Right. Or rather they might transform CPU -> device but differently (i.e.
we do have several windows with different offsets on POWER, for example),
so it's a different transform which -might- be an iommu of some
sort as well.

I think the whole mechanism should be symmetrical, with a fast path for
transforms that can be represented by a direct map + offset (ie no iommu
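
As an illustration of such a fast path, here is a hedged sketch. The
BusTransform structure, bus_translate() and the toy iommu are all
hypothetical and exist only for this example. A transform with no callback
is treated as a direct map plus offset; anything else goes through a
translation callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-direction transform; names are illustrative only. */
typedef bool (*translate_fn)(void *opaque, uint64_t in, uint64_t *out);

typedef struct BusTransform {
    uint64_t base;      /* start of the window on the source bus */
    uint64_t size;      /* window length in bytes */
    int64_t offset;     /* added to the address on the fast path */
    translate_fn slow;  /* non-NULL when an iommu-like lookup is needed */
    void *opaque;       /* state handed to the slow callback */
} BusTransform;

/* Translate 'in'; returns false if it misses the window or the lookup fails. */
static bool bus_translate(const BusTransform *t, uint64_t in, uint64_t *out)
{
    assert(t != NULL && out != NULL);
    if (in - t->base >= t->size) {
        return false;               /* outside this window */
    }
    if (t->slow == NULL) {
        *out = in + (uint64_t)t->offset;   /* fast path: direct map + offset */
        return true;
    }
    return t->slow(t->opaque, in, out);    /* slow path: table lookup */
}

/* Toy single-level page table standing in for an iommu. */
#define TOY_PAGE_SHIFT 12

typedef struct ToyIommu {
    const uint64_t *table;  /* page-aligned output address per input page */
    size_t nr_pages;
} ToyIommu;

static bool toy_iommu_translate(void *opaque, uint64_t in, uint64_t *out)
{
    const ToyIommu *mmu = opaque;
    uint64_t page = in >> TOY_PAGE_SHIFT;

    if (page >= mmu->nr_pages) {
        return false;               /* no mapping for this page */
    }
    *out = mmu->table[page] | (in & ((1u << TOY_PAGE_SHIFT) - 1));
    return true;
}
```

Keeping both cases behind one entry point means callers do not need to know
whether a given level has an iommu, while the common offset-only case costs
one compare and one add.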

> > This could be simplified at each layer via:
> > 
> > void pci_device_write(PCIDevice *dev, const void *data, size_t size) {
> >     dma_memory_write(dev->bus->mr, DEVICE(dev), data, size);
> > }
> > 
> >> To be true to the HW, each bridge should have its memory region, so a
> >> setup with
> >>
> >>        /pci-host
> >>            |
> >>            |--/p2p
> >>                 |
> >>            |--/device
> >>
> >> Any DMA done by the device would walk through the p2p region to the host
> >> which would contain a region with transform ops.
> >>
> >> However, at each level, you'd have to search for sibling regions that
> >> may decode the address at that level before moving up, ie implement
> >> essentially the equivalent of the PCI subtractive decoding scheme.
> > 
> > Not quite...  subtractive decoding only happens for very specific
> > devices IIUC.  For instance, a PCI-ISA bridge.  Normally, it's positive
> > decoding and a bridge has to describe the full region of MMIO/PIO that
> > it handles.
> > 
> > So it's only necessary to traverse down the tree again for the very
> > special case of PCI-ISA bridges.  Normally you can tell just by looking
> > at siblings.
> > 
> >> That will be a significant overhead for your DMA ops I believe, though
> >> doable.
> > 
> > Worst case scenario, 256 devices with what, a 3 level deep hierarchy? 
> > we're still talking about 24 simple address compares.  That shouldn't be
> > so bad.
> Or just look up the device-local phys_map.

> > 
> >> Then we'd have to add map/unmap to MemoryRegion as well, with the
> >> understanding that they may not be supported at every level...
> > 
> > map/unmap can always fall back to bounce buffers.
> > 
> >> So yeah, it sounds doable and it would handle what DMAContext doesn't
> >> handle which is access to peer devices without going all the way back to
> >> the "top level", but it's complex and ... I need something in qemu
> >> 1.2 :-)
> > 
> > I think we need a longer term vision here.  We can find incremental
> > solutions for the short term but I'm pretty nervous about having two
> > parallel APIs only to discover that we need to converge in 2 years.
> The API already exists, we just need to fill up the data structures.

Not really no, we don't have proper DMA APIs to shoot from devices.

What the DMAContext patches provide is a generic dma_* API, but if we are
going to get rid of DMAContext in favor of a (modified?) MemoryRegion,
I'd rather not expose that to devices.

Since I need something _now_ for 1.2 (this has been going on for way too
long), I'm going to go for a quick kill for PCI & PAPR VIO only using a
slightly modified version of the existing iommu patches that provides
pci_* wrappers to the DMA ops that take the PCIDevice as an argument.
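
A rough sketch of what such wrappers could look like. This is not the actual
patch: the DMAContext and PCIDevice definitions below are toy stand-ins, and
the *_sketch names are hypothetical. The point is only that devices pass
their PCIDevice and never touch the backend directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Toy stand-in for the DMA backend: flat RAM behind a fixed offset. */
typedef struct DMAContext {
    uint8_t *ram;
    size_t ram_size;
    dma_addr_t offset;   /* bus address minus offset gives the RAM index */
} DMAContext;

/* Toy stand-in for the device; only the field the wrappers need. */
typedef struct PCIDevice {
    DMAContext *dma;     /* backend, replaceable without touching devices */
} PCIDevice;

/* Translate and bounds-check; returns NULL if the access is not valid. */
static uint8_t *dma_ctx_ptr(DMAContext *ctx, dma_addr_t addr, size_t len)
{
    assert(ctx != NULL);
    if (addr < ctx->offset) {
        return NULL;
    }
    dma_addr_t idx = addr - ctx->offset;
    if (idx > ctx->ram_size || len > ctx->ram_size - idx) {
        return NULL;
    }
    return ctx->ram + idx;
}

/* Device-facing wrappers: the only DMA entry points a PCI device sees. */
static inline int pci_dma_write_sketch(PCIDevice *dev, dma_addr_t addr,
                                       const void *buf, size_t len)
{
    uint8_t *p = dma_ctx_ptr(dev->dma, addr, len);
    if (p == NULL) {
        return -1;
    }
    memcpy(p, buf, len);
    return 0;
}

static inline int pci_dma_read_sketch(PCIDevice *dev, dma_addr_t addr,
                                      void *buf, size_t len)
{
    const uint8_t *p = dma_ctx_ptr(dev->dma, addr, len);
    if (p == NULL) {
        return -1;
    }
    memcpy(buf, p, len);
    return 0;
}
```

Because device code only calls the pci_* wrappers, the body of the wrappers
can later be switched from DMAContext to something MemoryRegion-based
without changing any device model.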

That way we can replace the infrastructure and remove DMAContext without
affecting devices in a second stage (unless you think you can come up
with a new scheme in the next few days :-) ). I really am not familiar
enough with those parts of qemu to aim for the full shebang for 1.2, but
maybe you guys can :-)


> -- 
> error compiling committee.c: too many arguments to function
