qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
Date: Thu, 9 Jan 2014 23:56:32 +0200

On Thu, Jan 09, 2014 at 12:03:26PM -0700, Alex Williamson wrote:
> On Thu, 2014-01-09 at 11:47 -0700, Alex Williamson wrote:
> > On Thu, 2014-01-09 at 20:00 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Jan 09, 2014 at 10:24:47AM -0700, Alex Williamson wrote:
> > > > On Wed, 2013-12-11 at 20:30 +0200, Michael S. Tsirkin wrote:
> > > > > From: Paolo Bonzini <address@hidden>
> > > > > 
> > > > > As an alternative to commit 818f86b (exec: limit system memory
> > > > > size, 2013-11-04) let's just make all address spaces 64-bit wide.
> > > > > This eliminates problems with phys_page_find ignoring bits above
> > > > > TARGET_PHYS_ADDR_SPACE_BITS and address_space_translate_internal
> > > > > consequently messing up the computations.
> > > > > 
> > > > > In Luiz's reported crash, at startup gdb attempts to read from address
> > > > > 0xffffffffffffffe6 to 0xffffffffffffffff inclusive.  The region it 
> > > > > gets
> > > > > is the newly introduced master abort region, which is as big as the 
> > > > > PCI
> > > > > address space (see pci_bus_init).  Due to a typo that's only 2^63-1,
> > > > > not 2^64.  But we get it anyway because phys_page_find ignores the 
> > > > > upper
> > > > > bits of the physical address.  In address_space_translate_internal 
> > > > > then
> > > > > 
> > > > >     diff = int128_sub(section->mr->size, int128_make64(addr));
> > > > >     *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
> > > > > 
> > > > > diff becomes negative, and int128_get64 booms.
> > > > > 
> > > > > The size of the PCI address space region should be fixed anyway.
> > > > > 
> > > > > Reported-by: Luiz Capitulino <address@hidden>
> > > > > Signed-off-by: Paolo Bonzini <address@hidden>
> > > > > Signed-off-by: Michael S. Tsirkin <address@hidden>
> > > > > ---
> > > > >  exec.c | 8 ++------
> > > > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/exec.c b/exec.c
> > > > > index 7e5ce93..f907f5f 100644
> > > > > --- a/exec.c
> > > > > +++ b/exec.c
> > > > > @@ -94,7 +94,7 @@ struct PhysPageEntry {
> > > > >  #define PHYS_MAP_NODE_NIL (((uint32_t)~0) >> 6)
> > > > >  
> > > > >  /* Size of the L2 (and L3, etc) page tables.  */
> > > > > -#define ADDR_SPACE_BITS TARGET_PHYS_ADDR_SPACE_BITS
> > > > > +#define ADDR_SPACE_BITS 64
> > > > >  
> > > > >  #define P_L2_BITS 10
> > > > >  #define P_L2_SIZE (1 << P_L2_BITS)
> > > > > @@ -1861,11 +1861,7 @@ static void memory_map_init(void)
> > > > >  {
> > > > >      system_memory = g_malloc(sizeof(*system_memory));
> > > > >  
> > > > > -    assert(ADDR_SPACE_BITS <= 64);
> > > > > -
> > > > > -    memory_region_init(system_memory, NULL, "system",
> > > > > -                       ADDR_SPACE_BITS == 64 ?
> > > > > -                       UINT64_MAX : (0x1ULL << ADDR_SPACE_BITS));
> > > > > +    memory_region_init(system_memory, NULL, "system", UINT64_MAX);
> > > > >      address_space_init(&address_space_memory, system_memory, 
> > > > > "memory");
> > > > >  
> > > > >      system_io = g_malloc(sizeof(*system_io));
> > > > 
> > > > This seems to have some unexpected consequences around sizing 64bit PCI
> > > > BARs that I'm not sure how to handle.
> > > 
> > > BARs are often disabled during sizing. Maybe you
> > > don't detect BAR being disabled?
> > 
> > See the trace below, the BARs are not disabled.  QEMU pci-core is doing
> > the sizing an memory region updates for the BARs, vfio is just a
> > pass-through here.
> 
> Sorry, not in the trace below, but yes the sizing seems to be happening
> while I/O & memory are enabled int he command register.  Thanks,
> 
> Alex

OK then from QEMU POV this BAR value is not special at all.

> > > >  After this patch I get vfio
> > > > traces like this:
> > > > 
> > > > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) febe0004
> > > > (save lower 32bits of BAR)
> > > > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xffffffff, len=0x4)
> > > > (write mask to BAR)
> > > > vfio: region_del febe0000 - febe3fff
> > > > (memory region gets unmapped)
> > > > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) ffffc004
> > > > (read size mask)
> > > > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xfebe0004, len=0x4)
> > > > (restore BAR)
> > > > vfio: region_add febe0000 - febe3fff [0x7fcf3654d000]
> > > > (memory region re-mapped)
> > > > vfio: vfio_pci_read_config(0000:01:10.0, @0x14, len=0x4) 0
> > > > (save upper 32bits of BAR)
> > > > vfio: vfio_pci_write_config(0000:01:10.0, @0x14, 0xffffffff, len=0x4)
> > > > (write mask to BAR)
> > > > vfio: region_del febe0000 - febe3fff
> > > > (memory region gets unmapped)
> > > > vfio: region_add fffffffffebe0000 - fffffffffebe3fff [0x7fcf3654d000]
> > > > (memory region gets re-mapped with new address)
> > > > qemu-system-x86_64: vfio_dma_map(0x7fcf38861710, 0xfffffffffebe0000, 
> > > > 0x4000, 0x7fcf3654d000) = -14 (Bad address)
> > > > (iommu barfs because it can only handle 48bit physical addresses)
> > > > 
> > > 
> > > Why are you trying to program BAR addresses for dma in the iommu?
> > 
> > Two reasons, first I can't tell the difference between RAM and MMIO.

Why can't you? Generally memory core let you find out easily.
But in this case it's vfio device itself that is sized so for sure you
know it's MMIO.
Maybe you will have same issue if there's another device with a 64 bit
bar though, like ivshmem?

> > Second, it enables peer-to-peer DMA between devices, which is something
> > that we might be able to take advantage of with GPU passthrough.
> > 
> > > > Prior to this change, there was no re-map with the fffffffffebe0000
> > > > address, presumably because it was beyond the address space of the PCI
> > > > window.  This address is clearly not in a PCI MMIO space, so why are we
> > > > allowing it to be realized in the system address space at this location?
> > > > Thanks,
> > > > 
> > > > Alex
> > > 
> > > Why do you think it is not in PCI MMIO space?
> > > True, CPU can't access this address but other pci devices can.
> > 
> > What happens on real hardware when an address like this is programmed to
> > a device?  The CPU doesn't have the physical bits to access it.  I have
> > serious doubts that another PCI device would be able to access it
> > either.  Maybe in some limited scenario where the devices are on the
> > same conventional PCI bus.  In the typical case, PCI addresses are
> > always limited by some kind of aperture, whether that's explicit in
> > bridge windows or implicit in hardware design (and perhaps made explicit
> > in ACPI).  Even if I wanted to filter these out as noise in vfio, how
> > would I do it in a way that still allows real 64bit MMIO to be
> > programmed.  PCI has this knowledge, I hope.  VFIO doesn't.  Thanks,
> > 
> > Alex
> 

AFAIK PCI doesn't have that knowledge as such. PCI spec is explicit that
full 64 bit addresses must be allowed and hardware validation
test suites normally check that it actually does work
if it happens.

Yes, if there's a bridge somewhere on the path that bridge's
windows would protect you, but pci already does this filtering:
if you see this address in the memory map this means
your virtual device is on root bus.

So I think it's the other way around: if VFIO requires specific
address ranges to be assigned to devices, it should give this
info to qemu and qemu can give this to guest.
Then anything outside that range can be ignored by VFIO.

-- 
MST




reply via email to

[Prev in Thread] Current Thread [Next in Thread]