[Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.


From: Gleb Natapov
Subject: [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
Date: Mon, 12 Oct 2009 10:48:58 +0200

On Mon, Oct 12, 2009 at 10:13:14AM +0200, Michael S. Tsirkin wrote:
> > > > > 
> > > > > This wastes memory for non-assigned devices.  I think it's better, and
> > > > > cleaner, to make qemu increase the BAR size up to 4K for assigned
> > > > > devices if it wants page size alignment.
> > > > > 
> > > > We have three and a half devices in QEMU, so I don't think memory is a
> > > > big concern. Regardless, if you think that fiddling with assigned device
> > > > responses is a better idea, go ahead and send patches.
> > > 
> > > Even if you fiddle with the BIOS, the guest is allowed to reassign BARs,
> > > breaking your assumptions.
> > Good point. So the fact that this patch helped its creator shows that
> > Linux doesn't do this.
> 
> Try hot-plugging the device instead of having it present at boot.
> Patching the BIOS won't help then, will it?  So my question is, if we need
> to handle this in qemu, is it worth doing in kvm as well?
> 
It depends on how Linux assigns MMIO addresses to hot-pluggable devices.
How can you be sure a device driver will continue working if you
misrepresent the BAR size, BTW?
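
(For context, since the thread turns on this point: a driver discovers a
BAR's size by writing all ones to the BAR register and reading back the
hardwired low bits, so whatever size qemu reports is exactly what the
driver computes. A minimal, self-contained C sketch, not qemu/kvm code;
the "device" here is simulated with a 4 KiB memory BAR:)

#include <assert.h>
#include <stdint.h>

#define BAR_SIZE_MASK 0xfffff000u       /* simulated device: 4 KiB BAR */

static uint32_t bar_reg = 0xfc226000;   /* current BAR contents */

/* Stand-ins for real config-space accessors; the device hardwires the
 * low address bits, which the write emulates by masking them off. */
static uint32_t pci_config_read(void)    { return bar_reg; }
static void pci_config_write(uint32_t v) { bar_reg = v & BAR_SIZE_MASK; }

static uint32_t bar_size(void)
{
    uint32_t orig = pci_config_read();
    pci_config_write(0xffffffff);                 /* probe: write all ones */
    uint32_t probed = pci_config_read() & ~0xfU;  /* drop mem BAR flag bits */
    pci_config_write(orig);                       /* restore the address */
    return ~probed + 1;                           /* 0xfffff000 -> 4096 */
}

int main(void)
{
    assert(bar_size() == 4096);  /* the driver sees whatever size is exposed */
    return 0;
}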

> > > > As it stands, this
> > > > patch is in kvm's BIOS and is required for some assigned devices to
> > > > work, so moving to SeaBIOS without this patch will introduce
> > > > a regression.
> > > 
> > > I have a question here: if kvm maps a full physical page
> > > into guest memory, while the device only uses part of the page,
> > > won't that mean the guest is granted access outside the
> > > device, which it should not have?
> > And how is real HW different? It maps a full physical page into OS
> > memory even if the BAR is smaller than a page, and grants the OS access
> > to the unassigned MMIO region. Accessing an unassigned MMIO region
> > shouldn't cause any trouble, should it?
> 
> Unassigned - typically no, but there can be another device there, or a RAM
> page.  It is different on real hardware, where the OS has access to all RAM
> and all devices anyway.
> 
> Here's an example from my laptop:
> 
> 00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
>         Subsystem: Lenovo Device 20e6
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
>         Latency: 0
>         Interrupt: pin A routed to IRQ 11
>         Region 0: Memory at fc226800 (64-bit, non-prefetchable) [size=16]
>         Capabilities: <access denied>
> 
> ...
> 
> 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
>         Subsystem: Lenovo Device 20f8
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin B routed to IRQ 28
>         Region 0: I/O ports at 1c48 [size=8]
>         Region 1: I/O ports at 183c [size=4]
>         Region 2: I/O ports at 1c40 [size=8]
>         Region 3: I/O ports at 1838 [size=4]
>         Region 4: I/O ports at 1c20 [size=32]
>         Region 5: Memory at fc226000 (32-bit, non-prefetchable) [size=2K]
>         Capabilities: <access denied>
>         Kernel driver in use: ahci
> 
> In this setup, if you assign the page at address fc226000 for SATA,
> I think the guest will be able to control the Communication controller as well.
Who configures BARs for an assigned device, the guest or the host? If the
host, you can't safely pass through one of those devices. But passthrough
is not secure anyway, since the guest can DMA all over host memory.
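
(A minimal C sketch of the overlap Michael describes, using the two BARs
from the lspci output above; bars_share_page and PAGE_MASK are
illustrative helpers, not kvm code:)

#include <stdint.h>
#include <stdio.h>

#define PAGE_MASK 0xfffff000u   /* 4 KiB pages */

/* Do the 4 KiB page ranges covered by two BARs intersect? */
static int bars_share_page(uint32_t a, uint32_t a_size,
                           uint32_t b, uint32_t b_size)
{
    uint32_t a_last = a + a_size - 1;
    uint32_t b_last = b + b_size - 1;
    return (a & PAGE_MASK) <= (b_last & PAGE_MASK) &&
           (b & PAGE_MASK) <= (a_last & PAGE_MASK);
}

int main(void)
{
    /* 00:1f.2 SATA BAR5: fc226000, 2 KiB; 00:03.0 MEI BAR0: fc226800, 16 bytes */
    printf("%d\n", bars_share_page(0xfc226000, 2048, 0xfc226800, 16)); /* 1 */
    return 0;
}

Mapping the 4 KiB page that holds the 2 KiB SATA BAR necessarily maps
the 16-byte MEI BAR at fc226800, which sits in the same page.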

> 
> > > Maybe the solution is to disable bypass for sub-page BARs and to
> > > handle them in qemu, where we don't have alignment restrictions?
> > > 
> > Making the fast path go through qemu for assigned devices? Maybe remove
> > this pass-through crap from kvm to save us all from this misery then?
> 
> Another option is for KVM to check these scenarios and deny assignment if
> there's such an overlap.
One more constraint on device assignment. Simple real-life scenarios
don't work for our users as it is. Adding more constraints will not help.
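
(For reference, the expression in the patch quoted below rounds an
address up to the next 4 KiB boundary; a standalone illustration of the
arithmetic, not part of the patch itself:)

#include <assert.h>
#include <stdint.h>

static uint32_t align_up_4k(uint32_t addr)
{
    return (addr + 0xfff) & 0xfffff000;   /* same expression as the patch */
}

int main(void)
{
    assert(align_up_4k(0xfc226810) == 0xfc227000);  /* mid-page rounds up  */
    assert(align_up_4k(0xfc226000) == 0xfc226000);  /* aligned: unchanged  */
    return 0;
}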

> 
> > > > > 
> > > > > > ---
> > > > > >  src/pciinit.c |    7 +++++++
> > > > > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > > > > 
> > > > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > > > index 29b3901..53fbfcf 100644
> > > > > > --- a/src/pciinit.c
> > > > > > +++ b/src/pciinit.c
> > > > > > @@ -10,6 +10,7 @@
> > > > > >  #include "biosvar.h" // GET_EBDA
> > > > > >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> > > > > >  #include "pci_regs.h" // PCI_COMMAND
> > > > > > +#include "paravirt.h"
> > > > > >  
> > > > > >  #define PCI_ROM_SLOT 6
> > > > > >  #define PCI_NUM_REGIONS 7
> > > > > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> > > > > >                  *paddr = ALIGN(*paddr, size);
> > > > > >                  pci_set_io_region_addr(bdf, i, *paddr);
> > > > > >                  *paddr += size;
> > > > > > +                if (kvm_para_available()) {
> > > > > > +                    /* make memory address page aligned */
> > > > > > +                    /* needed for device assignment on kvm */
> > > > > > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > > > > > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > > > > > +               }
> > > > > >              }
> > > > > >          }
> > > > > >          break;
> > > > > > -- 
> > > > > > 1.6.3.3
> > > > > > 
> > > > > > 
> > > > 
> > > > --
> > > >                         Gleb.
> > 
> > --
> >                     Gleb.

--
                        Gleb.



