From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
Date: Tue, 9 Feb 2016 11:46:08 +0100

On Tue, 2 Feb 2016 13:16:38 +0200
"Michael S. Tsirkin" <address@hidden> wrote:

> On Tue, Feb 02, 2016 at 10:59:53AM +0100, Igor Mammedov wrote:
> > On Sun, 31 Jan 2016 18:22:13 +0200
> > "Michael S. Tsirkin" <address@hidden> wrote:
> >   
> > > On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:  
> > > > On Thu, 28 Jan 2016 14:59:25 +0200
> > > > "Michael S. Tsirkin" <address@hidden> wrote:
> > > >     
> > > > > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:    
> > > > > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > > > > "Michael S. Tsirkin" <address@hidden> wrote:
> > > > > >       
> > > > > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:    
> > > > > > >   
> > > > > > > > Based on Microsoft's specifications (paper can be
> > > > > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > > > > add a PCI device with corresponding description in
> > > > > > > > SSDT ACPI table.
> > > > > > > > 
> > > > > > > > The GUID is set using "vmgenid.guid" property or
> > > > > > > > a corresponding HMP/QMP command.
> > > > > > > > 
> > > > > > > > Example of using vmgenid device:
> > > > > > > >  -device 
> > > > > > > > vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > > > > 
> > > > > > > > 'vmgenid' device initialization flow is as following:
> > > > > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > > > >     are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > > > >     GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > > > > 
> > > > > > > > Note:
> > > > > > > > This implementation uses PCI class 0x0500 code for vmgenid 
> > > > > > > > device,
> > > > > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > > > > Testing various Windows versions showed that, OS
> > > > > > > > doesn't touch nor checks for resource conflicts
> > > > > > > > for such PCI devices.
> > > > > > > > There was concern that during PCI rebalancing, OS
> > > > > > > > could reprogram the BAR at other place, which would
> > > > > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > > > > address.
> > > > > > > > However testing showed that Windows does rebalancing
> > > > > > > > only for PCI device that have a driver attached
> > > > > > > > and completely ignores NO_DRV class of devices.
> > > > > > > > Which in turn creates a problem where OS could remap
> > > > > > > > one of PCI devices(with driver) over BAR used by
> > > > > > > > a driver-less PCI device.
> > > > > > > > Statically declaring used memory range as VGEN._CRS
> > > > > > > > makes OS to honor resource reservation and an ignored
> > > > > > > > BAR range is not longer touched during PCI rebalancing.
> > > > > > > > 
> > > > > > > > Signed-off-by: Gal Hammer <address@hidden>
> > > > > > > > Signed-off-by: Igor Mammedov <address@hidden>        
> > > > > > > 
> > > > > > > It's an interesting hack, but this needs some thought. BIOS has 
> > > > > > > no idea
> > > > > > > this BAR is special and can not be rebalanced, so it might put 
> > > > > > > the BAR
> > > > > > > in the middle of the range, in effect fragmenting it.      
> > > > > > yep that's the only drawback in PCI approach.
> > > > > >       
> > > > > > > Really I think something like V12 just rewritten using the new 
> > > > > > > APIs
> > > > > > > (probably with something like build_append_named_dword that I 
> > > > > > > suggested)
> > > > > > > would be much a simpler way to implement this device, given
> > > > > > > the weird API limitations.      
> > > > > > We went over stating drawbacks of both approaches several times 
> > > > > > and that's where I strongly disagree with using v12 AML patching
> > > > > > approach for reasons stated in those discussions.      
> > > > > 
> > > > > Yes, IIRC you dislike the need to allocate an IO range to pass address
> > > > > to host, and to have custom code to migrate the address.    
> > > > allocating IO ports is fine by me but I'm against using bios_linker 
> > > > (ACPI)
> > > > approach for task at hand,
> > > > let me enumerate one more time the issues that make me dislike it so 
> > > > much
> > > > (in order where most disliked ones go the first):
> > > > 
> > > > 1. over-engineered for the task at hand, 
> > > >    for device to become initialized guest OS has to execute AML,
> > > >    so init chain looks like:
> > > >      QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) 
> > > > ->
> > > >          QEMU (update buf address)
> > > >    it's hell to debug when something doesn't work right in this chain   
> > > >  
> > > 
> > > Well this is not very different from e.g. virtio.
> > > If it's just AML that worries you, we could teach BIOS/EFI a new command
> > > to give some addresses after linking back to QEMU. Would this address
> > > this issue?  
> > it would make it marginally better (especially from tests pov)
> > though it won't fix other issues.
> >   
> > > 
> > >   
> > > >    even if there isn't any memory corruption that incorrect AML patching
> > > >    could introduce.
> > > >    As result of complexity patches are hard to review since one has
> > > >    to remember/relearn all details how bios_linker in QEMU and BIOS 
> > > > works,
> > > >    hence chance of regression is very high.
> > > >    Dynamically patched AML also introduces its own share of AML
> > > >    code that has to deal with dynamic buff address value.
> > > >    For an example:
> > > >      "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
> > > >    27 liner patch could be just 5-6 lines if static (known in advance)
> > > >    buffer address were used to declare static _CRS variable.    
> > > 
> > > Problem is with finding a fixed address, and fragmentation that this
> > > causes.  Look at the mess we have with just allocating addresses for
> > > RAM.  I think it's a mistake to add to this mess.  Either let's teach
> > > management to specify an address map, or let guest allocate addresses
> > > for us.  
> > 
> > Yep, problem here is in 'fixed' part but not in an address in general.
> > Allowing Mgmt to specify address map partially works. See pc-dimm,
> > on target side libvirt specifies address where it has been mapped
> > on source. Initial RAM mess could be fixed for future machines types
> > in similar way by replacing memory_region_allocate_system_memory()
> > with pc-dimms, so that specific machine type could reproduce
> > the same layout.  
> 
> But this requires management to specify the complete memory map.
For a clean start-up I'd say it's impossible for mgmt to define
a complete or even partial memory map; it just doesn't have
knowledge of the machine's internals. So GPAs should be allocated
by QEMU (currently they are mostly hand-picked fixed ones).

> Otherwise we won't be able to change the layout, ever,
> even when we change machine types.
We won't be able to change the layout for old machines, but if
we make the layout configurable, probably with auto-allocated GPAs,
then new machine types could be more flexible wrt layout
(but those are just vague thoughts for now; maybe some day
I'll try to implement it).

> Migration is different as addresses are queried on source.
> 
> Hmm, I did not realize someone might misuse this and
> set address manually even without migration.
> We need to find a way to prevent that before it's too late.
> Eduardo - any ideas?
It's too late for that, and it doesn't really matter, as pc-dimm.addr is
confined to the hotplug range.
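
For example (made-up values, just to illustrate the mechanism), specifying
the address explicitly looks roughly like this on the command line, which
is what libvirt generates on the migration target side:

 -object memory-backend-ram,id=mem1,size=1G \
 -device pc-dimm,id=dimm1,memdev=mem1,addr=0x140000000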


> > But default addresses doesn't appear magically and have to
> > come from somewhere, so we have to have an address allocator
> > somewhere.  
> 
> I feel for advanced functionality like vm gen id,
> we could require all addresses to be specified.
Any suggestion for how mgmt would pick this address?


> > If we put allocator into guest and emulate memory controller
> > in QEMU, we probably would need to add fw_cfg interfaces
> > that describe hardware which needs mapping (probably ever
> > growing interface like it has been with ACPI tables, before
> > we axed them in BIOS and moved them into QEMU).
> > Alternatively we can put allocator in QEMU which could
> > be simpler to implement and maintain since we won't need
> > to implement extra fw_cfg interfaces and memory controller
> > and ship/fix QEMU/BIOS pair in sync as it has been with ACPI
> > in past.  
> 
> So the linker interface solves this rather neatly:
> bios allocates memory, bios passes memory map to guest.
> Served us well for several years without need for extensions,
> and it does solve the VM GEN ID problem, even though
> 1. it was never designed for huge areas like nvdimm seems to want to use
> 2. we might want to add a new 64 bit flag to avoid touching low memory
The linker interface is fine for some read-only data, like ACPI tables
(especially fixed ones), but not so much for AML ones if one wants to patch them.

However, now that you want to use it for other purposes, you start
adding extensions and other guest->QEMU channels to communicate
patching info back.
It also steals guest memory, which is not nice and doesn't scale well.

> 
> 
> > > > 2. ACPI approach consumes guest usable RAM to allocate buffer
> > > >    and then makes device to DMA data in that RAM.
> > > >    That's a design point I don't agree with.    
> > > 
> > > Blame the broken VM GEN ID spec.
> > > 
> > > For VM GEN ID, instead of DMA we could make ACPI code read PCI BAR and
> > > copy data over, this would fix rebalancing but there is a problem with
> > > this approach: it can not be done atomically (while VM is not yet
> > > running and accessing RAM).  So you can have guest read a partially
> > > corrupted ID from memory.  
> > 
> > Yep, VM GEN ID spec is broken and we can't do anything about it as
> > it's absolutely impossible to guaranty atomic update as guest OS
> > has address of the buffer and can read it any time. Nothing could be
> > done here.  
> 
> Hmm I thought we can stop VM while we are changing the ID.
> But of course VM could be accessing it at the same time.
> So I take this back, ACPI code reading PCI BAR and
> writing data out to the buffer would be fine
> from this point of view.
The spec is broken regardless of whether the BAR or RAM is read, as the read
isn't atomic and the UUID could be updated in the middle.
So MS will have to live with that, unless they have a secret
way to tell the guest to stop reading from that ADDR, do the update and
then signal the guest that it can read it again.

> 
> > > 
> > > And hey, nowadays we actually made fw cfg do DMA too.  
> > 
> > I'm not against DMA, it's direction which seems wrong to me.
> > What we need here is a way to allocate GPA range and make sure
> > that QEMU maps device memory there, PCI BAR is one
> > of the ways to do it.  
> 
> OK fine, but returning PCI BAR address to guest is wrong.
> How about reading it from ACPI then? Is it really
> broken unless there's *also* a driver?
I don't get the question; the MS spec requires an address (the ADDR method),
and it's read by ACPI (AML).
As for a PCI_Config OpRegion working without a driver, I haven't tried it,
but I wouldn't be surprised if it doesn't work, taking into account that
the MS-introduced _DSM doesn't.

> 
> 
> > > >    Just compare with a graphics card design, where on device memory
> > > >    is mapped directly at some GPA not wasting RAM that guest could
> > > >    use for other tasks.    
> > > 
> > > This might have been true 20 years ago.  Most modern cards do DMA.  
> > 
> > Modern cards, with it's own RAM, map its VRAM in address space directly
> > and allow users use it (GEM API). So they do not waste conventional RAM.
> > For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> > series (even PCI class id is the same)  
> 
> Don't know enough about graphics really, I'm not sure how these are
> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> mostly use guest RAM, not on card RAM.
> 
> > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > >    instead of consuming guest's RAM they should be mapped at
> > > >    some GPA and their memory accessed directly.    
> > > 
> > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > address. This breaks the straight-forward approach of using a
> > > rebalanceable PCI BAR.  
> > 
> > For PCI rebalance to work on Windows, one has to provide working PCI driver
> > otherwise OS will ignore it when rebalancing happens and
> > might map something else over ignored BAR.  
> 
> Does it disable the BAR then? Or just move it elsewhere?
It doesn't; it just blindly ignores the BAR's existence and maps the BAR of
another device (one with a driver) over it.

> 
> > >   
> > > >    In that case NVDIMM could even map whole label area and
> > > >    significantly simplify QEMU<->OSPM protocol that currently
> > > >    serializes that data through a 4K page.
> > > >    There is also performance issue with buffer allocated in RAM,
> > > >    because DMA adds unnecessary copying step when data could
> > > >    be read/written directly of NVDIMM.
> > > >    It might be no very important for _DSM interface but when it
> > > >    comes to supporting block mode it can become an issue.    
> > > 
> > > So for NVDIMM, presumably it will have code access PCI BAR properly, so
> > > it's guaranteed to work across BAR rebalancing.
> > > Would that address the performance issue?  
> > 
> > it would if rebalancing were to account for driverless PCI device BARs,
> > but it doesn't hence such BARs need to be statically pinned
> > at place where BIOS put them at start up.
> > I'm also not sure that PCIConfig operation region would work
> > on Windows without loaded driver (similar to _DSM case).
> > 
> >   
> > > > Above points make ACPI patching approach not robust and fragile
> > > > and hard to maintain.    
> > > 
> > > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > > get what appears your general dislike of the linker host/guest
> > > interface.  
> > Besides technical issues general dislike is just what I've written
> > "not robust and fragile" bios_linker_loader_add_pointer() interface.
> > 
> > to make it less fragile:
> >  1. it should be impossible to corrupt memory or patch wrong address.
> >     current impl. silently relies on value referenced by 'pointer' argument
> >     and to figure that out one has to read linker code on BIOS side.
> >     That could be easily set wrong and slip through review.  
> 
> That's an API issue, it seemed like a good idea but I guess
> it confuses people. Would you be happier using an offset
> instead of a pointer?
An offset is better, and it would be better still if it said
which offset it is (i.e. relative to what).

> 
> >     API shouldn't rely on the caller setting value pointed by that 
> > argument.  
> 
> I couldn't parse that one. Care suggesting a cleaner API for linker?
Here is the current API signature:

bios_linker_loader_add_pointer(GArray *linker,
                                    const char *dest_file,
                                    const char *src_file,
                                    GArray *table, void *pointer,
                                    uint8_t pointer_size)

issue 1:
'pointer' is a real pointer pointing inside 'table', and the API
calculates the offset under the hood:
  offset = (gchar *)pointer - table->data;
and puts it in the ADD_POINTER command.

It's easy to end up with a wrong offset if 'pointer' doesn't actually point into 'table'.

issue 2:
'pointer' points to another offset of size 'pointer_size' inside the 'table'
blob. That means that whoever composes the blob has to be aware of
it and fill in the correct value there, which is only possible to do right
if one looks inside the SeaBIOS part of the linker interface.
That is easy to forget, and then one has to deal with the mess
caused by random memory corruption.

bios_linker_loader_add_pointer() and the corresponding
ADD_POINTER command should take this second offset as an argument
and not require 'table' to be pre-filled with it; or,
in the worst case, if extending the ADD_POINTER command is problematic,
bios_linker_loader_add_pointer() should still take
the second offset and patch 'table' itself, so that the 'table' composer
doesn't have to worry about it.

issue 3:
all patching obviously needs bounds checking on the QEMU side,
so that it aborts early instead of corrupting memory.
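
To make this concrete, here is a rough sketch (untested, names made up,
not a patch) of the shape of interface I have in mind; it would address
all three issues above:

/*
 * Sketch only: both offsets are explicit, and the helper itself does
 * the bounds checking and fills in the to-be-patched field, so the
 * table composer never needs to know linker internals.
 * 'dest_table' holds bytes (guint8), so ->len is the blob size.
 */
static void linker_add_pointer_sketch(GArray *linker,
                                      const char *dest_file,
                                      GArray *dest_table,
                                      uint32_t dst_offset, /* where in dest_table to patch */
                                      uint8_t dst_size,    /* 1, 2, 4 or 8 bytes */
                                      const char *src_file,
                                      uint64_t src_offset) /* what the pointer should point at */
{
    uint64_t le = cpu_to_le64(src_offset);

    /* issue 3: abort early instead of silently corrupting the blob */
    g_assert(dst_size == 1 || dst_size == 2 || dst_size == 4 || dst_size == 8);
    g_assert(dst_offset + dst_size <= dest_table->len);
    g_assert(dst_size == 8 || src_offset < (1ULL << (dst_size * 8)));

    /* issue 2: the helper fills the field, the caller doesn't pre-fill it */
    memcpy(dest_table->data + dst_offset, &le, dst_size);

    /* issue 1: emit ADD_POINTER with explicit offsets, no pointer
     * arithmetic on a caller-supplied pointer;
     * ... append the command for (dest_file, dst_offset, dst_size,
     *     src_file) to 'linker' here ...
     */
}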

> 
> >  2. If it's going to be used for patching AML, it should assert
> >     when bios_linker_loader_add_pointer() is called if to be patched
> >     AML object is wrong and patching would corrupt AML blob.  
> 
> Hmm for example check that the patched data has
> the expected pattern?
Yep, nothing can be done for raw tables, but that should be possible
for AML tables; and if the pattern is unsupported or the size doesn't match,
it should abort QEMU early instead of corrupting the table.

> 
> >   
> > > It's there and we are not moving away from it, so why not
> > > use it in more places?  Or if you think it's wrong, why don't you build
> > > something better then?  We could then maybe use it for these things as
> > > well.  
> > 
> > Yep, I think for vmgenid and even more so for nvdimm
> > it would be better to allocate GPAs in QEMU and map backing
> > MemoryRegions directly in QEMU.
> > For nvdimm (main data region)
> > we already do it using pc-dimm's GPA allocation algorithm, we also
> > could use similar approach for nvdimm's label area and vmgenid.
> > 
> > Here is a simple attempt to add a limited GPA allocator in high memory
> >  https://patchwork.ozlabs.org/patch/540852/
> > But it haven't got any comment from you and were ignored.
> > Lets consider it and perhaps we could come up with GPA allocator
> > that could be used for other things as well.  
> 
> For nvdimm label area, I agree passing things through
> a 4K buffer seems inefficient.
> 
> I'm not sure what's a better way though.
> 
> Use 64 bit memory? Setting aside old guests such as XP,
> does it break 32 bit guests?
It might not work with 32-bit guests, the same way memory hotplug
doesn't work for them unless they are PAE-enabled.
But that's a limitation of the implementation, and considering that
the nvdimm storage area is mapped at a 64-bit GPA anyway, it doesn't matter much.
 
> I'm really afraid of adding yet another allocator, I think you
> underestimate the maintainance headache: it's not theoretical and is
> already felt.
The current maintenance headache is due to the fixed, hand-picked
memory layout. We can't do much about it for legacy machine
types, but with a QEMU-side GPA allocator we can try to switch
to a flexible memory layout that allocates GPAs
depending on the QEMU config in a stable manner.
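
Just to illustrate what I mean by a "limited GPA allocator" (a very rough
sketch, not a design proposal, all names made up):

/*
 * Bump allocator over a fixed window above the hotplug area: every
 * device that needs a GPA asks QEMU for one, so the QEMU config
 * determines the layout in a stable, reproducible manner.
 */
typedef struct GPAAllocator {
    uint64_t base;   /* start of the managed window */
    uint64_t limit;  /* end of the managed window */
    uint64_t next;   /* next free address */
} GPAAllocator;

static uint64_t gpa_alloc(GPAAllocator *a, uint64_t size, uint64_t align)
{
    /* 'align' must be a power of two */
    uint64_t start = (a->next + align - 1) & ~(align - 1);

    if (size == 0 || start + size > a->limit) {
        return UINT64_MAX; /* no space left, caller reports the error */
    }
    a->next = start + size;
    return start;
}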

Well, there is a maintenance headache with bios_linker as well,
due to its complexity (multiple layers of indirection), and
it will grow as more places try to use it.
Yes, we could use it as a hack, stealing RAM and trying to implement
backwards DMA, or we could be less afraid and consider
yet another allocator which will do the job without hacks,
which should benefit QEMU in the long run (it might not be easy
to implement it right, but if we don't even try we will be buried
in complex hacks that 'work' for now).

> > >   
> > > >     
> > > > >     
> > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > address guest to host, instead of reserving
> > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > automatically.      
> > > > > > Could you elaborate more on this suggestion?      
> > > > > 
> > > > > I really just mean using PCI_Config operation region.
> > > > > If you wish, I'll try to post a prototype next week.    
> > > > I don't know much about PCI but it would be interesting,
> > > > perhaps we could use it somewhere else.
> > > > 
> > > > However it should be checked if it works with Windows,
> > > > for example PCI specific _DSM method is ignored by it
> > > > if PCI device doesn't have working PCI driver bound to it.
> > > >     
> > > > >     
> > > > > > > 
> > > > > > >       
> > > > > > > > ---
> > > > > > > > changes since 17:
> > > > > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" 
> > > > > > > > <address@hidden>
> > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > changes since 14:
> > > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" 
> > > > > > > > <address@hidden>
> > > > > > > >   - ACPI: split VGEN device of PCI device descriptor
> > > > > > > >     and place it at PCI0 scope, so that won't be need trace its
> > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" 
> > > > > > > > <address@hidden>
> > > > > > > >   - permit only one vmgenid to be created
> > > > > > > >   - enable BAR be mapped above 4Gb if it can't be mapped at low 
> > > > > > > > mem
> > > > > > > > ---
> > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > >  hw/misc/vmgenid.c                  | 154 
> > > > > > > > +++++++++++++++++++++++++++++++++++++
> > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > 
> > > > > > > > diff --git a/default-configs/i386-softmmu.mak 
> > > > > > > > b/default-configs/i386-softmmu.mak
> > > > > > > > index b177e52..6402439 100644
> > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > +CONFIG_VMGENID=y
> > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak 
> > > > > > > > b/default-configs/x86_64-softmmu.mak
> > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > +CONFIG_VMGENID=y
> > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > >  
> > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > index 78758e2..0187262 100644
> > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > >  
> > > > > > > >  /* Supported chipsets: */
> > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > @@ -237,6 +238,40 @@ static void 
> > > > > > > > acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > +{
> > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > +
> > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > +    aml_append(dev, aml_name_decl("_HID", 
> > > > > > > > aml_string("QEMU0003")));
> > > > > > > > +    aml_append(dev, aml_name_decl("_CID", 
> > > > > > > > aml_string("VM_Gen_Counter")));
> > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", 
> > > > > > > > aml_string("VM_Gen_Counter")));
> > > > > > > > +
> > > > > > > > +    pkg = aml_package(2);
> > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > +
> > > > > > > > +    /*
> > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and 
> > > > > > > > Windows
> > > > > > > > +     * displays it as "PCI RAM controller" which is marked as 
> > > > > > > > NO_DRV
> > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't 
> > > > > > > > check
> > > > > > > > +     * for resource conflicts which during PCI rebalancing can 
> > > > > > > > lead
> > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent 
> > > > > > > > this
> > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > +     * For more verbose comment see this commit message.       
> > > > > > > >  
> > > > > > > 
> > > > > > > What does "this commit message" mean?      
> > > > > > above commit message. Should I reword it to just 'see commit 
> > > > > > message'
> > > > > >       
> > > > > > >       
> > > > > > > > +     */
> > > > > > > > +     crs = aml_resource_template();
> > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, 
> > > > > > > > AML_MIN_FIXED,
> > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 
> > > > > > > > 0,
> > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE 
> > > > > > > > - 1, 0,
> > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > +     return dev;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /*
> > > > > > > >   * Because of the PXB hosts we cannot simply query 
> > > > > > > > TYPE_PCI_HOST_BRIDGE.
> > > > > > > >   * On i386 arch we only have two pci hosts, so we can look 
> > > > > > > > only for them.
> > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray 
> > > > > > > > *linker,
> > > > > > > >              }
> > > > > > > >  
> > > > > > > >              if (bus) {
> > > > > > > > +                Object *vmgen;
> > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > >                  /* Scan all PCI buses. Generate tables to 
> > > > > > > > support hotplug. */
> > > > > > > >                  build_append_pci_bus_devices(scope, bus, 
> > > > > > > > pm->pcihp_bridge_en);
> > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray 
> > > > > > > > *linker,
> > > > > > > >                      aml_append(scope, dev);
> > > > > > > >                  }
> > > > > > > >  
> > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > +                if (vmgen) {
> > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > +                        pci_get_bar_addr(pdev, 
> > > > > > > > VMGENID_VMGID_BUF_BAR);
> > > > > > > > +
> > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > +                        aml_append(scope, 
> > > > > > > > build_vmgenid_device(buf_paddr));
> > > > > > > > +
> > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > +                        aml_append(method,
> > > > > > > > +                            
> > > > > > > > aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > +                    }
> > > > > > > > +                }
> > > > > > > > +
> > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > >              }
> > > > > > > >          }
> > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray 
> > > > > > > > *linker,
> > > > > > > >      {
> > > > > > > >          aml_append(scope, aml_name_decl("_HID", 
> > > > > > > > aml_string("ACPI0006")));
> > > > > > > >  
> > > > > > > > -        aml_append(scope, aml_method("_L00", 0, 
> > > > > > > > AML_NOTSERIALIZED));
> > > > > > > > -
> > > > > > > >          if (misc->is_piix4) {
> > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > >              aml_append(method,
> > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += 
> > > > > > > > stm32f2xx_syscfg.o
> > > > > > > >  
> > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..a2fbdfc
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > +/*
> > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > + *
> > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > + *
> > > > > > > > + *  Authors: Gal Hammer <address@hidden>
> > > > > > > > + *           Igor Mammedov <address@hidden>
> > > > > > > > + *
> > > > > > > > + * This work is licensed under the terms of the GNU GPL, 
> > > > > > > > version 2 or later.
> > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > + *
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > +
> > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), 
> > > > > > > > VMGENID_DEVICE)
> > > > > > > > +
> > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > +    MemoryRegion iomem;
> > > > > > > > +    union {
> > > > > > > > +        uint8_t guid[16];
> > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > +    };
> > > > > > > > +    bool guid_set;
> > > > > > > > +} VmGenIdState;
> > > > > > > > +
> > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > +{
> > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, 
> > > > > > > > NULL);
> > > > > > > > +    if (!obj) {
> > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > +    }
> > > > > > > > +    return obj;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > +{
> > > > > > > > +    Object *acpi_obj;
> > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > +
> > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > +
> > > > > > > > +    acpi_obj = object_resolve_path_type("", 
> > > > > > > > TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > +    if (acpi_obj) {
> > > > > > > > +        AcpiDeviceIfClass *adevc = 
> > > > > > > > ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > +
> > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > +    }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, 
> > > > > > > > Error **errp)
> > > > > > > > +{
> > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > +
> > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > +                   value);
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    s->guid_set = true;
> > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, 
> > > > > > > > void *opaque,
> > > > > > > > +                                   const char *name, Error 
> > > > > > > > **errp)
> > > > > > > > +{
> > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > +
> > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not 
> > > > > > > > initialized",
> > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > +{
> > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > +
> > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", 
> > > > > > > > sizeof(s->guid_page),
> > > > > > > > +                           &error_abort);
> > > > > > > > +
> > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, 
> > > > > > > > vmgenid_set_guid, NULL);
> > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, 
> > > > > > > > NULL, NULL);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > +{
> > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > +    bool ambiguous = false;
> > > > > > > > +
> > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > +    if (ambiguous) {
> > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > +                         " device is permitted");
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    if (!s->guid_set) {
> > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is 
> > > > > > > > not set",
> > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | 
> > > > > > > > PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > +        &s->iomem);
> > > > > > > > +    return;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > +{
> > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > +
> > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > +{
> > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > diff --git a/include/hw/misc/vmgenid.h 
> > > > > > > > b/include/hw/misc/vmgenid.h
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..b90882c
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > +/*
> > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > + *
> > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > + *
> > > > > > > > + *  Authors: Gal Hammer <address@hidden>
> > > > > > > > + *           Igor Mammedov <address@hidden>
> > > > > > > > + *
> > > > > > > > + * This work is licensed under the terms of the GNU GPL, 
> > > > > > > > version 2 or later.
> > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > + *
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > +
> > > > > > > > +#include "qom/object.h"
> > > > > > > > +
> > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > +
> > > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > > +
> > > > > > > > +#endif
> > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > >  
> > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > -- 
> > > > > > > > 1.8.3.1        
> > >   



