Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID

From:	Marcel Apfelbaum
Subject:	Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
Date:	Tue, 16 Feb 2016 14:36:49 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 02/16/2016 02:17 PM, Igor Mammedov wrote:

On Tue, 16 Feb 2016 12:05:33 +0200
Marcel Apfelbaum <address@hidden> wrote:

On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:

On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:

On Tue, 9 Feb 2016 14:17:44 +0200
"Michael S. Tsirkin" <address@hidden> wrote:

On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:

So the linker interface solves this rather neatly:
bios allocates memory, bios passes memory map to guest.
Served us well for several years without need for extensions,
and it does solve the VM GEN ID problem, even though
1. it was never designed for huge areas like nvdimm seems to want to use
2. we might want to add a new 64 bit flag to avoid touching low memory

The linker interface is fine for some read-only data such as ACPI tables,
especially fixed tables, but not for AML ones if one wants to patch them.

However now when you want to use it for other purposes you start
adding extensions and other guest->QEMU channels to communicate
patching info back.
It steals guest's memory which is also not nice and doesn't scale well.


This is an argument I don't get. memory is memory. call it guest memory
or RAM backed PCI BAR - same thing. MMIO is cheaper of course
but much slower.

...

It does matter to the user, however: he pays for a guest with XXX RAM but gets
less than that. And that will get worse as the number of such devices
increases.

OK fine, but returning PCI BAR address to guest is wrong.
How about reading it from ACPI then? Is it really
broken unless there's *also* a driver?

I don't get the question. The MS spec requires an address (ADDR method),
and it's read by ACPI (AML).


You were unhappy about DMA into guest memory.
As a replacement for DMA, we could have AML read from
e.g. PCI and write into RAM.
This way we don't need to pass address to QEMU.

That sounds better, as it saves us from allocating an IO port
and QEMU doesn't need to write into guest memory. The only question is
whether a PCI_Config OpRegion would work with a driver-less PCI device.


Or a PCI BAR, for that matter. I don't know for sure.


And it's still pretty much untestable, since it would require
a fully running OSPM to execute the AML side.


AML is not testable, but that's nothing new.
You can test reading from PCI.

As for a PCI_Config OpRegion working without a driver, I haven't tried,
but I wouldn't be surprised if it doesn't, taking into account that
the MS-introduced _DSM doesn't.

     Just compare with a graphics card design, where on-device memory
     is mapped directly at some GPA, not wasting RAM that the guest could
     use for other tasks.


This might have been true 20 years ago.  Most modern cards do DMA.


Modern cards with their own RAM map their VRAM directly into the address space
and allow users to use it (GEM API), so they do not waste conventional RAM.
For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
series (even the PCI class ID is the same).


Don't know enough about graphics really, I'm not sure how these are
relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
mostly use guest RAM, not on card RAM.

     VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
     instead of consuming guest's RAM they should be mapped at
     some GPA and their memory accessed directly.


VMGENID is tied to a spec that rather arbitrarily asks for a fixed
address. This breaks the straight-forward approach of using a
rebalanceable PCI BAR.


For PCI rebalance to work on Windows, one has to provide a working PCI driver;
otherwise the OS will ignore the device when rebalancing happens and
might map something else over the ignored BAR.


Does it disable the BAR then? Or just move it elsewhere?

It doesn't; it just blindly ignores the BAR's existence and maps a BAR of
another device with a driver over it.


Interesting. On classical PCI this is a forbidden configuration.
Maybe we do something that confuses windows?
Could you tell me how to reproduce this behaviour?

#cat > t << EOF
pci_update_mappings_del
pci_update_mappings_add
EOF

#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm \
   -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
   -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
   -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01

Wait till the OS boots and note the BARs programmed for ivshmem;
   in my case it was
     01:01.0 0,0xfe800000+0x100
Then execute the script and watch the pci_update_mappings* trace events:

# for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
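
Note that the loop formats the slot number with %x, so the last command it sends is addr=12 in hexadecimal, i.e. slot 18. A dry-run sketch that only prints the monitor commands instead of piping them to nc may help when checking this; the function name hotplug_cmds is made up here for illustration and is not part of QEMU:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the monitor commands that the loop
# above would send, without contacting the monitor socket.
hotplug_cmds() {
    for i in $(seq 3 18); do
        # %x renders the slot in hex, so 18 becomes addr=12.
        printf -- "device_add e1000,bus=pci.1,addr=%x\n" "$i"
    done
}

hotplug_cmds
```

The final line printed is the addr=12 command that is reported below as triggering the rebalancing.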

Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
and then programs new BARs, where:
    pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
creates a BAR overlapping with ivshmem.
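
To make the overlap concrete: the two mappings reported in the traces are 0xfe800000+0x100 (ivshmem, 01:01.0) and 0xfe800000+0x20000 (e1000, 01:11.0). A minimal shell sketch of the interval test follows; the function name bars_overlap is hypothetical and only illustrates the arithmetic, it is not a QEMU or Windows facility:

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): succeed if two BAR ranges,
# each given as a base/size pair, intersect.
bars_overlap() {
    a_start=$(( $1 )); a_end=$(( $1 + $2 ))
    b_start=$(( $3 )); b_end=$(( $3 + $4 ))
    # Half-open ranges [start, end) intersect iff each one starts
    # before the other one ends.
    [ "$a_start" -lt "$b_end" ] && [ "$b_start" -lt "$a_end" ]
}

# Base and size values taken from the trace output quoted in this mail.
if bars_overlap 0xfe800000 0x100 0xfe800000 0x20000; then
    echo "overlap"
else
    echo "no overlap"
fi
```

With the values from the traces this prints "overlap", since both BARs start at the same base address.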



Thanks!
We need to figure this out because currently this does not
work properly (or maybe it works, but merely by chance).
Me and Marcel will play with this.


I checked and indeed we have 2 separate problems:

1. ivshmem is declared as PCI RAM controller and Windows *does* have the drivers
     for it, however it is not remapped on re-balancing.

Does it really have a driver, i.e. an ivshmem-specific one?
It should have its own driver, otherwise userspace
won't be able to access or work with it, and it would be pointless
to add such a device to the machine.


No, it does not.

     In Device Manager you can see 2 working devices with the same MMIO region:
     strange!
     This may be because PCI RAM controllers can't be re-mapped? Even then, it
     should not be overridden.
     Maybe we need to add a clue to the OS in ACPI regarding this range?

2. PCI devices with no driver installed are not re-mapped. This can be OK
     from the Windows point of view because the Resources window does not show
     the MMIO range for this device.

     If the other (re-mapped) device is working, it is pure luck. Both memory
     regions occupy the same range and have the same priority.

We need to think about how to solve this.
One way would be to defer the BAR activation to the guest OS, but I am not sure
of the consequences.

Deferring won't solve the problem, as rebalancing could happen later
and make BARs overlap.


Why not? If we do not activate the BAR in firmware and Windows does not have a
driver for it, it will not activate it at all, right?
Why would Windows activate the device BAR if it can't use it? At least this is
what I hope.
Any other idea would be appreciated.

I've noticed that at startup Windows unmaps and then maps BARs
at the same addresses where the BIOS had put them before.


Including devices without a working driver?


Thanks,
Marcel

And this does not solve the ivshmem problem.

So far the only way to avoid overlapping BARs caused by Windows
rebalancing around driver-less devices is to pin such
BARs statically with _CRS in an ACPI table, but as Michael said,
that fragments the PCI address space.


Thanks,
Marcel
