[RFC PATCH 0/1] Removing RAMBlocks during migration
From: Yury Kotov
Subject: [RFC PATCH 0/1] Removing RAMBlocks during migration
Date: Mon, 9 Dec 2019 10:41:01 +0300
Hi,
I found that it's possible to remove a RAMBlock during migration, e.g. when
device hot-unplug is initiated by the guest (steps to reproduce are below).
I want to clarify whether removing (or even adding) a RAMBlock during
migration is a valid operation or a bug.
Currently, it may cause race conditions with the migration thread, and
migration may fail because of them. For instance, the vmstate_unregister_ram
function, which is called while a PCIe device is being removed, does the following:
- Memsets the idstr -> the target may receive an unknown/zeroed idstr -> migration fails
- Marks the RAMBlock as non-migratable -> migration fails
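To make the two effects above concrete, here is a minimal, self-contained sketch; the struct and function names (SketchRAMBlock, sketch_vmstate_unregister_ram) are hypothetical stand-ins for the real QEMU types, not the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the real RAMBlock: only the two fields
 * relevant to the failure modes described above. */
typedef struct {
    char idstr[256];   /* name sent to the target during RAM save */
    bool migratable;   /* checked before the block is migrated    */
} SketchRAMBlock;

/* Sketch of vmstate_unregister_ram's two effects on a block that is
 * still being migrated. */
static void sketch_vmstate_unregister_ram(SketchRAMBlock *rb)
{
    memset(rb->idstr, 0, sizeof(rb->idstr)); /* zeroed idstr goes on the wire */
    rb->migratable = false;                  /* block now fails the migratable check */
}
```

If the migration thread reads the block after this runs, it either sends a zeroed idstr (which the target cannot match) or refuses the block entirely.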
Removing the RAMBlock itself seems safe for the migration thread thanks to RCU,
but it seems to me there are other possible race conditions (not tested):
- qemu_put_buffer_async saves a pointer to the RAMBlock's memory
-> the block is freed outside the RCU critical section (between ram save iterations)
-> qemu_fflush accesses the freed memory.
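The suspected race can be sketched with a self-contained model; the Fake* types and helper names below are hypothetical, and the point is only the ordering: the async put records a pointer, the flush dereferences it later, and the free can land in between:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal models of a RAMBlock and a QEMUFile. */
typedef struct {
    const char *host;   /* host memory backing the block          */
    bool        freed;  /* set when the block is hot-unplugged    */
} FakeRAMBlock;

typedef struct {
    const char *pending; /* pointer saved by the async put        */
    size_t      len;
} FakeQEMUFile;

/* Like qemu_put_buffer_async(): only the pointer is recorded here;
 * the bytes are read later, at flush time. */
static void put_buffer_async(FakeQEMUFile *f, const char *buf, size_t len)
{
    f->pending = buf;
    f->len = len;
}

/* Stands in for the block being freed out of RCU between ram save
 * iterations (the real code would actually release rb->host). */
static void free_block(FakeRAMBlock *rb)
{
    rb->freed = true;
}

/* Like qemu_fflush(): the saved pointer is dereferenced only now.
 * If the block was freed in between, that would be a use-after-free. */
static bool flush_would_use_after_free(const FakeQEMUFile *f,
                                       const FakeRAMBlock *rb)
{
    return rb->freed && f->pending == rb->host;
}
```

The model marks the block freed rather than actually freeing it, so the dangling-pointer condition can be checked without invoking undefined behavior.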
So, I have the following questions:
1. Is RAMBlock removing/adding OK during migration?
2. If yes then what should we do with vmstate_unregister_ram?
- Just remove vmstate_unregister_ram (my RFC patch)
- Refcount RAMBlock's migratable/non-migratable state
- Something else?
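For the refcounting option, a minimal sketch of what is meant (all names here are hypothetical, not an existing QEMU API): replace the boolean migratable flag with a counter, so that paired register/unregister calls from multiple users cannot prematurely mark a block non-migratable while migration still references it:

```c
#include <stdbool.h>

/* Hypothetical RAMBlock fragment: a refcount instead of a bool flag. */
typedef struct {
    int migratable_refs;
} RefRAMBlock;

/* Taken by each user that needs the block to stay migratable. */
static void ram_block_migratable_ref(RefRAMBlock *rb)
{
    rb->migratable_refs++;
}

/* Dropped on unregister; the block stays migratable while any
 * other reference (e.g. an in-progress migration) remains. */
static void ram_block_migratable_unref(RefRAMBlock *rb)
{
    rb->migratable_refs--;
}

static bool ram_block_is_migratable(const RefRAMBlock *rb)
{
    return rb->migratable_refs > 0;
}
```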
3. If it must not be possible, should we add
   assert(migration_is_idle()) to qemu_ram_free?
P.S.
I'm working on a fix for the problem below and trying to choose the better way:
either allow device removal and fix all problems like this one, or fix the
particular device.
--------
How to reproduce device removing during migration:
1. Source QEMU command line (target is similar)
$ x86_64-softmmu/qemu-system-x86_64 \
-nodefaults -no-user-config -m 1024 -M q35 \
-qmp unix:./src.sock,server,nowait \
-drive file=./image,format=raw,if=virtio \
-device ioh3420,id=pcie.1 \
-device virtio-net,bus=pcie.1
2. Start migration with slow speed (to simplify reproducing)
3. Power off a device on the hotplug pcie.1 bus:
$ echo 0 > /sys/bus/pci/slots/0/power
4. Increase migration speed and wait until fail
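For steps 2 and 4, the migration speed can be driven over the QMP socket; one possible command sequence (the URI, socket, and bandwidth values are only examples):

```
{ "execute": "qmp_capabilities" }
{ "execute": "migrate-set-parameters",
  "arguments": { "max-bandwidth": 1048576 } }
{ "execute": "migrate",
  "arguments": { "uri": "tcp:127.0.0.1:4444" } }
{ "execute": "migrate-set-parameters",
  "arguments": { "max-bandwidth": 10737418240 } }
```

Step 3 (powering off the slot) is performed inside the guest between the migrate command and the final bandwidth increase.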
Most likely you will get something like this:
qemu-system-x86_64: get_pci_config_device: Bad config data:
i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load
ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
qemu-system-x86_64: error while loading state for instance 0x0 of device
'0000:00:03.0/ioh-3240-express-root-port'
qemu-system-x86_64: load of migration failed: Invalid argument
This error merely illustrates that a device can be removed during migration;
it does not demonstrate the RAMBlock-removal race conditions themselves.
Regards,
Yury
Yury Kotov (1):
migration: Remove vmstate_unregister_ram
hw/block/pflash_cfi01.c | 1 -
hw/block/pflash_cfi02.c | 1 -
hw/mem/pc-dimm.c | 5 -----
hw/misc/ivshmem.c | 2 --
hw/pci/pci.c | 1 -
include/migration/vmstate.h | 1 -
migration/savevm.c | 6 ------
7 files changed, 17 deletions(-)
--
2.24.0