Re: [Qemu-devel] Runtime-modified DIMMs and live migration issue


From: Andrey Korolyov
Subject: Re: [Qemu-devel] Runtime-modified DIMMs and live migration issue
Date: Wed, 17 Jun 2015 18:32:04 +0300

> I've checked the logs; so far I don't see anything suspicious there
> except the "acpi PNP0C80:00: Already enumerated" lines.
> Raising the log level might show more info.
>  + upload full logs
>  + enable ACPI debug info so that the dimm device's _CRS shows up
>  + QEMU's CLI that was used to produce such a log
>
> wrt migration:
> could you provide the exact CLI args on source and destination along with
> the intermediate mem hotplug commands used, or even better, whether it
> reproduces just with migration of cold-plugged dimms, for simplification,
> + steps to reproduce (and guest kernel versions).

Thanks Igor,

I have been using 3.10 and 3.16 guest kernels lately, but the issue seems
to hit every OS. The issue is not reproducible with cold-plugged DIMMs at
all, which is somewhat confusing: bearing in mind the race-like behavior
described previously, either the guest kernel is partially responsible for
the issue, or its nature is ultimately weird. You can borrow the full CLI
arg set from the message containing 'Please find the full cli args and two
guest logs for DIMM' three days ago in this thread. The destination
emulator launch string is identical to the source one, plus the
device/object pairs for the hotplugged memory in the args; memory devices
are onlined automatically via a udev script. A colleague suggested that I
disable CONFIG_SPARSEMEM_VMEMMAP to remove the side mess of printks from
the sparse hotplug mapping, and that showed there is nothing wrong with
the per-dimm memory population map: the runtime and cold-plugged maps are
identical in this case.
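For reference, the automatic onlining in the guest is done with a udev
rule along these lines (a sketch only; the rule file path and the choice
of plain "online" as the target state are my own, not taken from the
attached setup):

    # /etc/udev/rules.d/80-hotplug-memory.rules  (hypothetical path)
    # Online every memory block as soon as the kernel announces it.
    SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"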

Another trace with a null IP is attached; it was produced by running fio.
The easiest way to set up the test bed and reproduce the issue is to
launch the attached VM from its XML (add a disk and optionally a
framebuffer for convenience), ripping out two or three DIMMs first, then
stop libvirt, add those DIMMs back to the runtime config, start libvirt
again, hotplug those DIMMs, put a workload on the VM and migrate it with
the live flag. Or, if it is more convenient for you, launch bare qemu with
some empty slots, plug the appropriate objects and devices in (object_add
memory-backend-ram,id=memX,size=512M followed by device_add
pc-dimm,id=dimmX,node=0,memdev=memX) and migrate to a receiver with the
same dimms added to its args; a sketch of that sequence follows. Please do
not forget to online the dimms in the guest as well.
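In case it helps, the bare-qemu variant looks roughly like this (the
memory sizes, slot count, host name and port are illustrative, not copied
from my actual setup):

    # source: start with empty hotplug slots (illustrative sizes)
    qemu-system-x86_64 -enable-kvm -m 2G,slots=4,maxmem=8G ...

    # source monitor: hotplug a DIMM, then start live migration
    (qemu) object_add memory-backend-ram,id=mem1,size=512M
    (qemu) device_add pc-dimm,id=dimm1,node=0,memdev=mem1
    (qemu) migrate -d tcp:DESTHOST:4444

    # destination: same CLI plus the hotplugged dimm as cold-plugged args
    qemu-system-x86_64 ... -object memory-backend-ram,id=mem1,size=512M \
        -device pc-dimm,id=dimm1,node=0,memdev=mem1 -incoming tcp:0:4444

    # guest: online the new memory block (or let the udev rule above do it)
    echo online > /sys/devices/system/memory/memoryNN/state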

I don't think this could be ACPI-related in any way; instead, it looks
like a race in vhost or a similar mm-touching mechanism. The repeated
"Already enumerated" hits you mentioned should indeed be fixed as well,
but they can hardly be the reason for this problem.

Attachment: fio-trace-no-IP.txt
Description: Text document

Attachment: sample-vm-for-hotplug.xml
Description: Text Data

