Re: [Qemu-devel] hitting intermittent issue with live migration from qemu-kvm-ev 2.3.0 to qemu-kvm-ev 2.6.0
Tue, 4 Apr 2017 08:28:51 -0600
On 04/04/2017 07:56 AM, Ladi Prosek wrote:
On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi <address@hidden> wrote:
On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
Initially we have a bunch of guests running on compute-2 (which is running
qemu-kvm-ev 2.3.0). We then started live-migrating them one at a time to
compute-0 (which is running qemu-kvm-ev 2.6.0). Three of them migrated
successfully. The fourth (which was essentially identical in configuration
to the first three) failed, as per the following logs in
2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c
2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:07.0/virtio-balloon'
2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation
2017-03-29 06:38:37.896+0000: shutting down
Does anyone know of an existing bug report covering this issue? (I took a
look and didn't see anything obviously related.)
This is the virtio-balloon device. If you remove the device the live
migration should work reliably.
Alternatively, you can temporarily rmmod virtio_balloon inside the guest
for live migration. After migration you can modprobe virtio_balloon.
last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
qemu.git/master and do not see an obvious bug. I also compared
qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
The device likely got into the invalid state as part of a previous
migration to an unfixed QEMU. I second Stefan's suggestion to
temporarily remove the device or unload the driver.
I'll give that a try (been busy with a separate issue).
If I have a guest already running, can I unilaterally hot-remove the device from
the host side or does the guest need to be involved as well? (I'm just trying
to figure out how to deal with existing guests.)