qemu-devel

Re: [raw] Guest stuck during live live-migration


From: Wei Wang
Subject: Re: [raw] Guest stuck during live live-migration
Date: Tue, 15 Dec 2020 09:46:35 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 11/23/2020 05:36 PM, Quentin Grolleau wrote:
Hello,

At our company we host a large number of VMs behind OpenStack (so libvirt/QEMU). The large majority of our VMs run with local data only, stored on NVMe, and most of them use raw disks.

With QEMU 4.0 (possibly with older versions too) we see strange live-migration behavior:
- some VMs live-migrate at very high speed without issue (> 6 Gbps)
- some VMs run correctly, but migrate at a strangely low speed (3 Gbps)
- some VMs migrate at a very low speed (1 Gbps, sometimes less), and during the migration the guest is completely I/O stuck

When this issue happens the VM is completely blocked; iostat in the VM shows a latency of 30 seconds.

First we thought it was related to a hardware issue. We checked by comparing different hardware, but no issue was found there.

So one of my colleagues had the idea of using "tc" to limit the bandwidth on the interface used for the migration, and it worked: the VM did not lose any pings and did not get I/O stuck.
Important point: once the VM has been migrated one time (with the limitation), if we migrate it again right after, the migration is done at full speed (8-9 Gb/s) without freezing the VM.

It only happens on existing VMs. We tried to reproduce it with a fresh instance with exactly the same spec, and nothing happened.

We tried to replicate the workload inside the VM, but could not reproduce the case that way. So it was related neither to the workload nor to the server that hosts the VM.

So we thought about the disk of the instance : the raw file.

We also ran strace -c on the process during the live migration, and it was doing a lot of "lseek" calls

and we found this :
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00462.html


So I rebuilt QEMU with this patch and the live migration went well, at high speed and with no VM freeze
( https://github.com/qemu/qemu/blob/master/block/file-posix.c#L2601)

Is there a way to avoid the "lseek" mechanism? It consumes a lot of resources finding the holes in the disk and leaves none for the VM.
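For readers unfamiliar with the hole probing that shows up as "lseek" in strace, here is a minimal Python sketch of the general SEEK_DATA/SEEK_HOLE technique on Linux. It is only an illustration and not QEMU's actual code; the function name data_extents is hypothetical.

```python
import errno
import os


def data_extents(path):
    """Return (offset, length) pairs for the data regions of a file.

    Uses lseek() with SEEK_DATA / SEEK_HOLE. On filesystems without hole
    support the whole file is reported as a single data extent.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                # Find the next offset >= pos that contains data.
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a hole remains until end of file
                raise
            # Find the end of that data region (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            pos = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    import sys

    for offset, length in data_extents(sys.argv[1]):
        print(f"data at {offset}, length {length}")
```

Each loop iteration costs two lseek calls, so an image with many small data and hole regions needs many system calls to walk. This is a guess at why the call count can be high, not a confirmed diagnosis.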


Server hosting the VM :
- Dual-Xeon hosts with NVMe storage and a 10 Gb network card
- QEMU 4.0 and libvirt 5.4
- Kernel 4.18.0.25

Guest having the issue :
- raw image with Debian 8

Here is the qemu-img info output for the disk:
> qemu-img info disk
image: disk
file format: raw
virtual size: 400G (429496729600 bytes)
disk size: 400G
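
As a cross-check of the two sizes above, the following Python sketch compares the bytes allocated on the host filesystem with the file's apparent size. It assumes a raw image stored as a regular file on Linux, where st_blocks is counted in 512-byte units; the function name allocated_and_apparent_size is hypothetical.

```python
import os


def allocated_and_apparent_size(path):
    """Return (allocated_bytes, apparent_bytes) for a regular file.

    allocated_bytes is derived from st_blocks, which is counted in
    512-byte units on Linux; apparent_bytes is st_size.
    """
    st = os.stat(path)
    return st.st_blocks * 512, st.st_size


if __name__ == "__main__":
    import sys

    allocated, apparent = allocated_and_apparent_size(sys.argv[1])
    print(f"allocated: {allocated} bytes, apparent: {apparent} bytes")
    if allocated < apparent:
        print("file looks sparse (some ranges are unallocated)")
```

If the allocated size is close to the apparent size, as the 400G/400G output above suggests, the file has few or no unallocated ranges at the filesystem level.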


Could you share the migration options that you use and "info migrate" for both stuck and non-stuck cases?

Best,
Wei




