[raw] Guest stuck during live-migration

From: Quentin Grolleau
Subject: [raw] Guest stuck during live-migration
Date: Mon, 23 Nov 2020 09:36:55 +0000


In our company we host a large number of VMs behind OpenStack (so libvirt/qemu).
The large majority of our VMs run with local data only, stored on NVMe, and most of them use raw disks.

With QEMU 4.0 (and possibly older versions too) we see strange live-migration behaviour:
    - some VMs live-migrate at very high speed without issue (> 6 Gbps)
    - some VMs run correctly, but migrate at an oddly low speed (3 Gbps)
    - some VMs migrate at a very low speed (1 Gbps, sometimes less) and during the migration the guest is completely I/O stuck
When this issue happens the VM is completely blocked; iostat inside the VM shows a latency of 30 seconds.

First we thought it was related to a hardware issue, so we checked by comparing different hardware, but no problem was found there.

So one of my colleagues had the idea of limiting the bandwidth with "tc" on the interface the migration goes through, and it worked: the VM didn't lose a single ping nor get I/O stuck.
Important point: once the VM has been migrated (with the limitation) one time, if we migrate it again right after, the migration runs at full speed (8-9 Gb/s) without freezing the VM.

It only happens on existing VMs; we tried to reproduce it with a fresh instance with exactly the same specs and nothing happened.

We also tried to replicate the workload inside the VM, but there was no way to reproduce the case. So it was related neither to the workload nor to the server hosting the VM.

So we turned to the instance's disk: the raw file.

We also ran strace -c on the QEMU process during the live-migration: it was doing a lot of "lseek" calls.
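Those lseek calls are consistent with a SEEK_DATA/SEEK_HOLE walk over the raw file, which is how allocated and unallocated regions of a sparse image can be distinguished before sending data. A minimal sketch of that kind of scan, in Python for illustration (map_extents is a hypothetical helper written for this post, not QEMU code):

```python
import errno
import os

def map_extents(path):
    """Walk a file with SEEK_DATA/SEEK_HOLE and return (offset, length, kind) runs."""
    extents = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        pos = 0
        while pos < size:
            try:
                # Find the start of the next data region at or after pos.
                data = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                # ENXIO: no more data past pos -> the rest is one hole.
                extents.append((pos, size - pos, "hole"))
                break
            if data > pos:
                extents.append((pos, data - pos, "hole"))
            # Find where that data region ends (start of the next hole).
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            extents.append((data, hole - data, "data"))
            pos = hole
    finally:
        os.close(fd)
    return extents
```

On filesystems without SEEK_DATA/SEEK_HOLE support the kernel falls back to treating the whole file as one data extent, so the scan degrades gracefully; on a huge, heavily fragmented image it instead issues one pair of syscalls per extent, which is where the time can go.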

And we found this:

So I rebuilt QEMU with this patch, and the live-migration went well, at high speed and with no VM freeze.

Do you have a way to avoid the "lseek" mechanism? It consumes a lot of resources finding the holes in the disk and doesn't leave any for the VM.

Server hosting the VM : 
    - dual-Xeon hosts with NVMe storage and a 10 Gb network card
    - QEMU 4.0 and libvirt 5.4
    - Kernel

Guest having the issue : 
    - raw image with Debian 8

Here is the qemu-img info output for the disk:
> qemu-img info disk
image: disk
file format: raw
virtual size: 400G (429496729600 bytes)
disk size: 400G
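Note that disk size equals virtual size here, i.e. the file appears fully allocated, so a hole scan likely has nothing to skip. A quick way to check how much of a raw file is actually allocated, for illustration (allocation is a hypothetical helper, not part of qemu-img):

```python
import os

def allocation(path):
    """Return (virtual_size, allocated_bytes) for a file.

    st_blocks is counted in 512-byte units regardless of the
    filesystem block size, so allocated = st_blocks * 512.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

For a fully-allocated raw image the two numbers roughly match, as in the qemu-img output above; for a sparse image the allocated size is much smaller than the virtual size.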

