qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] KVM/Qemu migration under fast network stress


From: Or Gerlitz
Subject: [Qemu-devel] KVM/Qemu migration under fast network stress
Date: Wed, 25 Nov 2015 00:44:35 +0200

Hi,

When doing live migration with iperf running on the migrated VM
over PV/virtio interface -- it doesn't end when the number of threads > 1 :(
nor when we run an IO (fio) benchmark which gets high-throughput. None
of these tests run over the network the migration is carried out.

We get this "error: operation failed: migration job: unexpectedly failed"

The networking/IO benchmarks which executes at the time of migration runs
over PV/virtio-->vhost-->Mellanox fast Eth NIC.

In our env, with two threads the  BW is ~20Gbs and with one
thread ~10Gbs, IO BW can be around 600MB/s.

Using the verbose option of virsh we see the migration status goes up to
99% and stays there for couple of seconds and then migration fails.

If we used the --unsafe option it does end and goes from 99% to 100%
at some point.

I wonder if there is something we do wrong or can optimize... we're
using upstream kernel 4.3 on both hosts and guest, the host OS is RHEL 7.0
based e.g with qemu qemu-kvm-1.5.3-60.el7.x86_64 - rings some bell?
would it help to upgrade qemu?

Is there an option to get the live migration process to give up on
looping over dirty pages
and stop the v-cpu on the migrate-out host so this migration actually ends?

We're attaching debug messages from virsh for the time  of migration,
you can see there it's on 99% for couple of seconds and then gives up.

Or.



2015-11-24 14:13:47.190+0000: 34957: debug :
virKeepAliveCheckMessage:374 : ka=0x55d161e4e230,
client=0x55d161e4df40, msg=0x55d161e4dfa8
2015-11-24 14:13:47.190+0000: 34957: debug : virNetMessageClear:55 :
msg=0x55d161e4dfa8 nfds=0
2015-11-24 14:13:47.190+0000: 34957: debug :
virNetClientIOEventLoopRemoveDone:1376 : Waking up sleep
0x55d161e4e520
2015-11-24 14:13:47.190+0000: 34955: debug : virNetClientIO:1750 :
Woken up from sleep head=0x7f3644001d40 call=0x55d161e4e520
2015-11-24 14:13:47.190+0000: 34955: debug : virNetClientIO:1803 : All
done with our call head=0x7f3644001d40 call=0x55d161e4e520 rv=0
2015-11-24 14:13:47.190+0000: 34955: debug : virNetMessageFree:72 :
msg=0x55d161e4e7d0 nfds=0 cb=(nil)
Migration: [ 99 %]2015-11-24 14:13:47.574+0000: 34956: debug :
virEventPollRunOnce:638 : Poll got 1 event(s)

The above is the last time we see the "Migration: [ 99 %]" running progress bar

2015-11-24 14:13:47.574+0000: 34956: debug :
virEventPollDispatchTimeouts:423 : Dispatch 0
2015-11-24 14:13:47.574+0000: 34956: debug :
virEventPollDispatchHandles:468 : Dispatch 2
2015-11-24 14:13:47.574+0000: 34956: debug :
virEventPollDispatchHandles:482 : i=0 w=1
2015-11-24 14:13:47.574+0000: 34956: debug :
virEventPollDispatchHandles:482 : i=2 w=3
2015-11-24 14:13:47.574+0000: 34956: debug :
virEventPollDispatchHandles:496 : EVENT_POLL_DISPATCH_HANDLE: watch=3
events=1
2015-11-24 14:13:47.574+0000: 34956: debug :
virNetClientIncomingEvent:1816 : client=0x7f36440019a0 wantclose=0
2015-11-24 14:13:47.574+0000: 34956: debug :
virNetClientIncomingEvent:1824 : Event fired 0x7f3644001760 1
2015-11-24 14:13:47.574+0000: 34956: debug :
virNetMessageDecodeLength:149 : Got length, now need 28 total (24
more)
2015-11-24 14:13:47.574+0000: 34956: debug :
virNetClientCallDispatch:1123 : RPC_CLIENT_MSG_RX:
client=0x7f36440019a0 len=28 prog=1801807216 vers=1 proc=1 type=2
status=0 serial=0
2015-11-24 14:13:47.574+0000: 34956: debug :
virKeepAliveCheckMessage:374 : ka=0x7f36440015f0,
client=0x7f36440019a0, msg=0x7f3644001a08
2015-11-24 14:13:47.574+0000: 34956: debug :
virKeepAliveCheckMessage:391 : RPC_KEEPALIVE_RECEIVED:
ka=0x7f36440015f0 client=0x7f36440019a0 prog=1801807216 vers=1 proc=1


and later we see "error: operation failed: migration job: unexpectedly failed"


015-11-24 14:13:47.766+0000: 34957: debug :
virNetClientCloseInternal:687 : client=0x7f36440019a0 wantclose=0
2015-11-24 14:13:47.766+0000: 34956: debug : virFileClose:90 : Closed fd 16
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f36440019a0
2015-11-24 14:13:47.767+0000: 34956: debug : virProcessAbort:104 :
aborting child process 34958
2015-11-24 14:13:47.767+0000: 34956: debug : virProcessAbort:109 :
process has ended: exit status 1
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventRunDefaultImpl:270 : running default event implementation
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventPollCleanupTimeouts:514 : Cleanup 0
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used,
releasing 0
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventPollCleanupHandles:562 : Cleanup 2
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventPollMakePollFDs:391 : Prepare n=0 w=1, f=7 e=1 d=0
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventPollMakePollFDs:391 : Prepare n=1 w=2, f=9 e=1 d=0
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers
2015-11-24 14:13:47.767+0000: 34956: debug :
virEventPollCalculateTimeout:361 : Timeout at 0 due in -1 ms
2015-11-24 14:13:47.767+0000: 34956: debug : virEventPollRunOnce:627 :
EVENT_POLL_RUN: nhandles=2 timeout=-1
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:258 :
OBJECT_DISPOSE: obj=0x7f36440019a0
2015-11-24 14:13:47.767+0000: 34957: debug : virNetClientDispose:599 :
RPC_CLIENT_DISPOSE: client=0x7f36440019a0
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644010c80
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644001da0
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644001510
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644001bc0
2015-11-24 14:13:47.767+0000: 34957: debug : virFileClose:90 : Closed fd 17
2015-11-24 14:13:47.767+0000: 34957: debug : virFileClose:90 : Closed fd 15
2015-11-24 14:13:47.767+0000: 34957: debug : virNetMessageClear:55 :
msg=0x7f3644001a08 nfds=0
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644001da0
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:258 :
OBJECT_DISPOSE: obj=0x7f3644001da0
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644001510
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:258 :
OBJECT_DISPOSE: obj=0x7f3644001510
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644001bc0
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:258 :
OBJECT_DISPOSE: obj=0x7f3644001bc0
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644010c80
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:258 :
OBJECT_DISPOSE: obj=0x7f3644010c80
2015-11-24 14:13:47.767+0000: 34957: debug : virDomainFree:2428 :
dom=0x7f3644000c30, (VM: name=dev-h-vrt-008-006,
uuid=04e20bfc-89e6-4098-bace-6a78c3e50919)
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x7f3644000c30
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:258 :
OBJECT_DISPOSE: obj=0x7f3644000c30
2015-11-24 14:13:47.767+0000: 34957: debug : virDomainDispose:262 :
release domain 0x7f3644000c30 dev-h-vrt-008-006
04e20bfc-89e6-4098-bace-6a78c3e50919
2015-11-24 14:13:47.767+0000: 34957: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x55d161e4d340
2015-11-24 14:13:47.767+0000: 34955: debug : virDomainFree:2428 :
dom=0x55d161e4d2d0, (VM: name=dev-h-vrt-008-006,
uuid=04e20bfc-89e6-4098-bace-6a78c3e50919)
2015-11-24 14:13:47.767+0000: 34955: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x55d161e4d2d0
2015-11-24 14:13:47.767+0000: 34955: debug : virObjectUnref:258 :
OBJECT_DISPOSE: obj=0x55d161e4d2d0
2015-11-24 14:13:47.767+0000: 34955: debug : virDomainDispose:262 :
release domain 0x55d161e4d2d0 dev-h-vrt-008-006
04e20bfc-89e6-4098-bace-6a78c3e50919
2015-11-24 14:13:47.767+0000: 34955: debug : virObjectUnref:256 :
OBJECT_UNREF: obj=0x55d161e4d340
2015-11-24 14:13:47.767+0000: 34955: debug : virFileClose:90 : Closed fd 12
2015-11-24 14:13:47.767+0000: 34955: debug : virFileClose:90 : Closed fd 13
error: operation failed: migration job: unexpectedly failed

Attachment: migrate_a_b_fail.txt
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]