
[Qemu-devel] Unresponsive linux guest once migrated


From: Chris Dunlop
Subject: [Qemu-devel] Unresponsive linux guest once migrated
Date: Fri, 28 Mar 2014 09:52:42 +1100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

I have a problem where I migrate a linux guest VM, and on the
receiving side the guest goes to 100% cpu as seen by the host while
the guest itself becomes unresponsive (e.g. it no longer responds to
ping). The only way out I've found is to destroy the guest.

This seems to only happen if the guest has been idle for an extended
period (e.g. overnight). I've migrated the guest 100 times in a row
without any problems when the guest has recently been used "a little"
(e.g. logging in and looking around; it's not normally doing
anything).
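
For reference, the repeated migrations were driven by a simple loop
along these lines (libvirt domain 'test' as in the command line below;
the source/destination URIs are just placeholders):

# bounce the domain between the two hosts, 100 round trips
for i in $(seq 1 100); do
    virsh migrate --live test qemu+ssh://desthost/system
    virsh -c qemu+ssh://desthost/system migrate --live test qemu+ssh://srchost/system
done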

I've not had similar problems migrating Windows guests.

guest - debian wheezy, kernel 3.2.0-4-amd64
host - debian wheezy, kernel 3.10.33 x86_64 (self-compiled)
qemu - qemu_1.7.0+dfsg-2~bpo70+2 + rbd (self-compiled)

All guests use ceph rbd for backing store.

qemu-system-x86_64 -enable-kvm -name test -S \
  -machine pc-1.0,accel=kvm,usb=off -m 1024 -realtime mlock=off \
  -smp 1,sockets=1,cores=1,threads=1 \
  -uuid 620dd8e0-f24c-485d-a134-ba5961ce6531 -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
  -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive file=rbd:pool/test:id=test:key=xxxxxxxxxxx=:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw \
  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
  -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:29:10:16,bus=pci.0,addr=0x3 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -device usb-tablet,id=input0 -vnc 127.0.0.1:0 \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -incoming tcp:[::]:49152 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

ps tells me the qemu-system-x86_64 process has 17 threads, and it's
the 2nd-last of these that's consuming the cpu. Strace on that
thread doesn't tell me much:

rt_sigtimedwait([BUS USR1], 0x7f5761957b30, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigpending([])                       = 0
ioctl(16, KVM_RUN <unfinished ...>
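
In case it's useful, this is roughly how I picked out that thread and
matched it against the guest's vCPU (domain 'test' as above; substitute
the actual qemu pid):

# per-thread cpu usage for the qemu process
ps -L -o lwp,pcpu,comm -p <qemu pid>
# vcpu -> host thread_id mapping from the monitor
virsh qemu-monitor-command --hmp test 'info cpus'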

Using 'echo l > /proc/sysrq-trigger' a few times shows me the CPU
running that thread is always at vmx_vcpu_run+0x5eb, e.g.:

[571745.343753] NMI backtrace for cpu 2
[571745.343779] CPU: 2 PID: 31618 Comm: qemu-system-x86 Tainted: G           O 3.10.33-otn-00017-g510ea14 #2
[571745.343827] Hardware name: Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.0c 07/19/11
[571745.343871] task: ffff880002f99380 ti: ffff8801acaf0000 task.ti: ffff8801acaf0000
[571745.343915] RIP: 0010:[<ffffffffa104130b>]  [<ffffffffa104130b>] vmx_vcpu_run+0x5eb/0x670 [kvm_intel]
[571745.343978] RSP: 0018:ffff8801acaf3cc8  EFLAGS: 00000082
[571745.344004] RAX: 0000000080000202 RBX: 0000000001443980 RCX: ffff8801fd698000
[571745.344046] RDX: 0000000000000200 RSI: 00000000693e2680 RDI: ffff8801fd698000
[571745.344089] RBP: ffff8801acaf3d38 R08: 00000000693e9b40 R09: 0000000000000000
[571745.344131] R10: 0000000000000f08 R11: 0000000000000000 R12: 0000000000000000
[571745.344174] R13: 0000000000000001 R14: 0000000000000014 R15: ffffffffffffffff
[571745.344217] FS:  00007f5609fec700(0000) GS:ffff88081fc80000(0000) knlGS:fffff801388f8000
[571745.344261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[571745.344288] CR2: 0000000001449b8a CR3: 00000006eaa6c000 CR4: 00000000000027e0
[571745.344330] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[571745.344373] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[571745.344415] Stack:
[571745.344435]  ffff8801acaf3d38 ffffffffa1042576 0000000000000000 ffff8801fd698000
[571745.344487]  0000000200000000 ffff8801fd698000 ffff8801acaf3d18 ffff8805cbfbc040
[571745.344539]  0000000000000002 ffff8805cbfbc040 0000000000000001 0000000000000000
[571745.344590] Call Trace:
[571745.344615]  [<ffffffffa1042576>] ? vmx_handle_exit+0xf6/0x8d0 [kvm_intel]
[571745.344661]  [<ffffffffa0459341>] kvm_arch_vcpu_ioctl_run+0x9a1/0x1100 [kvm]
[571745.344699]  [<ffffffffa04543d7>] ? kvm_arch_vcpu_load+0x57/0x1e0 [kvm]
[571745.344734]  [<ffffffffa0444d24>] kvm_vcpu_ioctl+0x2b4/0x580 [kvm]
[571745.344767]  [<ffffffffa04468ef>] ? kvm_vm_ioctl+0x57f/0x5f0 [kvm]
[571745.344797]  [<ffffffff81147090>] do_vfs_ioctl+0x90/0x520
[571745.344825]  [<ffffffff8106fd98>] ? __enqueue_entity+0x78/0x80
[571745.344853]  [<ffffffff81083b38>] ? SyS_futex+0x98/0x1a0
[571745.344887]  [<ffffffffa044e1b4>] ? kvm_on_user_return+0x64/0x70 [kvm]
[571745.344916]  [<ffffffff81147570>] SyS_ioctl+0x50/0x90
[571745.344944]  [<ffffffff813bf782>] system_call_fastpath+0x16/0x1b
[571745.344971] Code: 82 1c 02 00 00 a8 10 0f 84 8b fa ff ff e9 66 ff ff ff 66 0f 1f 44 00 00 85 c0 0f 89 51 fd ff ff 48 8b 7d a8 e8 87 9f 40 ff cd 02 <48> 8b 7d a8 e8 9c 9f 40 ff e9 38 fd ff ff 48 89 f9 48 c1 e9 0d


What can I do to help track this down?
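
If it would help, next time it happens I can also capture KVM
tracepoint data on the receiving host while the vcpu is spinning,
something like this (assuming debugfs is mounted at /sys/kernel/debug;
<tid> is the spinning thread id):

# enable all kvm tracepoints (kvm_entry, kvm_exit, ...)
echo 1 > /sys/kernel/debug/tracing/events/kvm/enable
# watch events from the spinning vcpu thread only
grep <tid> /sys/kernel/debug/tracing/trace_pipe
# disable them again afterwards
echo 0 > /sys/kernel/debug/tracing/events/kvm/enable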

Cheers,

Chris


