[Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

From: Zhanghaoyu (A)
Subject: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Date: Thu, 11 Jul 2013 09:36:47 +0000
hi all,
I hit a problem similar to the ones reported below while performing live migration
and save-restore tests on a KVM platform (qemu: 1.4.0, host: SUSE 11 SP2,
guest: SUSE 11 SP2), running a telecommunication software suite in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
https://bugzilla.kernel.org/show_bug.cgi?id=58771
After live migration or virsh restore [savefile], one process's CPU utilization
went up by about 30%, which degraded that process's throughput.
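For reference, the migration and save-restore operations referred to above are the
usual libvirt ones, roughly as sketched below; the domain name, destination URI and
save-file path are placeholders, not values from my setup:

    # live migration to another host
    virsh migrate --live <domain> qemu+ssh://<dest-host>/system
    # save-restore on the same host
    virsh save <domain> /path/to/savefile
    virsh restore /path/to/savefile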
oprofile report on this process in the guest,
pre live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % app name symbol name
248 12.3016 no-vmlinux (no symbols)
78 3.8690 libc.so.6 memset
68 3.3730 libc.so.6 memcpy
30 1.4881 cscf.scu SipMmBufMemAlloc
29 1.4385 libpthread.so.0 pthread_mutex_lock
26 1.2897 cscf.scu SipApiGetNextIe
25 1.2401 cscf.scu DBFI_DATA_Search
20 0.9921 libpthread.so.0 __pthread_mutex_unlock_usercnt
16 0.7937 cscf.scu DLM_FreeSlice
16 0.7937 cscf.scu receivemessage
15 0.7440 cscf.scu SipSmCopyString
14 0.6944 cscf.scu DLM_AllocSlice
post live migration:
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % app name symbol name
1586 42.2370 libc.so.6 memcpy
271 7.2170 no-vmlinux (no symbols)
83 2.2104 libc.so.6 memset
41 1.0919 libpthread.so.0 __pthread_mutex_unlock_usercnt
35 0.9321 cscf.scu SipMmBufMemAlloc
29 0.7723 cscf.scu DLM_AllocSlice
28 0.7457 libpthread.so.0 pthread_mutex_lock
23 0.6125 cscf.scu SipApiGetNextIe
17 0.4527 cscf.scu SipSmCopyString
16 0.4261 cscf.scu receivemessage
15 0.3995 cscf.scu SipcMsgStatHandle
14 0.3728 cscf.scu Urilex
12 0.3196 cscf.scu DBFI_DATA_Search
12 0.3196 cscf.scu SipDsmGetHdrBitValInner
12 0.3196 cscf.scu SipSmGetDataFromRefString
So memcpy consumes many more CPU cycles after live migration. After I restarted the
process, the problem disappeared. save-restore shows the same problem.
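For reference, the guest profile above is timer-based oprofile output; a rough sketch
of how such a per-symbol report can be collected with the legacy opcontrol tooling
(the exact commands are my assumption, and the binary path is a placeholder):

    opcontrol --init
    opcontrol --no-vmlinux --start
    # run the workload for a while, then dump and stop the profiler
    opcontrol --dump
    opcontrol --stop
    # per-symbol breakdown for the process under test
    opreport -l /path/to/cscf.scu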
perf stat output for the vcpu thread on the host,
pre live migration:
Performance counter stats for thread id '21082':
             0   page-faults
             0   minor-faults
             0   major-faults
         31616   cs
           506   migrations
             0   alignment-faults
             0   emulation-faults
    5075957539   L1-dcache-loads                                            [21.32%]
     324685106   L1-dcache-load-misses      #  6.40% of all L1-dcache hits  [21.85%]
    3681777120   L1-dcache-stores                                           [21.65%]
      65251823   L1-dcache-store-misses     #  1.77%                        [22.78%]
             0   L1-dcache-prefetches                                       [22.84%]
             0   L1-dcache-prefetch-misses                                  [22.32%]
    9321652613   L1-icache-loads                                            [22.60%]
    1353418869   L1-icache-load-misses      # 14.52% of all L1-icache hits  [21.92%]
     169126969   LLC-loads                                                  [21.87%]
      12583605   LLC-load-misses            #  7.44% of all LL-cache hits   [ 5.84%]
     132853447   LLC-stores                                                 [ 6.61%]
      10601171   LLC-store-misses           #  7.9%                         [ 5.01%]
      25309497   LLC-prefetches             # 30%                           [ 4.96%]
       7723198   LLC-prefetch-misses                                        [ 6.04%]
    4954075817   dTLB-loads                                                 [11.56%]
      26753106   dTLB-load-misses           #  0.54% of all dTLB cache hits [16.80%]
    3553702874   dTLB-stores                                                [22.37%]
       4720313   dTLB-store-misses          #  0.13%                        [21.46%]
 <not counted>   dTLB-prefetches
 <not counted>   dTLB-prefetch-misses

  60.000920666 seconds time elapsed
post live migration:
Performance counter stats for thread id '1579':
             0   page-faults                                                [100.00%]
             0   minor-faults                                               [100.00%]
             0   major-faults                                               [100.00%]
         34979   cs                                                         [100.00%]
           441   migrations                                                 [100.00%]
             0   alignment-faults                                           [100.00%]
             0   emulation-faults
    6903585501   L1-dcache-loads                                            [22.06%]
     525939560   L1-dcache-load-misses      #  7.62% of all L1-dcache hits  [21.97%]
    5042552685   L1-dcache-stores                                           [22.20%]
      94493742   L1-dcache-store-misses     #  1.8%                         [22.06%]
             0   L1-dcache-prefetches                                       [22.39%]
             0   L1-dcache-prefetch-misses                                  [22.47%]
   13022953030   L1-icache-loads                                            [22.25%]
    1957161101   L1-icache-load-misses      # 15.03% of all L1-icache hits  [22.47%]
     348479792   LLC-loads                                                  [22.27%]
      80662778   LLC-load-misses            # 23.15% of all LL-cache hits   [ 5.64%]
     198745620   LLC-stores                                                 [ 5.63%]
      14236497   LLC-store-misses           #  7.1%                         [ 5.41%]
      20757435   LLC-prefetches                                             [ 5.42%]
       5361819   LLC-prefetch-misses        # 25%                           [ 5.69%]
    7235715124   dTLB-loads                                                 [11.26%]
      49895163   dTLB-load-misses           #  0.69% of all dTLB cache hits [16.96%]
    5168276218   dTLB-stores                                                [22.44%]
       6765983   dTLB-store-misses          #  0.13%                        [22.24%]
 <not counted>   dTLB-prefetches
 <not counted>   dTLB-prefetch-misses
The "LLC-load-misses" went up by about 16%. Then, I restarted the process in
guest, the perf data back to normal,
Performance counter stats for thread id '1579':
             0   page-faults                                                [100.00%]
             0   minor-faults                                               [100.00%]
             0   major-faults                                               [100.00%]
         30594   cs                                                         [100.00%]
           327   migrations                                                 [100.00%]
             0   alignment-faults                                           [100.00%]
             0   emulation-faults
    7707091948   L1-dcache-loads                                            [22.10%]
     559829176   L1-dcache-load-misses      #  7.26% of all L1-dcache hits  [22.28%]
    5976654983   L1-dcache-stores                                           [23.22%]
     160436114   L1-dcache-store-misses                                     [22.80%]
             0   L1-dcache-prefetches                                       [22.51%]
             0   L1-dcache-prefetch-misses                                  [22.53%]
   13798415672   L1-icache-loads                                            [22.28%]
    2017724676   L1-icache-load-misses      # 14.62% of all L1-icache hits  [22.49%]
     254598008   LLC-loads                                                  [22.86%]
      16035378   LLC-load-misses            #  6.30% of all LL-cache hits   [ 5.36%]
     307019606   LLC-stores                                                 [ 5.60%]
      13665033   LLC-store-misses                                           [ 5.43%]
      17715554   LLC-prefetches                                             [ 5.57%]
       4187006   LLC-prefetch-misses                                        [ 5.44%]
    7811502895   dTLB-loads                                                 [10.72%]
      40547330   dTLB-load-misses           #  0.52% of all dTLB cache hits [16.31%]
    6144202516   dTLB-stores                                                [21.58%]
       6313363   dTLB-store-misses                                          [21.91%]
 <not counted>   dTLB-prefetches
 <not counted>   dTLB-prefetch-misses

  60.000812523 seconds time elapsed
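For completeness, per-thread counters like the above can be collected on the host by
pinning perf stat to the vcpu thread id; a sketch, assuming the TID and the 60-second
window from the runs above (the event list is abbreviated):

    perf stat -t 21082 \
        -e cs,migrations,page-faults,minor-faults,major-faults \
        -e L1-dcache-loads,L1-dcache-load-misses,L1-icache-loads,L1-icache-load-misses \
        -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses \
        -e dTLB-loads,dTLB-load-misses,dTLB-stores,dTLB-store-misses \
        sleep 60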
With EPT disabled, this problem is gone.
I suspect that the KVM hypervisor is involved in this problem.
Based on that suspicion, I want to find two adjacent kvm-kmod versions, one that
triggers this problem and one that does not (e.g. 2.6.39 and 3.0-rc1), and then
either analyze the differences between those two versions or bisect the patches
between them to find the key patches.
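A rough sketch of that bisection, assuming the problem can be reproduced with kvm
modules built from the mainline kernel tree between those two releases (the tag names
below are the standard mainline ones, not kvm-kmod tags):

    # limit the bisection to the kvm code between the suspected good/bad releases
    git bisect start v3.0-rc1 v2.6.39 -- arch/x86/kvm virt/kvm
    # at each step: rebuild and reload the kvm modules on the host,
    # rerun the live-migration test, then report the result
    git bisect good      # or: git bisect bad
    # when the first bad commit is found
    git bisect reset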
Any better ideas?
Thanks,
Zhang Haoyu