From: Zhanghaoyu (A)
Subject: Re: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled
Date: Tue, 30 Jul 2013 09:04:56 +0000

>> >> hi all,
>> >> 
>> >> I met a problem similar to these, while performing live-migration or 
>> >> save-restore tests on the kvm platform (qemu:1.4.0, host:suse11sp2, 
>> >> guest:suse11sp2), running a tele-communication software suite in the 
>> >> guest:
>> >> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
>> >> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
>> >> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
>> >> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>> >> 
>> >> After live migration or virsh restore [savefile], one process's CPU 
>> >> utilization went up by about 30%, resulting in throughput degradation 
>> >> of this process.
>> >> 
>> >> If EPT is disabled, the problem is gone.
>> >> 
>> >> I suspect that the kvm hypervisor has something to do with this problem.
>> >> Based on this suspicion, I want to find two adjacent versions of 
>> >> kvm-kmod, one of which triggers this problem and one which does not 
>> >> (e.g. 2.6.39, 3.0-rc1), and then either analyze the differences between 
>> >> these two versions, or apply the patches between them by bisection, to 
>> >> finally find the key patches.
>> >> 
>> >> Any better ideas?
>> >> 
>> >> Thanks,
>> >> Zhang Haoyu
>> >
>> >I've attempted to duplicate this on a number of machines that are as 
>> >similar to yours as I am able to get my hands on, and so far have not been 
>> >able to see any performance degradation. And from what I've read in the 
>> >above links, huge pages do not seem to be part of the problem.
>> >
>> >So, if you are in a position to bisect the kernel changes, that would 
>> >probably be the best avenue to pursue in my opinion.
>> >
>> >Bruce
>> 
>> I found the first bad commit 
>> ([612819c3c6e67bac8fceaa7cc402f13b1b63f7e4] KVM: propagate fault r/w 
>> information to gup(), allow read-only memory) which triggers this problem, 
>> by git-bisecting the kvm kernel changes (cloned from 
>> https://git.kernel.org/pub/scm/virt/kvm/kvm.git).
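>> 
>> In outline, the bisect session ran roughly as follows (the good/bad 
>> endpoints are shown as placeholders here, and at each step the kvm modules 
>> have to be rebuilt and the save-restore test re-run):
>> 
>> git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git && cd kvm
>> git bisect start
>> git bisect bad  <oldest-tag-known-to-show-the-degradation>
>> git bisect good <newest-tag-known-to-be-free-of-it>
>> # after each rebuild and retest, tell git the result:
>> git bisect good        # or: git bisect bad
>> # repeat until git reports the first bad commit, then:
>> git bisect reset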
>> 
>> And,
>> git log 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 -n 1 -p > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log
>> git diff 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1..612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 > 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff
>> 
>> Then I diffed 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.log against 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4.diff and concluded that all of the 
>> differences between 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4~1 and 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 are contributed by none other than 
>> 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 itself, so this commit is the 
>> culprit which directly or indirectly causes the degradation.
>> 
>> Does the map_writable flag passed to mmu_set_spte() affect the PTE's PAT 
>> flag, or does it increase the number of VMEXITs induced by the guest trying 
>> to write read-only memory?
>> 
>> Thanks,
>> Zhang Haoyu
>> 
>
>There should be no read-only memory maps backing guest RAM.
>
>Can you confirm map_writable = false is being passed to __direct_map? (this 
>should not happen, for guest RAM).
>And if it is false, please capture the associated GFN.
>
I added the check and printk below at the start of __direct_map() at the first 
bad commit version:
--- kvm-612819c3c6e67bac8fceaa7cc402f13b1b63f7e4/arch/x86/kvm/mmu.c     2013-07-26 18:44:05.000000000 +0800
+++ kvm-612819/arch/x86/kvm/mmu.c       2013-07-31 00:05:48.000000000 +0800
@@ -2223,6 +2223,9 @@ static int __direct_map(struct kvm_vcpu
        int pt_write = 0;
        gfn_t pseudo_gfn;

+        if (!map_writable)
+                printk(KERN_ERR "%s: %s: gfn = %llu \n", __FILE__, __func__, gfn);
+
        for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
                if (iterator.level == level) {
                        unsigned pte_access = ACC_ALL;

I virsh-saved the VM and then virsh-restored it; so many GFNs were printed that 
you could fairly describe it as flooding.
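
To turn that flood into an answer to the GFN question, the printk output can be 
post-processed on the host. A minimal sketch, assuming the messages end up in 
dmesg in the format printed above (the exact dmesg formatting is 
environment-dependent):

# count how many distinct GFNs take the !map_writable path, and how often each one does
dmesg | grep '__direct_map: gfn =' | awk '{print $NF}' | sort -n | uniq -c | sort -rn | head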

>It's probably an issue with an older get_user_pages variant (either in kvm-kmod 
>or the older kernel). Is there any indication of a similar issue with the 
>upstream kernel?
I will test the upstream kvm host 
(https://git.kernel.org/pub/scm/virt/kvm/kvm.git) later; if the problem is 
still there, I will revert the first bad commit, 
612819c3c6e67bac8fceaa7cc402f13b1b63f7e4, on the upstream tree and then test it 
again.
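
If the problem does reproduce on upstream, the revert-and-retest step would 
look roughly like this (it is assumed here that the commit still reverts 
cleanly; otherwise the conflicts have to be resolved by hand before rebuilding):

git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git && cd kvm
git revert 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
# rebuild and install the kvm/kvm-intel modules (or the whole kernel),
# reload them, then repeat the virsh save/restore test and compare the
# per-process CPU utilization and throughput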

And I collected VMEXIT statistics for the pre-save and post-restore periods at 
the first bad commit version:
pre-save:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30

 Performance counter stats for 'sleep 30':

           1222318 kvm:kvm_entry
                 0 kvm:kvm_hypercall
                 0 kvm:kvm_hv_hypercall
            351755 kvm:kvm_pio
              6703 kvm:kvm_cpuid
            692502 kvm:kvm_apic
           1234173 kvm:kvm_exit
            223956 kvm:kvm_inj_virq
                 0 kvm:kvm_inj_exception
             16028 kvm:kvm_page_fault
             59872 kvm:kvm_msr
                 0 kvm:kvm_cr
            169596 kvm:kvm_pic_set_irq
             81455 kvm:kvm_apic_ipi
            245103 kvm:kvm_apic_accept_irq
                 0 kvm:kvm_nested_vmrun
                 0 kvm:kvm_nested_intercepts
                 0 kvm:kvm_nested_vmexit
                 0 kvm:kvm_nested_vmexit_inject
                 0 kvm:kvm_nested_intr_vmexit
                 0 kvm:kvm_invlpga
                 0 kvm:kvm_skinit
            853020 kvm:kvm_emulate_insn
            171140 kvm:kvm_set_irq
            171534 kvm:kvm_ioapic_set_irq
                 0 kvm:kvm_msi_set_irq
             99276 kvm:kvm_ack_irq
            971166 kvm:kvm_mmio
             33722 kvm:kvm_fpu
                 0 kvm:kvm_age_page
                 0 kvm:kvm_try_async_get_page
                 0 kvm:kvm_async_pf_not_present
                 0 kvm:kvm_async_pf_ready
                 0 kvm:kvm_async_pf_completed
                 0 kvm:kvm_async_pf_doublefault

      30.019069018 seconds time elapsed

post-restore:
COTS-F10S03:~ # perf stat -e "kvm:*" -a sleep 30

 Performance counter stats for 'sleep 30':

           1327880 kvm:kvm_entry
                 0 kvm:kvm_hypercall
                 0 kvm:kvm_hv_hypercall
            375189 kvm:kvm_pio
              6925 kvm:kvm_cpuid
            804414 kvm:kvm_apic
           1339352 kvm:kvm_exit
            245922 kvm:kvm_inj_virq
                 0 kvm:kvm_inj_exception
             15856 kvm:kvm_page_fault
             39500 kvm:kvm_msr
                 1 kvm:kvm_cr
            179150 kvm:kvm_pic_set_irq
             98436 kvm:kvm_apic_ipi
            247430 kvm:kvm_apic_accept_irq
                 0 kvm:kvm_nested_vmrun
                 0 kvm:kvm_nested_intercepts
                 0 kvm:kvm_nested_vmexit
                 0 kvm:kvm_nested_vmexit_inject
                 0 kvm:kvm_nested_intr_vmexit
                 0 kvm:kvm_invlpga
                 0 kvm:kvm_skinit
            955410 kvm:kvm_emulate_insn
            182240 kvm:kvm_set_irq
            182562 kvm:kvm_ioapic_set_irq
                 0 kvm:kvm_msi_set_irq
            105267 kvm:kvm_ack_irq
           1113999 kvm:kvm_mmio
             37789 kvm:kvm_fpu
                 0 kvm:kvm_age_page
                 0 kvm:kvm_try_async_get_page
                 0 kvm:kvm_async_pf_not_present
                 0 kvm:kvm_async_pf_ready
                 0 kvm:kvm_async_pf_completed
                 0 kvm:kvm_async_pf_doublefault

      30.000779718 seconds time elapsed
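
For easier comparison of the two runs, the event counts can be put side by 
side; a rough sketch with bash, assuming each perf output was saved to a file 
(pre.txt and post.txt are hypothetical names):

# print each kvm event with its pre-save and post-restore counts and the relative change
paste <(grep 'kvm:' pre.txt) <(grep 'kvm:' post.txt) | \
    awk '{printf "%-30s %12d %12d %+8.1f%%\n", $2, $1, $3, ($1 ? ($3-$1)*100.0/$1 : 0)}'

In the numbers above, kvm_mmio (971166 -> 1113999), kvm_apic (692502 -> 804414) 
and kvm_emulate_insn (853020 -> 955410) show the largest absolute increases, 
while kvm_page_fault is essentially unchanged.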

Thanks,
Zhang Haoyu


