Re: [Qemu-devel] memory: memory_region_transaction_commit() slow


From: Avi Kivity
Subject: Re: [Qemu-devel] memory: memory_region_transaction_commit() slow
Date: Sun, 29 Jun 2014 09:56:19 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0


On 06/26/2014 05:31 PM, Etienne Martineau wrote:
On 14-06-26 04:18 AM, Avi Kivity wrote:
On 06/25/2014 08:53 PM, Etienne Martineau wrote:
Hi,

It seems to me that there is a scaling issue, O(n), in
memory_region_transaction_commit().
It's actually O(n^3). FlatView is kept sorted but is just a vector, so if you
insert n regions, you have n^2 operations. In addition, every PCI device has an
address space, so we get n^3 (technically the third n is different from the
first two, but they are related).

The first problem can be solved by implementing FlatView with a std::set<> or
equivalent; the second by memoization: most PCI address spaces are equal (they only
differ based on whether bus mastering is enabled or not), so a clever cache can
reduce the effort needed to generate them.
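For illustration, a minimal sketch of the memoization idea (all names such as
FlatViewStub, generate_view and get_pci_view are hypothetical, not the actual
QEMU API): build the expensive view at most once per bus-mastering state and
share it across every PCI device address space that matches.

/* Hypothetical sketch of the memoization idea, not the real QEMU code:
 * since most PCI address spaces only differ by the bus-mastering bit,
 * build the expensive view at most once per state and then share it. */
#include <stdbool.h>
#include <stdlib.h>

typedef struct FlatViewStub {
    int refcount;
    /* ...sorted list of ranges would live here... */
} FlatViewStub;

static FlatViewStub *cached_view[2];    /* indexed by bus_master_enabled */

/* stand-in for the expensive O(n^2) view construction */
static FlatViewStub *generate_view(bool bus_master_enabled)
{
    (void)bus_master_enabled;
    return calloc(1, sizeof(FlatViewStub));
}

static FlatViewStub *get_pci_view(bool bus_master_enabled)
{
    FlatViewStub **slot = &cached_view[bus_master_enabled];

    if (!*slot) {
        *slot = generate_view(bus_master_enabled);  /* pay the cost once */
    }
    (*slot)->refcount++;                            /* then just share it */
    return *slot;
}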

However, I'm not at all sure that the problem is CPU time in QEMU. It could be
due to rcu_synchronize delays when the new memory maps are fed to KVM and vfio.
I recommend trying to isolate exactly where the time is spent.

It seems like the linear increase in CPU time comes from QEMU (at least from
my measurements below).

In those code paths QEMU calls back into KVM (KVM_SET_MEMORY_REGION) and vfio, so it would be good to understand exactly where the time is spent. I doubt it's computation (which is O(n^3), but very fast); it's more likely waiting for something.


In QEMU's kvm_cpu_exec() I've added a hook that measures the time spent
outside 'kvm_vcpu_ioctl(cpu, KVM_RUN, 0)'.
 From the logs below this is "QEMU long exit vCPU n x(msec) exit_reason".
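
For reference, roughly what such a hook could look like (a hedged sketch, not
the actual patch; the 10 msec threshold and the report_long_exit helper are
made up for illustration):

/* Rough sketch of the user-space "long exit" hook (not the actual patch):
 * timestamp when KVM_RUN returns to QEMU, and when the vCPU re-enters
 * KVM_RUN, report how long it stayed outside if above a threshold. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define LONG_EXIT_THRESHOLD_MS 10          /* arbitrary, for illustration */

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* t_exit is the timestamp taken right after the previous KVM_RUN returned */
static void report_long_exit(int cpu_index, int exit_reason, int64_t t_exit)
{
    int64_t outside_ms = now_ms() - t_exit;   /* time spent in QEMU user space */

    if (outside_ms > LONG_EXIT_THRESHOLD_MS) {
        fprintf(stderr, "QEMU long exit vCPU %d %" PRId64 " %d\n",
                cpu_index, outside_ms, exit_reason);
    }
}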

Similarly, in KVM's vcpu_enter_guest() I've added a new ftrace event that measures
the time spent outside 'kvm_x86_ops->run(vcpu)'.
 From the logs below this is "kvm_long_exit: x(msec)". Please note that this is
a trimmed-down view of the real ftrace output.
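
The kernel side could look roughly like the sketch below (indicative only, not
the actual patch: ktime and trace_printk() are standard kernel facilities, but
the helper names and threshold are invented, and a real version would keep the
timestamp per vCPU in struct kvm_vcpu rather than per CPU):

/* Kernel-side sketch (indicative only): remember when kvm_x86_ops->run(vcpu)
 * last returned, and on the next guest entry report via ftrace how long the
 * vCPU stayed outside guest mode. */
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/percpu.h>

#define KVM_LONG_EXIT_THRESHOLD_MS 10      /* arbitrary, for illustration */

static DEFINE_PER_CPU(ktime_t, last_guest_exit);

static void note_guest_exit(void)          /* call right after ->run() returns */
{
    __this_cpu_write(last_guest_exit, ktime_get());
}

static void note_guest_entry(void)         /* call right before the next ->run() */
{
    s64 outside_ms = ktime_to_ms(ktime_sub(ktime_get(),
                                           __this_cpu_read(last_guest_exit)));

    if (outside_ms > KVM_LONG_EXIT_THRESHOLD_MS)
        trace_printk("kvm_long_exit: %lld\n", outside_ms);
}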

Also, please note that the above hacks are useful (at least to me, since I
haven't figured out a better way to do the same with existing ftrace) for
measuring the RTT at both the QEMU and KVM levels.

The time spent outside KVM's 'kvm_x86_ops->run(vcpu)' will always be greater than
the time spent outside QEMU's 'kvm_vcpu_ioctl(cpu, KVM_RUN, 0)' for a given vCPU. The
difference between the time spent outside KVM and the time spent outside
QEMU (for a given vCPU) therefore tells us who is burning cycles (QEMU or KVM) and by
how much (in msec).
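To illustrate with made-up numbers: if KVM reports 25 msec outside run() while
QEMU reports 23 msec outside KVM_RUN for the same exit, roughly 2 msec went to
the kernel side of the exit and about 23 msec to QEMU user space; when the two
numbers are essentially equal, as in the logs below, almost all of the time is
in QEMU.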

In the experiment below I've put the QEMU and KVM RTT times side by side. We
can see that the time to assign a device (same BAR size for all devices) increases
linearly (as previously reported). Also, from the RTT measurements, both QEMU
and KVM are mostly within the same range, suggesting that the increase comes
from QEMU and not KVM.

The one exception is that for every device assignment there is a KVM operation that
seems to take ~100 msec each time. Since this is O(1), I'm not too concerned.


device assign #1:
    device_add pci-assign,host=28:10.2,bus=pciehp.3.8
kvm_long_exit: 100
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 20 2    kvm_long_exit: 20
    QEMU long exit vCPU 0 20 2    kvm_long_exit: 20
    QEMU long exit vCPU 0 20 2    kvm_long_exit: 20
    QEMU long exit vCPU 0 19 2    kvm_long_exit: 19
    QEMU long exit vCPU 0 19 2    kvm_long_exit: 19
    QEMU long exit vCPU 0 19 2    kvm_long_exit: 20
    QEMU long exit vCPU 0 19 2    kvm_long_exit: 19
    QEMU long exit vCPU 0 19 2    kvm_long_exit: 19
    QEMU long exit vCPU 0 19 2    kvm_long_exit: 19
    QEMU long exit vCPU 0 19 2    kvm_long_exit: 20
    QEMU long exit vCPU 0 42 2    kvm_long_exit: 42
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21

device assign #2:
    device_add pci-assign,host=28:10.3,bus=pciehp.3.9
kvm_long_exit: 101
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 21 2    kvm_long_exit: 21
    QEMU long exit vCPU 0 45 2    kvm_long_exit: 45
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23

device assign #3:
    device_add pci-assign,host=28:10.4,bus=pciehp.3.10
kvm_long_exit: 100
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
    QEMU long exit vCPU 0 48 2    kvm_long_exit: 48
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25

device assign #4:
    device_add pci-assign,host=28:10.5,bus=pciehp.3.11
kvm_long_exit: 100
    QEMU long exit vCPU 0 27 2    kvm_long_exit: 27
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 24 2    kvm_long_exit: 24
    QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 24 2    kvm_long_exit: 24
    QEMU long exit vCPU 0 24 2    kvm_long_exit: 25
    QEMU long exit vCPU 0 52 2    kvm_long_exit: 52
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26

device assign #5:
    device_add pci-assign,host=28:10.6,bus=pciehp.3.12
kvm_long_exit: 100
    QEMU long exit vCPU 0 28 2    kvm_long_exit: 28
    QEMU long exit vCPU 0 27 2    kvm_long_exit: 27
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 27 2    kvm_long_exit: 27
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
    QEMU long exit vCPU 0 55 2    kvm_long_exit: 56
    QEMU long exit vCPU 0 28 2    kvm_long_exit: 28

thanks,
Etienne




