Re: [Qemu-devel] memory: memory_region_transaction_commit() slow


From: Etienne Martineau
Subject: Re: [Qemu-devel] memory: memory_region_transaction_commit() slow
Date: Thu, 26 Jun 2014 10:31:59 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130330 Thunderbird/17.0.5

On 14-06-26 04:18 AM, Avi Kivity wrote:
> 
> On 06/25/2014 08:53 PM, Etienne Martineau wrote:
>> Hi,
>>
>> It seems to me that there is a scale issue O(n) in 
>> memory_region_transaction_commit().
> 
> It's actually O(n^3).  Flatview is kept sorted but is just a vector, so if 
> you insert n regions, you have n^2 operations. In addition every PCI device 
> has an address space, so we get n^3 (technically the third n is different 
> from the first two, but they are related).
> 
> The first problem can be solved by implementing Flatview with an std::set<> 
> or equivalent, the second by memoization - most pci address spaces are equal 
> (they only differ based on whether bus mastering is enabled or not), so a 
> clever cache can reduce the effort to generate them.
> 
> However I'm not at all sure that the problem is cpu time in qemu. It could be 
> due to rcu_synchronize delays when the new memory maps are fed to kvm and 
> vfio.  I recommend trying to isolate exactly where the time is spent.
> 

It seems like the linear increase in CPU time comes from QEMU (at least from 
my measurements below).

In QEMU kvm_cpu_exec() I've added a hook that measures the time spent 
outside 'kvm_vcpu_ioctl(cpu, KVM_RUN, 0)'.
From the logs below this is 'QEMU long exit vCPU n x(msec) exit_reason'.

Similarly, in KVM vcpu_enter_guest() I've added a new ftrace event that measures 
the time spent outside 'kvm_x86_ops->run(vcpu)'.
From the logs below this is 'kvm_long_exit: x(msec)'. Please note that this is 
a trimmed-down view of the real ftrace output.

Also please note that the above hacks are useful (at least to me, since I 
haven't figured out a better way to do the same with existing ftrace) to 
measure the RTT at both the QEMU and KVM level.

The time spent outside KVM's 'kvm_x86_ops->run(vcpu)' will always be greater than 
the time spent outside QEMU's 'kvm_vcpu_ioctl(cpu, KVM_RUN, 0)' for a given vCPU. 
The difference between the time spent outside KVM and the time spent outside 
QEMU (for a given vCPU) tells us who is burning cycles (QEMU or KVM) and by how 
much (in msec).

In the experiment below I've put the QEMU and KVM RTT times side by side. We 
can see that the time to assign a device (same BAR size for all devices) 
increases linearly (as previously reported). Also, from the RTT measurements, 
QEMU and KVM are mostly within the same range, suggesting that the increase 
comes from QEMU and not KVM.

The one exception is that for every device assign there is a KVM operation that 
seems to take ~100 msec each time. Since this is O(1), I'm not too concerned.


device assign #1:
   device_add pci-assign,host=28:10.2,bus=pciehp.3.8
                                       
                                 kvm_long_exit: 100 
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 26 
   QEMU long exit vCPU 0 20 2    kvm_long_exit: 20 
   QEMU long exit vCPU 0 20 2    kvm_long_exit: 20 
   QEMU long exit vCPU 0 20 2    kvm_long_exit: 20 
   QEMU long exit vCPU 0 19 2    kvm_long_exit: 19 
   QEMU long exit vCPU 0 19 2    kvm_long_exit: 19 
   QEMU long exit vCPU 0 19 2    kvm_long_exit: 20 
   QEMU long exit vCPU 0 19 2    kvm_long_exit: 19 
   QEMU long exit vCPU 0 19 2    kvm_long_exit: 19 
   QEMU long exit vCPU 0 19 2    kvm_long_exit: 19 
   QEMU long exit vCPU 0 19 2    kvm_long_exit: 20 
   QEMU long exit vCPU 0 42 2    kvm_long_exit: 42 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 

device assign #2:
   device_add pci-assign,host=28:10.3,bus=pciehp.3.9
   
                                 kvm_long_exit: 101     
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 21 2    kvm_long_exit: 21 
   QEMU long exit vCPU 0 45 2    kvm_long_exit: 45 
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23 

device assign #3:
   device_add pci-assign,host=28:10.4,bus=pciehp.3.10
   
                                 kvm_long_exit: 100 
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 23 2    kvm_long_exit: 23
   QEMU long exit vCPU 0 48 2    kvm_long_exit: 48
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25

device assign #4:
   device_add pci-assign,host=28:10.5,bus=pciehp.3.11
   
                                 kvm_long_exit: 100                
   QEMU long exit vCPU 0 27 2    kvm_long_exit: 27
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 24 2    kvm_long_exit: 24
   QEMU long exit vCPU 0 25 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 24 2    kvm_long_exit: 24
   QEMU long exit vCPU 0 24 2    kvm_long_exit: 25
   QEMU long exit vCPU 0 52 2    kvm_long_exit: 52
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26

device assign #5:
   device_add pci-assign,host=28:10.6,bus=pciehp.3.12
   
                                 kvm_long_exit: 100                
   QEMU long exit vCPU 0 28 2    kvm_long_exit: 28
   QEMU long exit vCPU 0 27 2    kvm_long_exit: 27
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 27 2    kvm_long_exit: 27
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 26 2    kvm_long_exit: 26
   QEMU long exit vCPU 0 55 2    kvm_long_exit: 56
   QEMU long exit vCPU 0 28 2    kvm_long_exit: 28

thanks,
Etienne



