[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] Qemu and 32 PCIe devices
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] Qemu and 32 PCIe devices |
Date: |
Wed, 9 Aug 2017 12:16:32 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 |
On 09/08/2017 12:00, Laszlo Ersek wrote:
> On 08/09/17 09:26, Paolo Bonzini wrote:
>> On 09/08/2017 03:06, Laszlo Ersek wrote:
>>>> 20.14% qemu-system-x86_64 [.] render_memory_region
>>>> 17.14% qemu-system-x86_64 [.] subpage_register
>>>> 10.31% qemu-system-x86_64 [.] int128_add
>>>> 7.86% qemu-system-x86_64 [.] addrrange_end
>>>> 7.30% qemu-system-x86_64 [.] int128_ge
>>>> 4.89% qemu-system-x86_64 [.] int128_nz
>>>> 3.94% qemu-system-x86_64 [.] phys_page_compact
>>>> 2.73% qemu-system-x86_64 [.] phys_map_node_alloc
>>
>> Yes, this is the O(n^3) thing. An optimized build should be faster
>> because int128 operations will be inlined and become much more efficient.
>>
>>> With this patch, I only tested the "93 devices" case, as the slowdown
>>> became visible to the naked eye from the trace messages, as the firmware
>>> enabled more and more BARs / command registers (and inversely, the
>>> speedup was perceivable when the firmware disabled more and more BARs /
>>> command registers).
>>
>> This is an interesting observation, and it's expected. Looking at the
>> O(n^3) complexity more in detail you have N operations, where the "i"th
>> operates on "i" DMA address spaces, all of which have at least "i"
>> memory regions (at least 1 BAR per device).
>
> - Can you please give me a pointer to the code where the "i"th operation
> works on "i" DMA address spaces? (Not that I dream about patching *that*
> code, wherever it may live :) )
It's all driven by actions of the guest.
Simply, by the time you get to the "i"th command register, you have
enabled bus-master DMA on "i" devices (so that "i" DMA address spaces
are non-empty) and you have enabled BARs on "i" devices (so that their
BARs are included in the address spaces).
> - You mentioned that changing this is on the ToDo list. I couldn't find
> it under <https://wiki.qemu.org/index.php/ToDo>. Is it tracked somewhere
> else?
I've added it to https://wiki.qemu.org/index.php/ToDo/MemoryAPI (thanks
for the nudge).
Paolo
> (I'm not trying to urge any changes in the area, I'd just like to learn
> about the code & the tracker item, if there's one.)
>
> Thanks!
> Laszlo
>
>>
>> So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3).
>> Expressing it as a sum shows why it gets slower as time progresses.
>>
>> The solution is to note that those "i" address spaces are actually all
>> the same, so we can get it down to sum i=1..N i = N(N+1)/2 = O(n^2).
>>
>> Thanks,
>>
>> Paolo
>>
>
- [Qemu-devel] Qemu and 32 PCIe devices, Marcin Juszkiewicz, 2017/08/08
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Laszlo Ersek, 2017/08/08
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Laszlo Ersek, 2017/08/08
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Paolo Bonzini, 2017/08/09
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Laszlo Ersek, 2017/08/09
- Re: [Qemu-devel] Qemu and 32 PCIe devices,
Paolo Bonzini <=
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Laszlo Ersek, 2017/08/09
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Peter Maydell, 2017/08/09
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Paolo Bonzini, 2017/08/09
- Re: [Qemu-devel] Qemu and 32 PCIe devices, Michael S. Tsirkin, 2017/08/09