qemu-devel

Re: [PATCH 1/2] util: import GTree as QTree


From: Daniel P . Berrangé
Subject: Re: [PATCH 1/2] util: import GTree as QTree
Date: Wed, 11 Jan 2023 12:08:26 +0000
User-agent: Mutt/2.2.9 (2022-11-12)

On Tue, Jan 10, 2023 at 10:55:35PM -0500, Emilio Cota wrote:
> The only reason to add this tree is to control the memory allocator
> used. Some users (e.g. TCG) cannot work reliably in multi-threaded
> environments (e.g. forking in user-mode) with GTree's allocator, GSlice.
> See https://gitlab.com/qemu-project/qemu/-/issues/285 for details.
> 
> Importing GTree is a temporary workaround until GTree migrates away
> from GSlice.
> 
> This implementation is identical to that in glib v2.75.0.
> I've imported tests from glib and added a benchmark just to
> make sure that performance is similar (Note: it cannot be identical
> because we are not using GSlice).
> 
> $ taskset -c 2 tests/bench/qtree-bench
> 
> - With libc's allocator:
> 
>  Tree         Op      32            1024            4096          131072         1048576
> ------------------------------------------------------------------------------------------------
> GTree     Lookup   14.01           15.17           24.93           18.99           15.28
> QTree     Lookup   22.50 (1.61x)   32.49 (2.14x)   29.84 (1.20x)   16.77 (0.88x)   12.21 (0.80x)
> GTree     Insert   19.24           15.72           25.24           17.87           16.55
> QTree     Insert   15.07 (0.78x)   26.70 (1.70x)   25.68 (1.02x)   17.20 (0.96x)   12.49 (0.75x)
> GTree     Remove   11.57           31.44           29.77           20.88           16.60
> QTree     Remove   14.01 (1.21x)   34.54 (1.10x)   33.52 (1.13x)   26.64 (1.28x)   14.95 (0.90x)
> GTree  RemoveAll   57.97          119.13          118.16          112.82           61.63
> QTree  RemoveAll   46.31 (0.80x)  108.04 (0.91x)  113.85 (0.96x)   77.88 (0.69x)   41.69 (0.68x)
> GTree   Traverse   72.56          232.83          243.20          254.22           97.44
> QTree   Traverse   66.53 (0.92x)  394.76 (1.70x)  357.07 (1.47x)  289.09 (1.14x)   45.64 (0.47x)
> ------------------------------------------------------------------------------------------------

Well, this is rather strange, as it doesn't really match what I
see when running your benchmark!

First, I find the test to be a little unreliable the first few
times it is run. I ran it in a loop 20 times and the results became
more stable. Looking at just the QTree lines, I typically get
something like:

 Tree         Op      32            1024            4096          131072         1048576
QTree     Lookup   21.43 (1.33x)   17.99 (1.03x)   16.71 (1.07x)   10.01 (0.75x)    4.51 (0.40x)
QTree     Insert   12.65 (0.81x)   12.65 (0.76x)   11.94 (0.76x)    7.71 (0.59x)    4.30 (0.39x)
QTree     Remove   12.77 (1.09x)   18.34 (1.09x)   17.68 (1.07x)   13.65 (1.00x)    8.65 (0.76x)
QTree  RemoveAll   35.05 (1.01x)   40.17 (1.10x)   30.70 (0.88x)   42.06 (1.25x)   27.13 (1.14x)
QTree   Traverse   72.40 (1.12x)  180.95 (1.24x)  138.17 (1.09x)  146.29 (1.21x)   51.62 (1.29x)

So QTree (malloc) is slower on Lookup for small trees, and slower on
Traverse at every size. On Lookup and Insert for large trees, malloc
is massively faster (0.40x and 0.39x at 1048576 elements).
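
A minimal sketch of how the repeated runs could be reduced to one set of ratios, assuming the benchmark prints each result on a single line in the format shown above (the wrapping seen in mail comes from the mail client). The 20 iterations and the `taskset -c 2 tests/bench/qtree-bench` command come from this thread; using the median as the summary statistic, and all helper names (`run_bench`, `parse_output`, `median_ratios`), are assumptions for illustration and not part of the patch.

```python
import re
import statistics
import subprocess

# Command quoted earlier in this thread; adjust the CPU number as needed.
BENCH_CMD = ["taskset", "-c", "2", "tests/bench/qtree-bench"]

# One "time (ratio)" cell as printed on QTree lines, e.g. "21.43 (1.33x)".
CELL = re.compile(r"(\d+\.\d+)\s+\((\d+\.\d+)x\)")


def run_bench(times=20):
    """Run the benchmark `times` times and return the stdout of each run."""
    outputs = []
    for _ in range(times):
        done = subprocess.run(BENCH_CMD, check=True,
                              capture_output=True, text=True)
        outputs.append(done.stdout)
    return outputs


def parse_output(text):
    """Return {operation: [QTree/GTree ratio per tree size]} for one run."""
    ratios = {}
    for line in text.splitlines():
        fields = line.split()
        # Only QTree lines carry the "(N.NNx)" ratios.
        if len(fields) >= 2 and fields[0] == "QTree":
            ratios[fields[1]] = [float(r) for _time, r in CELL.findall(line)]
    return ratios


def median_ratios(runs):
    """Combine parsed runs into {operation: [median ratio per tree size]}."""
    return {
        op: [statistics.median(column)
             for column in zip(*(run[op] for run in runs))]
        for op in runs[0]
    }


# Parsing one result line from this mail:
sample = ("QTree     Lookup   21.43 (1.33x)   17.99 (1.03x)   "
          "16.71 (1.07x)   10.01 (0.75x)    4.51 (0.40x)")
print(parse_output(sample))
```

With the benchmark binary built, `median_ratios([parse_output(out) for out in run_bench()])` would give one ratio per operation and tree size.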

One thing to bear in mind is that with G_SLICE=always-malloc set,
GSlice passes every allocation straight to malloc, so we should in
theory see the exact same results for GTree and QTree. As a sanity
check, I ran the test with that env variable set and got:

 Tree         Op      32            1024            4096          131072         1048576
QTree     Lookup   21.72 (1.31x)   19.04 (1.05x)   16.65 (1.01x)    9.94 (1.06x)    7.19 (1.06x)
QTree     Insert   14.71 (1.25x)   12.59 (1.07x)   11.83 (1.04x)    7.48 (0.99x)    5.72 (0.96x)
QTree     Remove   12.48 (1.02x)   18.58 (1.01x)   17.89 (1.01x)   11.68 (0.97x)    8.96 (1.11x)
QTree  RemoveAll   31.47 (1.04x)   39.71 (1.16x)   37.84 (1.13x)   37.11 (1.15x)   24.56 (1.04x)
QTree   Traverse   74.77 (1.47x)  179.21 (1.28x)  164.88 (1.15x)  126.17 (1.07x)   42.18 (1.00x)

That's odd - all values ought to be 1.00 or very close.

This tells me that the nelements=32 data is not to be trusted.
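
To make the "ought to be 1.00" check concrete, here is a small sketch that computes, for each tree size, the largest deviation from 1.00 across the five operations, using the ratios from the G_SLICE=always-malloc run above. The 0.30 tolerance and the function names are assumptions chosen for illustration; nothing in the benchmark itself defines such a threshold.

```python
SIZES = [32, 1024, 4096, 131072, 1048576]

# QTree/GTree ratios copied from the G_SLICE=always-malloc run in this mail.
RATIOS = {
    "Lookup":    [1.31, 1.05, 1.01, 1.06, 1.06],
    "Insert":    [1.25, 1.07, 1.04, 0.99, 0.96],
    "Remove":    [1.02, 1.01, 1.01, 0.97, 1.11],
    "RemoveAll": [1.04, 1.16, 1.13, 1.15, 1.04],
    "Traverse":  [1.47, 1.28, 1.15, 1.07, 1.00],
}


def worst_deviation(ratios, sizes):
    """Return {size: largest |ratio - 1.00| over all operations}."""
    return {
        size: max(abs(per_op[i] - 1.0) for per_op in ratios.values())
        for i, size in enumerate(sizes)
    }


def suspect_sizes(ratios, sizes, tolerance):
    """Return the tree sizes whose worst deviation exceeds `tolerance`."""
    deviation = worst_deviation(ratios, sizes)
    return [size for size in sizes if deviation[size] > tolerance]


# Hypothetical tolerance: how far from 1.00 a ratio may drift before
# the whole column is treated as noise.
print(suspect_sizes(RATIOS, SIZES, tolerance=0.30))  # [32]
```

With a tolerance of 0.30 only nelements=32 is flagged. At 0.20 the 1024 column is flagged as well, because of its Traverse ratio (1.28x).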

There's also something weird going on with Traverse - it is
always slower in QTree, even when both QTree and GTree are
using malloc. There must be some weird cache effects from
calling code linked into the executable vs calling into the
.so lib, IMHO.

So overall, if I ignore the unreliable results, my takeaway is
that malloc is pretty much always a win over GSlice, sometimes
massively so, and at the very least shouldn't be worse.

NB, I'm using Fedora 37 with glibc.  Mileage may vary with different
libc impls.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



