
Re: [Qemu-devel] [RFC PATCH 2/3] cpus-common: Cache allocated work items


From: Emilio G. Cota
Subject: Re: [Qemu-devel] [RFC PATCH 2/3] cpus-common: Cache allocated work items
Date: Mon, 28 Aug 2017 15:05:01 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

On Sun, Aug 27, 2017 at 23:53:25 -0400, Pranith Kumar wrote:
> Using heaptrack, I found that quite a few of our temporary allocations
> are coming from allocating work items. Instead of doing this
> continuously, we can cache the allocated items and reuse them instead
> of freeing them.
> 
> This reduces the number of allocations by 25% (200000 -> 150000 for
> ARM64 boot+shutdown test).
> 

But what is the perf difference, if any?

Adding a lock (or a cmpxchg) here is not a great idea, although that is not
yet immediately obvious because other scalability bottlenecks dominate: if
you boot many arm64 cores, you'll see that most of the time is spent idling
on the BQL, see
  https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg05207.html

You're most likely better off using glib's slices, see
  https://developer.gnome.org/glib/stable/glib-Memory-Slices.html
These slices use per-thread lists, so scalability should be OK.
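Something along these lines (an untested sketch; the struct below is a
simplified stand-in for qemu_work_item in cpus-common.c, not the real
layout):

    /* Sketch only: allocating work items from GLib slices instead of
     * g_malloc/g_free.  Simplified stand-in for qemu_work_item. */
    #include <glib.h>
    #include <stdbool.h>

    typedef struct WorkItem {
        void (*func)(void *data);   /* callback to run on the target vCPU */
        void *data;                 /* opaque argument for func */
        bool free_after_run;        /* whether the runner releases the item */
    } WorkItem;

    static WorkItem *work_item_new(void (*func)(void *), void *data)
    {
        /* g_slice_new0 draws from a per-thread magazine, so the common
         * case does not take a shared lock, unlike a global free list. */
        WorkItem *wi = g_slice_new0(WorkItem);
        wi->func = func;
        wi->data = data;
        wi->free_after_run = true;
        return wi;
    }

    static void work_item_free(WorkItem *wi)
    {
        g_slice_free(WorkItem, wi);
    }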

I also suggest profiling with jemalloc and/or tcmalloc (built with
--enable-jemalloc or --enable-tcmalloc) in addition to glibc's allocator,
and then deciding based on the perf numbers whether this is something
worth optimizing.

Thanks,

                Emilio


