Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage

From: Michael R. Hines
Subject: Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
Date: Wed, 12 Oct 2016 16:18:39 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0


Greetings from DigitalOcean. We're experiencing the same symptoms without this patch: collectively, many gigabytes of unplanned-for RSS in use per hypervisor that we would like to get rid of =).

Without having tried this patch yet (will do that ASAP), we noticed that the 192MB mentioned melts away (Yay) as soon as we explicitly disable the coroutine thread pool, along with another ~100MB of stack usage that would likely also go away if we applied the entirety of your patch.

Is there any chance you have revisited this or have a timeline for it?

- Michael

 * Michael R. Hines
 * Senior Engineer, DigitalOcean.

On 06/28/2016 04:01 AM, Peter Lieven wrote:
I recently found that Qemu is using several hundred megabytes more RSS memory
than older versions such as Qemu 2.2.0. So I started tracing
memory allocations and found two major reasons for this.

1) We changed the qemu coroutine pool to have a per-thread and a global release
    pool. The chosen pool size and the changed algorithm can lead to up to
    192 free coroutines with just a single iothread, each coroutine
    in the pool holding 1MB of stack memory.

2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced, which leads to delayed freeing
    of memory. This results in higher heap allocations which cannot effectively
    be returned to the kernel (most likely due to fragmentation).

The following series is what I came up with. Besides the coroutine patches, I
changed some allocations to forcibly use mmap. None of these allocations is repeated
during runtime, so the impact of using mmap should be negligible.

There are still some big malloced allocations left which cannot easily be converted
(e.g. the pixman buffers in VNC). So it might be an idea to set a lower mmap
threshold for malloc, since this threshold seems to be in the order of several
megabytes on modern systems.

Peter Lieven (15):
   coroutine-ucontext: mmap stack memory
   coroutine-ucontext: add a switch to monitor maximum stack size
   coroutine-ucontext: reduce stack size to 64kB
   coroutine: add a knob to disable the shared release pool
   util: add a helper to mmap private anonymous memory
   exec: use mmap for subpages
   qapi: use mmap for QmpInputVisitor
   virtio: use mmap for VirtQueue
   loader: use mmap for ROMs
   vmware_svga: use mmap for scratch pad
   qom: use mmap for bigger Objects
   util: add a function to realloc mmapped memory
   exec: use mmap for PhysPageMap->nodes
   vnc-tight: make the encoding palette static
   vnc: use mmap for VncState

  configure                 | 33 ++++++++++++++++++--
  exec.c                    | 11 ++++---
  hw/core/loader.c          | 16 +++++-----
  hw/display/vmware_vga.c   |  3 +-
  hw/virtio/virtio.c        |  5 +--
  include/qemu/mmap-alloc.h |  7 +++++
  include/qom/object.h      |  1 +
  qapi/qmp-input-visitor.c  |  5 +--
  qom/object.c              | 20 ++++++++++--
  ui/vnc-enc-tight.c        | 21 ++++++-------
  ui/vnc.c                  |  5 +--
  ui/vnc.h                  |  1 +
  util/coroutine-ucontext.c | 66 +++++++++++++++++++++++++++++++++++++--
  util/mmap-alloc.c         | 27 ++++++++++++++++
  util/qemu-coroutine.c     | 79 ++++++++++++++++++++++++++---------------------
  15 files changed, 225 insertions(+), 75 deletions(-)
