
[Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path


From: Peter Feiner
Subject: [Qemu-devel] [PATCH 0/1] Exogenous memory management via -mem-path
Date: Tue, 20 Nov 2012 07:10:23 -0500

This patch proposes a framework that lets QEMU implement powerful memory
management techniques at very low cost. The insight is to leverage existing
subsystems, composing functionality in the traditional UNIX style. The key to
enabling this approach is to extend the -mem-path option to provide more control
over how backing memory files are named, opened, and mapped.

Two seemingly complex memory management techniques can now be achieved quite
simply:

    1. Post-copy migration.

        A. Boot a VM with RAM backed by MAP_SHARED files.
        B. Pause the VM and save device state.
        C. Share RAM files using a network file system.
        D. Copy saved device state to target host.
        E. On the target host, create a VM with the saved device state. Back RAM
           with files on the network mount.

        As the VM runs on the target, pages are loaded on demand by the
        network file system.

    2. VM cloning with shared CoW memory

        A. Boot a parent VM, run for a while, then pause.
        B. Save the parent's device state.
        C. Create child VMs from the saved state. Back RAM with MAP_PRIVATE
           overlays of the parent's RAM.

        Pages are shared by the children until they're CoW'd.

        Note that kernel samepage merging (KSM, or TPS - transparent page
        sharing - in VMware parlance) is complementary to cloning. KSM discovers
        pages that happen to be identical and shares them. Although KSM can find
        all redundant pages, doing so comes at a cost. Cloning, on the other
        hand, exploits sharing opportunities that it knows about for free.
        However, as children live longer and CoW more pages, the memory savings
        due to cloning diminish.
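
The copy-on-write behaviour that cloning relies on can be seen outside QEMU. The
following is a minimal sketch, not code from the patch: it assumes a Unix host,
uses a hypothetical file name (parent_ram), and maps the file twice within one
process, whereas real children would be separate QEMU processes.

```python
import mmap
import os
import tempfile

SIZE = 2 * mmap.PAGESIZE  # stand-in for a guest RAM block


def demo_cow_overlay():
    """Show what each party sees after one child writes to its overlay."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "parent_ram")  # hypothetical file name

        # Parent: RAM backed by a MAP_SHARED file, so its stores reach the file.
        with open(path, "w+b") as f:
            f.truncate(SIZE)
            parent = mmap.mmap(f.fileno(), SIZE, flags=mmap.MAP_SHARED)
            parent[0:6] = b"parent"
            parent.flush()
            parent.close()

        # Children: MAP_PRIVATE overlays of the same file.
        with open(path, "r+b") as f:
            prot = mmap.PROT_READ | mmap.PROT_WRITE
            child_a = mmap.mmap(f.fileno(), SIZE, flags=mmap.MAP_PRIVATE, prot=prot)
            child_b = mmap.mmap(f.fileno(), SIZE, flags=mmap.MAP_PRIVATE, prot=prot)
            child_a[0:6] = b"childA"  # copies this page for child_a only
            seen = {
                "child_a": bytes(child_a[0:6]),
                "child_b": bytes(child_b[0:6]),
            }
            child_a.close()
            child_b.close()

        # The parent's file is untouched by the child's private write.
        with open(path, "rb") as f:
            seen["file"] = f.read(6)
        return seen
```

Calling demo_cow_overlay() should show the write in child_a only; child_b and
the backing file keep the parent's contents, which is why untouched pages stay
shared among children.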

By simply granting more control over the manipulation of backing RAM files, QEMU
can enable these features *with no additional costs or changes to its internal
memory management*.

Thanks to MMU notifiers, KVM is already compatible with all file systems and
with both MAP_SHARED and MAP_PRIVATE mappings. Furthermore, thanks to -mem-path,
QEMU is already wired to back RAMBlocks with files. However, since these backing
files are currently randomly named (they are created with mkstemp), it's
impossible to know the correspondence between different VMs' backing files.

This patch's essential feature is making the names of the RAMBlocks' backing
files deterministic and unique. Then, given predictable backing file names, this
patch makes QEMU configurable to use existing RAMBlock backing files. These new
features are exposed to the user via arguments to the -mem-path option:

-mem-path \
    path[,mode=temp|open|create][,mmap=private|shared][,nofallback][,anyfs]

A detailed explanation of each argument follows.

mode:

    Naming of the RAMBlock filenames is controlled by the new mode argument for
    -mem-path. When mode is "create", files with deterministic names are created
    and opened. When mode is "temp", which is the default, the names are random
    and the files are unlinked, as per the current behavior of -mem-path. The
    "open" mode is the same as "create" but it fails if the files don't already
    exist; the "open" mode is included as a sanity checking measure when
    creating children.
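
    The three modes map naturally onto ordinary file-opening calls. The sketch
    below is an illustration under assumptions, not the patch's C code: the
    helper name open_backing_file is hypothetical, and "create" is assumed to
    reuse an existing file rather than fail or truncate it.

```python
import os
import tempfile


def open_backing_file(directory, name, mode="temp"):
    """Open a RAM backing file; return (fd, path), with path None for temp."""
    path = os.path.join(directory, name)
    if mode == "temp":
        # Random name, unlinked immediately: the file lives only as long
        # as the descriptor, matching the pre-existing -mem-path behavior.
        fd, tmp_path = tempfile.mkstemp(prefix=name + ".", dir=directory)
        os.unlink(tmp_path)
        return fd, None
    if mode == "create":
        # Deterministic name; created if missing (assumed: reused if present).
        return os.open(path, os.O_CREAT | os.O_RDWR, 0o600), path
    if mode == "open":
        # Deterministic name; raises FileNotFoundError if the file is missing.
        return os.open(path, os.O_RDWR), path
    raise ValueError("unknown mode: %r" % (mode,))
```

    A child started with mode "open" against a directory the parent never
    populated fails at this point instead of silently booting with empty RAM.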

    To make backing files' names deterministic and unique, a RAMBlock's offset
    is included in its backing file's name. The offset is a function of the
    order in which RAMBlocks are allocated and their size, which, in turn, is a
    function of the order in which devices are initialized. Hence, given the
    same QEMU command line, two QEMU processes will have RAMBlocks with the same
    offsets. Note that Xen's live migration also relies on RAMBlocks having the
    same offsets between QEMU processes (see xen_read_physmap).

    In addition to the offset, the size of the RAMBlock and the (non-unique)
    name of the RAMBlock's MemoryRegion are included in the file name:

    qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM]

    The size and name are included for extra sanity checking when the -mem-path
    mode is "open". The optional random suffix is added when the mode is "temp".
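
    The naming scheme can be sketched as follows. Two things here are
    assumptions rather than statements from the patch: OFFSET and SIZE are
    rendered as lowercase hexadecimal without a prefix (inferred from the
    directory listing in the mmap example), and RAMBlocks are packed back to
    back in allocation order, which simplifies QEMU's real offset allocation.
    The function names are hypothetical.

```python
def backing_file_name(offset, size, region_name, random_suffix=None):
    """Build qemu_back_mem.OFFSET+SIZE.NAME[.RANDOM] for one RAMBlock."""
    name = "qemu_back_mem.%x+%x.%s" % (offset, size, region_name)
    if random_suffix is not None:  # only used when the mode is "temp"
        name += "." + random_suffix
    return name


def layout(blocks):
    """Name each (region_name, size) block, assuming back-to-back offsets."""
    names = []
    offset = 0
    for region_name, size in blocks:
        names.append(backing_file_name(offset, size, region_name))
        offset += size
    return names
```

    Because the offsets depend only on the order and sizes of the blocks, two
    processes given the same block list produce identical names, which is the
    property that lets children find the parent's files.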

mmap:

    Whether the RAMBlock files are mmap'd with MAP_PRIVATE or MAP_SHARED is
    governed by the mmap argument. As explained above, MAP_PRIVATE is necessary
    for sharing memory between children. The motivation for MAP_SHARED is to
    make saving the parent's memory state a zero-cost operation during step 2B
    of the clone algorithm by using the parent's RAMBlock files in place for the
    children. This is best illustrated by an example:

    $ mkdir /tmp/mem
    $ qemu -mem-path /tmp/mem,mode=create,mmap=shared -m 8g -qmp stdio disk.qcow2
    # Warm up the parent (e.g., fill buffer cache, load apps) ...

    # In the parent's qmp shell:
    {"execute": "qmp_capabilities"}
    {"execute": "stop"}
    {"execute": "xen-save-devices-state", "arguments": {"filename": "devices"}}
    {"execute": "quit"}

    # Now we have a tiny file with the parent's device state and
    # a bunch of memory backing files:
    $ ls -lh devices
    ...  80K ... devices
    $ ls -lh /tmp/mem
    ... 8.0G ... qemu_back_mem.0+200000000.pc.ram
    ... 128K ... qemu_back_mem.200000000+20000.pc.bios
    ... 128K ... qemu_back_mem.200020000+20000.pc.rom
    ... 8.0M ... qemu_back_mem.200040000+800000.vga.vram
    ...  64K ... qemu_back_mem.200840000+10000.cirrus_vga.rom
    ... 128K ... qemu_back_mem.200850000+20000.e1000.rom

    # Launch the children (mode=open, mmap=private):
    $ for i in $(seq 100); do \
        qemu-img create -o backing_file=disk.qcow2 -f qcow2 child$i.qcow2; \
        qemu -mem-path /tmp/mem,mode=open,mmap=private \
             -incoming "exec:cat devices" \
             -m 8g \
             child$i.qcow2 & \
      done
    # Now we have 100 8GB VMs sharing memory!

nofallback:

    Abort if a RAMBlock can't be backed by a file. Currently, QEMU falls back to
    anonymous VMAs. This fallback only makes sense if you're using -mem-path as
    a transparent optimization (as is the case with hugetlbfs).

anyfs:
    
    -mem-path currently prints a warning if the specified path isn't on a
    hugetlbfs mount. This option squelches that warning.
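
Taken together, the option grammar above is small enough to sketch as a parser.
This is an illustration only, not QEMU's option-parsing code: the function name
parse_mem_path is hypothetical, paths containing commas are not handled, and
since this message does not state a default for mmap, the sketch leaves it as
None.

```python
VALUED = {
    "mode": {"temp", "open", "create"},
    "mmap": {"private", "shared"},
}
FLAGS = {"nofallback", "anyfs"}


def parse_mem_path(arg):
    """Parse path[,mode=...][,mmap=...][,nofallback][,anyfs] into a dict."""
    path, *opts = arg.split(",")
    if not path:
        raise ValueError("-mem-path requires a path")
    cfg = {
        "path": path,
        "mode": "temp",   # stated default
        "mmap": None,     # default not stated in this message
        "nofallback": False,
        "anyfs": False,
    }
    for opt in opts:
        key, sep, value = opt.partition("=")
        if sep and key in VALUED and value in VALUED[key]:
            cfg[key] = value
        elif not sep and key in FLAGS:
            cfg[key] = True
        else:
            raise ValueError("unrecognized -mem-path option: %r" % (opt,))
    return cfg
```

For example, parse_mem_path("/tmp/mem,mode=open,mmap=private") yields the
configuration a child would use, with nofallback and anyfs left off.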

In conclusion, by extending -mem-path with deterministic file names and control
over mmap flags, QEMU enables efficient VM cloning. Moreover, novel
memory-management techniques can be implemented without adding any complexity to
QEMU. For instance, the cloning example above becomes post-copy live migration
if /tmp/mem is shared using NFS and a child is launched on another host.

Peter Feiner (1):
  exec: make -mem-path filenames deterministic

 cpu-all.h       |    5 ++++
 exec.c          |   60 ++++++++++++++++++++++++++++++++++--------------------
 qemu-config.c   |   26 +++++++++++++++++++++++
 qemu-options.hx |   24 +++++++++++++++++++--
 vl.c            |   43 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 131 insertions(+), 27 deletions(-)

-- 
1.7.5.4



