
Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux


From: Claudio Imbrenda
Subject: Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
Date: Thu, 11 Aug 2022 14:03:25 +0200

On Wed, 10 Aug 2022 17:30:41 -0300
Murilo Opsfelder Araújo <muriloo@linux.ibm.com> wrote:

> Hi, Claudio.

Hi Murilo,

[...]
 
> I've smoke-tested this on ppc and everything looks fine.
> For what it's worth:
> 
> Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Tested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

Thanks a lot for testing this!

> 
> 
> Have you measured the benefits of using -async-teardown vs. not using it?
> If so, can you please share the details so I can give it a try on ppc, too?
> 
> The wall-clock perception is that nothing has changed, for better or worse.
> My tests used mid-sized VMs, like 128 vCPUs, 64GB RAM.

The number of CPUs doesn't really have any impact. 64G of RAM is quite
small to notice a sizeable difference, although you should still be able
to see a few seconds of speedup in the shutdown. Also, starting a guest
with a lot of RAM is not enough: you have to make sure that the guest
RAM is actually allocated (completely fill the RAM in the guest before
shutting it down).
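
One way to fill guest RAM is to run a small program inside the guest that
allocates a large buffer and writes to every page. The following is only a
minimal sketch under that assumption; alloc_and_touch() is a hypothetical
helper name, not part of the patch, and the size to request depends on how
much memory the guest can spare before its OOM killer steps in.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate len bytes and write one byte into every page, so that each
 * page is faulted in and has to be backed by real memory.
 * Returns the buffer (keep it allocated until the guest is shut down),
 * or NULL on failure. The number of pages written is stored in
 * *pages_touched.
 */
unsigned char *alloc_and_touch(size_t len, size_t *pages_touched)
{
    long page_size = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    unsigned char *buf;
    size_t off;

    *pages_touched = 0;
    if (page_size <= 0 || len == 0) {
        return NULL;
    }
    buf = malloc(len);
    if (!buf) {
        return NULL;
    }
    /* volatile keeps the compiler from dropping the stores */
    p = buf;
    for (off = 0; off < len; off += (size_t)page_size) {
        p[off] = 0xff;
        (*pages_touched)++;
    }
    return buf;
}
```

The buffer is deliberately not freed by the helper: the memory has to stay
allocated until the guest powers off, otherwise there is less for the host
to tear down.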

I just tested a 64G and a 256G guest on s390x. I measured the time
between the last line in the console ("Reached target Power-Off.") and
the moment when control comes back to the shell. The measurement was
not very accurate: I manually ran "date +%s" in another shell when I
saw the last line in the console, and then again when I got the shell
back from qemu.
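
The manual "date +%s" procedure could be automated along these lines. This
is a hedged sketch, not what was used for the numbers in this mail: it
assumes the guest console is piped into the program's input, and it treats
EOF on that pipe as an approximation of "control comes back to the shell".
time_after_marker() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Read console output from `console` until EOF. On success, return 0 and
 * store in *seconds the time elapsed between the first line containing
 * `marker` and EOF. Return -1 if the marker never shows up or the
 * monotonic clock cannot be read.
 * Lines longer than the buffer are read in pieces, so a marker split
 * across two pieces would be missed.
 */
int time_after_marker(FILE *console, const char *marker, double *seconds)
{
    struct timespec start = { 0, 0 }, end = { 0, 0 };
    char line[1024];
    int seen = 0;

    while (fgets(line, sizeof(line), console)) {
        if (!seen && strstr(line, marker)) {
            if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
                return -1;
            }
            seen = 1;
        }
    }
    if (!seen || clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        return -1;
    }
    *seconds = (double)(end.tv_sec - start.tv_sec) +
               (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return 0;
}
```

A caller would pass stdin and the string "Reached target Power-Off." as the
marker. CLOCK_MONOTONIC is used instead of wall-clock time so that clock
adjustments during the run do not skew the result.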

The 64G guest needs a few seconds; the 256G guest needs almost exactly 4
times as long. With the asynchronous teardown it's almost instant in
both cases (less than 1s, too fast to measure manually).

Try a multi-TB guest if you can (at the moment I can't) to
see a more marked effect.
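
For a rough idea of what to expect from a larger guest, the 64G and 256G
data points suggest that synchronous teardown time grows about linearly
with the amount of allocated RAM. The sketch below extrapolates from one
reference measurement under that assumption. Nothing beyond 256G was
measured here, so treat the result as a guess; estimate_teardown_seconds()
is a hypothetical helper name.

```c
/*
 * Linear extrapolation of the synchronous teardown time from a single
 * measured reference point (ref_seconds for a guest with ref_gib of
 * allocated RAM). Returns a negative value on invalid input.
 */
double estimate_teardown_seconds(double ref_seconds, double ref_gib,
                                 double target_gib)
{
    if (ref_seconds < 0.0 || ref_gib <= 0.0 || target_gib < 0.0) {
        return -1.0;
    }
    return ref_seconds * (target_gib / ref_gib);
}
```

With a made-up reference of 5 seconds for 64 GiB, a 256 GiB guest would be
estimated at 20 seconds, which matches the factor of 4 reported above.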

Also consider that this is for _normal_ guests. Protected guests on
s390x have an even slower teardown due to the way protected
virtualization is implemented in the hardware.

I hope this was helpful.

> 
> Cheers!
> 
> > ---
> >   include/qemu/async-teardown.h |  22 ++++++
> >   os-posix.c                    |   6 ++
> >   qemu-options.hx               |  17 +++++
> >   util/async-teardown.c         | 123 ++++++++++++++++++++++++++++++++++
> >   util/meson.build              |   1 +
> >   5 files changed, 169 insertions(+)
> >   create mode 100644 include/qemu/async-teardown.h
> >   create mode 100644 util/async-teardown.c
> > 
> > diff --git a/include/qemu/async-teardown.h b/include/qemu/async-teardown.h
> > new file mode 100644
> > index 0000000000..092e7a37e7
> > --- /dev/null
> > +++ b/include/qemu/async-teardown.h
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Asynchronous teardown
> > + *
> > + * Copyright IBM, Corp. 2022
> > + *
> > + * Authors:
> > + *  Claudio Imbrenda <imbrenda@linux.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> > + * option) any later version.  See the COPYING file in the top-level directory.
> > + *
> > + */
> > +#ifndef QEMU_ASYNC_TEARDOWN_H
> > +#define QEMU_ASYNC_TEARDOWN_H
> > +
> > +#include "config-host.h"
> > +
> > +#ifdef CONFIG_LINUX
> > +void init_async_teardown(void);
> > +#endif
> > +
> > +#endif
> > diff --git a/os-posix.c b/os-posix.c
> > index 321fc4bd13..4858650c3e 100644
> > --- a/os-posix.c
> > +++ b/os-posix.c
> > @@ -39,6 +39,7 @@
> >   
> >   #ifdef CONFIG_LINUX
> >   #include <sys/prctl.h>
> > +#include "qemu/async-teardown.h"
> >   #endif
> >   
> >   /*
> > @@ -150,6 +151,11 @@ int os_parse_cmd_args(int index, const char *optarg)
> >       case QEMU_OPTION_daemonize:
> >           daemonize = 1;
> >           break;
> > +#if defined(CONFIG_LINUX)
> > +    case QEMU_OPTION_asyncteardown:
> > +        init_async_teardown();
> > +        break;
> > +#endif
> >       default:
> >           return -1;
> >       }
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 3f23a42fa8..d434353159 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4743,6 +4743,23 @@ HXCOMM Internal use
> >   DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL)
> >   DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
> >   
> > +#ifdef __linux__
> > +DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
> > +    "-async-teardown enable asynchronous teardown\n",
> > +    QEMU_ARCH_ALL)
> > +#endif
> > +SRST
> > +``-async-teardown``
> > +    Enable asynchronous teardown. A new teardown process will be
> > +    created at startup, using clone. The teardown process will share
> > +    the address space of the main qemu process, and wait for the main
> > +    process to terminate. At that point, the teardown process will
> > +    also exit. This allows qemu to terminate quickly if the guest was
> > +    huge, leaving the teardown of the address space to the teardown
> > +    process. Since the teardown process shares the same cgroups as the
> > +    main qemu process, accounting is performed correctly.
> > +ERST
> > +
> >   DEF("msg", HAS_ARG, QEMU_OPTION_msg,
> >       "-msg [timestamp[=on|off]][,guest-name=[on|off]]\n"
> >       "                control error message format\n"
> > diff --git a/util/async-teardown.c b/util/async-teardown.c
> > new file mode 100644
> > index 0000000000..07fe549891
> > --- /dev/null
> > +++ b/util/async-teardown.c
> > @@ -0,0 +1,123 @@
> > +/*
> > + * Asynchronous teardown
> > + *
> > + * Copyright IBM, Corp. 2022
> > + *
> > + * Authors:
> > + *  Claudio Imbrenda <imbrenda@linux.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> > + * option) any later version.  See the COPYING file in the top-level directory.
> > + *
> > + */
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <sys/types.h>
> > +#include <sys/unistd.h>
> > +#include <dirent.h>
> > +#include <sys/prctl.h>
> > +#include <signal.h>
> > +#include <sched.h>
> > +
> > +#include "qemu/async-teardown.h"
> > +
> > +static pid_t the_ppid;
> > +
> > +/*
> > + * Close all open file descriptors.
> > + */
> > +static void close_all_open_fd(void)
> > +{
> > +    struct dirent *de;
> > +    int fd, dfd;
> > +    DIR *dir;
> > +
> > +    dir = opendir("/proc/self/fd");
> > +    if (!dir) {
> > +        return;
> > +    }
> > +    /* Avoid closing the directory. */
> > +    dfd = dirfd(dir);
> > +
> > +    for (de = readdir(dir); de; de = readdir(dir)) {
> > +        fd = atoi(de->d_name);
> > +        if (fd != dfd) {
> > +            close(fd);
> > +        }
> > +    }
> > +    closedir(dir);
> > +}
> > +
> > +static void hup_handler(int signal)
> > +{
> > +    /* Check every second if this process has been reparented. */
> > +    while (the_ppid == getppid()) {
> > +        /* sleep() is safe to use in a signal handler. */
> > +        sleep(1);
> > +    }
> > +
> > +    /* At this point the parent process has terminated completely. */
> > +    _exit(0);
> > +}
> > +
> > +static int async_teardown_fn(void *arg)
> > +{
> > +    struct sigaction sa = { .sa_handler = hup_handler };
> > +    sigset_t hup_signal;
> > +    char name[16];
> > +
> > +    /* Set a meaningful name for this process. */
> > +    snprintf(name, 16, "cleanup/%d", the_ppid);
> > +    prctl(PR_SET_NAME, (unsigned long)name);
> > +
> > +    /*
> > +     * Close all file descriptors that might have been inherited from the
> > +     * main qemu process when doing clone, needed to make libvirt happy.
> > +     * Not using close_range for increased compatibility with older kernels.
> > +     */
> > +    close_all_open_fd();
> > +
> > +    /* Set up a handler for SIGHUP and unblock SIGHUP. */
> > +    sigaction(SIGHUP, &sa, NULL);
> > +    sigemptyset(&hup_signal);
> > +    sigaddset(&hup_signal, SIGHUP);
> > +    sigprocmask(SIG_UNBLOCK, &hup_signal, NULL);
> > +
> > +    /* Ask to receive SIGHUP when the parent dies. */
> > +    prctl(PR_SET_PDEATHSIG, SIGHUP);
> > +
> > +    /*
> > +     * Sleep forever, unless the parent process has already terminated. The
> > +     * only interruption can come from the SIGHUP signal, which in normal
> > +     * operation is received when the parent process dies.
> > +     */
> > +    if (the_ppid == getppid()) {
> > +        pause();
> > +    }
> > +
> > +    /* At this point the parent process has terminated completely. */
> > +    _exit(0);
> > +}
> > +
> > +/*
> > + * Block all signals, start (clone) a new process sharing the address space
> > + * with qemu (CLONE_VM), then restore signals.
> > + */
> > +void init_async_teardown(void)
> > +{
> > +    sigset_t all_signals, old_signals;
> > +    const int stack_size = 8192; /* Should be more than enough */
> > +    char *stack, *stack_ptr;
> > +
> > +    the_ppid = getpid();
> > +    stack = malloc(stack_size);
> > +    if (!stack) {
> > +        return;
> > +    }
> > +    stack_ptr = stack + stack_size;
> > +
> > +    sigfillset(&all_signals);
> > +    sigprocmask(SIG_BLOCK, &all_signals, &old_signals);
> > +    clone(async_teardown_fn, stack_ptr, CLONE_VM, NULL, NULL, NULL, NULL);
> > +    sigprocmask(SIG_SETMASK, &old_signals, NULL);
> > +}
> > diff --git a/util/meson.build b/util/meson.build
> > index 5e282130df..63acd59bb0 100644
> > --- a/util/meson.build
> > +++ b/util/meson.build
> > @@ -2,6 +2,7 @@ util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
> >   if not config_host_data.get('CONFIG_ATOMIC64')
> >     util_ss.add(files('atomic64.c'))
> >   endif
> > +util_ss.add(when: 'CONFIG_LINUX', if_true: files('async-teardown.c'))
> >   util_ss.add(when: 'CONFIG_POSIX', if_true: files('aio-posix.c'))
> >   util_ss.add(when: 'CONFIG_POSIX', if_true: files('fdmon-poll.c'))
> >   if config_host_data.get('CONFIG_EPOLL_CREATE1')  
> 
> 



