Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
From: Claudio Imbrenda
Subject: Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
Date: Thu, 11 Aug 2022 14:03:25 +0200
On Wed, 10 Aug 2022 17:30:41 -0300
Murilo Opsfelder Araújo <muriloo@linux.ibm.com> wrote:
> Hi, Claudio.
Hi Murilo,
[...]
> I've smoke-tested this on ppc and everything looks fine.
> For what it's worth:
>
> Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Tested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Thanks a lot for testing this!
>
>
> Have you measured the benefits of using -async-teardown vs. not using it?
> If so, can you please share the details so I can give it a try on ppc, too?
>
> The wall-clock perception is that nothing has changed, for better or worse.
> My tests used mid-sized VMs, like 128 vCPUs, 64GB RAM.
The number of CPUs doesn't really have any impact. 64G of RAM is quite
small to notice a sizeable difference, although you should still be able
to see a few seconds of speedup in the shutdown. Also, starting a guest
with a lot of RAM is not enough: you have to make sure that the guest
RAM is actually allocated (completely fill the RAM in the guest before
shutting it down).
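
One possible way to make sure the guest RAM is really allocated is to run
a small program inside the guest that allocates a large buffer and writes
to every page of it. The sketch below is only an illustration: the
function name alloc_and_touch() and its interface are hypothetical and
not part of the patch. It assumes Linux, where a large allocation is
normally only backed by real memory once it is written to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): allocate `bytes` of memory
 * and write one byte to every page, so that the kernel has to back each
 * page with real memory. Returns the buffer (the caller frees it) or NULL
 * if the allocation fails. The number of pages written is stored in
 * *pages_touched.
 */
static char *alloc_and_touch(size_t bytes, size_t *pages_touched)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096; /* fall back to 4 KiB */
    char *buf;
    size_t off, n = 0;

    assert(pages_touched != NULL);
    *pages_touched = 0;

    buf = malloc(bytes);
    if (!buf) {
        return NULL;
    }
    /* One write per page is enough to force the page to be allocated. */
    for (off = 0; off < bytes; off += page) {
        ((volatile char *)buf)[off] = 1;
        n++;
    }
    *pages_touched = n;
    return buf;
}
```

Calling this in the guest with a size close to the guest's total RAM, and
keeping the buffer allocated until shutdown, should leave most of the
guest memory populated on the host.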
I just tested a 64G and a 256G guest on s390x. I measured the time
between the last line in the console ("Reached target Power-Off.") and
the moment when control comes back to the shell. The measurement was
not very accurate (I manually ran "date +%s" in another shell when I
saw the last line in the console, and then again when I got the shell
back from qemu).
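
For a less manual measurement, the two timestamps could be taken with
clock_gettime() instead of "date +%s". The sketch below only shows the
timing part. The helper names elapsed_seconds() and now_monotonic() are
made up for this example, and detecting the last console line is left
out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two timestamps taken with clock_gettime(). */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    assert(start && end);
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Read the monotonic clock, which is not affected by wall-clock changes. */
static struct timespec now_monotonic(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}
```

The idea would be to call now_monotonic() once when the last console line
appears and once when the qemu process has exited, and then pass both
values to elapsed_seconds(). This gives sub-second resolution, unlike the
one-second granularity of "date +%s".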
The 64G guest needs a few seconds; the 256G guest needs almost exactly
4 times as long. With the asynchronous teardown it's almost instant in
both cases (less than 1s, too fast to measure manually).
Try a multi-TB guest if you can (at the moment I can't) to
see a more marked effect.
Also consider that this is for _normal_ guests. Protected guests on
s390x have an even slower teardown due to the way protected
virtualization is implemented in the hardware.
I hope this was helpful.
>
> Cheers!
>
> > ---
> > include/qemu/async-teardown.h | 22 ++++++
> > os-posix.c | 6 ++
> > qemu-options.hx | 17 +++++
> > util/async-teardown.c | 123 ++++++++++++++++++++++++++++++++++
> > util/meson.build | 1 +
> > 5 files changed, 169 insertions(+)
> > create mode 100644 include/qemu/async-teardown.h
> > create mode 100644 util/async-teardown.c
> >
> > diff --git a/include/qemu/async-teardown.h b/include/qemu/async-teardown.h
> > new file mode 100644
> > index 0000000000..092e7a37e7
> > --- /dev/null
> > +++ b/include/qemu/async-teardown.h
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Asynchronous teardown
> > + *
> > + * Copyright IBM, Corp. 2022
> > + *
> > + * Authors:
> > + * Claudio Imbrenda <imbrenda@linux.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> > + * option) any later version. See the COPYING file in the top-level directory.
> > + *
> > + */
> > +#ifndef QEMU_ASYNC_TEARDOWN_H
> > +#define QEMU_ASYNC_TEARDOWN_H
> > +
> > +#include "config-host.h"
> > +
> > +#ifdef CONFIG_LINUX
> > +void init_async_teardown(void);
> > +#endif
> > +
> > +#endif
> > diff --git a/os-posix.c b/os-posix.c
> > index 321fc4bd13..4858650c3e 100644
> > --- a/os-posix.c
> > +++ b/os-posix.c
> > @@ -39,6 +39,7 @@
> >
> > #ifdef CONFIG_LINUX
> > #include <sys/prctl.h>
> > +#include "qemu/async-teardown.h"
> > #endif
> >
> > /*
> > @@ -150,6 +151,11 @@ int os_parse_cmd_args(int index, const char *optarg)
> >      case QEMU_OPTION_daemonize:
> >          daemonize = 1;
> >          break;
> > +#if defined(CONFIG_LINUX)
> > +    case QEMU_OPTION_asyncteardown:
> > +        init_async_teardown();
> > +        break;
> > +#endif
> >      default:
> >          return -1;
> >      }
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 3f23a42fa8..d434353159 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4743,6 +4743,23 @@ HXCOMM Internal use
> > DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL)
> > DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
> >
> > +#ifdef __linux__
> > +DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
> > + "-async-teardown enable asynchronous teardown\n",
> > + QEMU_ARCH_ALL)
> > +#endif
> > +SRST
> > +``-async-teardown``
> > + Enable asynchronous teardown. A new teardown process will be
> > + created at startup, using clone. The teardown process will share
> > + the address space of the main qemu process, and wait for the main
> > + process to terminate. At that point, the teardown process will
> > + also exit. This allows qemu to terminate quickly if the guest was
> > + huge, leaving the teardown of the address space to the teardown
> > + process. Since the teardown process shares the same cgroups as the
> > + main qemu process, accounting is performed correctly.
> > +ERST
> > +
> > DEF("msg", HAS_ARG, QEMU_OPTION_msg,
> > "-msg [timestamp[=on|off]][,guest-name=[on|off]]\n"
> > " control error message format\n"
> > diff --git a/util/async-teardown.c b/util/async-teardown.c
> > new file mode 100644
> > index 0000000000..07fe549891
> > --- /dev/null
> > +++ b/util/async-teardown.c
> > @@ -0,0 +1,123 @@
> > +/*
> > + * Asynchronous teardown
> > + *
> > + * Copyright IBM, Corp. 2022
> > + *
> > + * Authors:
> > + * Claudio Imbrenda <imbrenda@linux.ibm.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at your
> > + * option) any later version. See the COPYING file in the top-level directory.
> > + *
> > + */
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <sys/types.h>
> > +#include <sys/unistd.h>
> > +#include <dirent.h>
> > +#include <sys/prctl.h>
> > +#include <signal.h>
> > +#include <sched.h>
> > +
> > +#include "qemu/async-teardown.h"
> > +
> > +static pid_t the_ppid;
> > +
> > +/*
> > + * Close all open file descriptors.
> > + */
> > +static void close_all_open_fd(void)
> > +{
> > +    struct dirent *de;
> > +    int fd, dfd;
> > +    DIR *dir;
> > +
> > +    dir = opendir("/proc/self/fd");
> > +    if (!dir) {
> > +        return;
> > +    }
> > +    /* Avoid closing the directory. */
> > +    dfd = dirfd(dir);
> > +
> > +    for (de = readdir(dir); de; de = readdir(dir)) {
> > +        fd = atoi(de->d_name);
> > +        if (fd != dfd) {
> > +            close(fd);
> > +        }
> > +    }
> > +    closedir(dir);
> > +}
> > +
> > +static void hup_handler(int signal)
> > +{
> > +    /* Check every second if this process has been reparented. */
> > +    while (the_ppid == getppid()) {
> > +        /* sleep() is safe to use in a signal handler. */
> > +        sleep(1);
> > +    }
> > +
> > +    /* At this point the parent process has terminated completely. */
> > +    _exit(0);
> > +}
> > +
> > +static int async_teardown_fn(void *arg)
> > +{
> > +    struct sigaction sa = { .sa_handler = hup_handler };
> > +    sigset_t hup_signal;
> > +    char name[16];
> > +
> > +    /* Set a meaningful name for this process. */
> > +    snprintf(name, 16, "cleanup/%d", the_ppid);
> > +    prctl(PR_SET_NAME, (unsigned long)name);
> > +
> > +    /*
> > +     * Close all file descriptors that might have been inherited from the
> > +     * main qemu process when doing clone, needed to make libvirt happy.
> > +     * Not using close_range for increased compatibility with older kernels.
> > +     */
> > +    close_all_open_fd();
> > +
> > +    /* Set up a handler for SIGHUP and unblock SIGHUP. */
> > +    sigaction(SIGHUP, &sa, NULL);
> > +    sigemptyset(&hup_signal);
> > +    sigaddset(&hup_signal, SIGHUP);
> > +    sigprocmask(SIG_UNBLOCK, &hup_signal, NULL);
> > +
> > +    /* Ask to receive SIGHUP when the parent dies. */
> > +    prctl(PR_SET_PDEATHSIG, SIGHUP);
> > +
> > +    /*
> > +     * Sleep forever, unless the parent process has already terminated. The
> > +     * only interruption can come from the SIGHUP signal, which in normal
> > +     * operation is received when the parent process dies.
> > +     */
> > +    if (the_ppid == getppid()) {
> > +        pause();
> > +    }
> > +
> > +    /* At this point the parent process has terminated completely. */
> > +    _exit(0);
> > +}
> > +
> > +/*
> > + * Block all signals, start (clone) a new process sharing the address space
> > + * with qemu (CLONE_VM), then restore signals.
> > + */
> > +void init_async_teardown(void)
> > +{
> > +    sigset_t all_signals, old_signals;
> > +    const int stack_size = 8192; /* Should be more than enough */
> > +    char *stack, *stack_ptr;
> > +
> > +    the_ppid = getpid();
> > +    stack = malloc(stack_size);
> > +    if (!stack) {
> > +        return;
> > +    }
> > +    stack_ptr = stack + stack_size;
> > +
> > +    sigfillset(&all_signals);
> > +    sigprocmask(SIG_BLOCK, &all_signals, &old_signals);
> > +    clone(async_teardown_fn, stack_ptr, CLONE_VM, NULL, NULL, NULL, NULL);
> > +    sigprocmask(SIG_SETMASK, &old_signals, NULL);
> > +}
> > diff --git a/util/meson.build b/util/meson.build
> > index 5e282130df..63acd59bb0 100644
> > --- a/util/meson.build
> > +++ b/util/meson.build
> > @@ -2,6 +2,7 @@ util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
> >  if not config_host_data.get('CONFIG_ATOMIC64')
> >    util_ss.add(files('atomic64.c'))
> >  endif
> > +util_ss.add(when: 'CONFIG_LINUX', if_true: files('async-teardown.c'))
> > util_ss.add(when: 'CONFIG_POSIX', if_true: files('aio-posix.c'))
> > util_ss.add(when: 'CONFIG_POSIX', if_true: files('fdmon-poll.c'))
> > if config_host_data.get('CONFIG_EPOLL_CREATE1')
>
>