qemu-devel

Re: [PATCH v4 00/23] backup performance: block_status + async


From: Max Reitz
Subject: Re: [PATCH v4 00/23] backup performance: block_status + async
Date: Wed, 20 Jan 2021 17:40:53 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0

On 20.01.21 17:04, Daniel P. Berrangé wrote:
On Wed, Jan 20, 2021 at 04:53:26PM +0100, Max Reitz wrote:
On 20.01.21 15:44, Max Reitz wrote:
On 20.01.21 15:34, Max Reitz wrote:

[...]

  From a glance, it looks to me like two coroutines are created
simultaneously in two threads, and so one thread sets up a special
SIGUSR2 action, then another reverts SIGUSR2 to the default, and
then the first one kills itself with SIGUSR2.

Not sure what this has to do with backup, though it is interesting
that backup_loop() runs in two threads.  So perhaps some AioContext
problem.

Oh, 256 runs two backups concurrently.  So it isn’t that interesting,
but perhaps part of the problem still.  (I have no idea, still looking.)

So this is what I found out:

coroutine-sigaltstack, when creating a new coroutine, sets up a signal
handler for SIGUSR2, then kills itself with SIGUSR2, then uses the signal
handler context (with a sigaltstack) for the new coroutine, and then (the
signal handler returns after a sigsetjmp()) the old SIGUSR2 behavior is
restored.
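
To make the sequence above concrete, here is a minimal single-threaded sketch. It is not QEMU's actual code: it installs a SIGUSR2 handler with SA_ONSTACK, raises the signal, checks that the handler ran on the alternate stack, and then restores the previous state. The sigsetjmp()/siglongjmp() step that the real backend uses to keep the handler's context is left out, and the names run_handler_on_alt_stack, usr2_handler and ALT_STACK_SIZE are made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose sigaction()/sigaltstack() in strict modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024) /* illustrative size, not QEMU's */

static char *alt_base;
static volatile sig_atomic_t ran_on_alt_stack;

/* Records whether a local variable of the handler lives on the alternate stack. */
static void usr2_handler(int sig)
{
    volatile char probe = 0;
    uintptr_t p = (uintptr_t)&probe;

    (void)sig;
    ran_on_alt_stack = p >= (uintptr_t)alt_base &&
                       p < (uintptr_t)alt_base + ALT_STACK_SIZE;
}

/* Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on error. */
int run_handler_on_alt_stack(void)
{
    struct sigaction sa, old_sa;
    stack_t ss, old_ss;
    sigset_t set;
    int result = -1;

    alt_base = malloc(ALT_STACK_SIZE);
    assert(alt_base != NULL);
    ran_on_alt_stack = 0;

    /* Per-thread state: register the alternate signal stack. */
    ss.ss_sp = alt_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0) {
        free(alt_base);
        return -1;
    }

    /* Process-wide state: install the SIGUSR2 handler. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr2_handler;
    sigfillset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    if (sigaction(SIGUSR2, &sa, &old_sa) == 0) {
        /* Make sure the signal is deliverable, then "kill ourselves". */
        sigemptyset(&set);
        sigaddset(&set, SIGUSR2);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        raise(SIGUSR2); /* returns only after the handler has returned */

        /* Restore the old SIGUSR2 behavior, as the backend does. */
        sigaction(SIGUSR2, &old_sa, NULL);
        result = ran_on_alt_stack ? 1 : 0;
    }

    sigaltstack(&old_ss, NULL);
    free(alt_base);
    return result;
}
```

Called from a single-threaded main(), run_handler_on_alt_stack() should return 1. The window between the first sigaction() and the second one is the part that matters for the rest of this mail.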

What I fail to understand is how this is thread-safe.  Setting up signal
handlers is a process-wide action.  When one thread changes what SIGUSR2
does, this will affect all threads immediately, so when two threads run
coroutine-sigaltstack’s qemu_coroutine_new() concurrently, and one thread
reverts to the default action before the other has SIGUSR2’ed itself, that
later SIGUSR2 will kill the whole process.

(I suppose it gets even more interesting when one thread has set up the
sigaltstack, then the other sets up its own sigaltstack, and then both kill
themselves with SIGUSR2, so both coroutines get the same stack...)
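
The process-wide part can be checked in isolation. The following sketch (hypothetical names, no coroutine code involved) installs a SIGUSR2 handler in one thread, lets a second thread reset SIGUSR2 to SIG_DFL, and then reads the disposition back in the first thread. No signal is ever sent, so it should be safe to run.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose sigaction() in strict modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

static void usr2_handler(int sig)
{
    (void)sig; /* never actually invoked in this sketch */
}

/* Second thread: revert SIGUSR2 to the default action. */
static void *reset_to_default(void *arg)
{
    struct sigaction dfl;

    (void)arg;
    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    sigaction(SIGUSR2, &dfl, NULL);
    return NULL;
}

/*
 * Returns 1 if the handler installed by the calling thread was replaced
 * by the other thread, 0 if it survived, -1 on error.
 */
int other_thread_resets_disposition(void)
{
    struct sigaction sa, saved, seen;
    pthread_t t;
    int rc, clobbered;

    /* First thread: set up the "special SIGUSR2 action". */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr2_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, &saved) != 0) {
        return -1;
    }

    if (pthread_create(&t, NULL, reset_to_default, NULL) != 0) {
        sigaction(SIGUSR2, &saved, NULL);
        return -1;
    }
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;

    /* Back in the first thread: what does SIGUSR2 do now? */
    sigaction(SIGUSR2, NULL, &seen);
    clobbered = (seen.sa_handler == SIG_DFL);

    sigaction(SIGUSR2, &saved, NULL);
    return clobbered;
}
```

other_thread_resets_disposition() should return 1: the first thread's handler is gone by the time it looks again. If that thread had sent itself SIGUSR2 at this point, the default action (terminating the process) would have applied.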

I have no idea why this has never been hit before, but it makes sense why
block-copy backup makes it apparent: It creates 64+x coroutines in a very
short time span, and 256 makes it do so in two threads concurrently (thanks
to launching two backups in two AioContexts in a transaction).

So...  Looks to me like a bug in coroutine-sigaltstack.  Not sure what to do
now, though.  I don’t think we can use block-copy for backup before that
backend is fixed.  (And that is assuming that it’s indeed
coroutine-sigaltstack’s fault.)

I’ll try to add some locking, see what it does, and send a mail concerning
coroutine-sigaltstack to qemu-devel.

I'm wondering if we should simply remove the sigaltstack impl and use
ucontext on MacOS too.

MacOS has ucontext marked as deprecated by default, seemingly because
this functionality was deprecated by POSIX. The functionality is still
available without deprecation warnings if you set _XOPEN_SOURCE.

From my outside point of view (on coroutine backends), everything you wrote below sounds like a very reasonable thing to do. So perhaps we should (I’m not the right person to decide that, though).

However, for me, the immediate question is what I can do now. I naively believe that dropping the sigaltstack implementation would require a deprecation cycle. (If it doesn’t and we can get it out in, say, a week, great.)

I need something that helps right now, because I have Vladimir’s series in my block branch, the failure doesn’t seem to be its fault, but I can’t send a pull request as long as concurrent backups in two iothreads make qemu effectively crash when using a specific coroutine backend. (And I don’t see configure giving me a warning that choosing sigaltstack might be a bad idea.)

I suppose I hope that the prospect of wanting to drop sigaltstack altogether may lessen the resistance to just wrapping most of its qemu_coroutine_new() implementation in a lock until then... (i.e., what the RFC does that I’ve attached to
https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg05164.html)
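
For illustration only, and not the patch attached to that mail: a sketch of the locking idea, assuming a plain pthread mutex around the install/raise/restore sequence. All names (sigusr2_lock, locked_signal_roundtrip, run_locked_demo) and the thread and iteration counts are invented for the example, and the sigaltstack()/sigsetjmp() parts are omitted to keep the focus on the lock.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose sigaction()/pthread_sigmask() in strict modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

#define NUM_THREADS 4
#define ITERATIONS 25

static pthread_mutex_t sigusr2_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t handler_runs; /* only touched with the lock held */

static void usr2_handler(int sig)
{
    (void)sig;
    handler_runs++;
}

/* Install the handler, signal ourselves, restore: all under one lock. */
static void locked_signal_roundtrip(void)
{
    struct sigaction sa, old_sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr2_handler;
    sigfillset(&sa.sa_mask);

    pthread_mutex_lock(&sigusr2_lock);
    sigaction(SIGUSR2, &sa, &old_sa);
    raise(SIGUSR2); /* delivered to the calling thread */
    sigaction(SIGUSR2, &old_sa, NULL);
    pthread_mutex_unlock(&sigusr2_lock);
}

static void *worker(void *arg)
{
    sigset_t set;
    int i;

    (void)arg;
    /* Make sure SIGUSR2 is deliverable in this thread. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);

    for (i = 0; i < ITERATIONS; i++) {
        locked_signal_roundtrip();
    }
    return NULL;
}

/* Returns how many times the handler ran across all threads. */
int run_locked_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int i, rc;

    handler_runs = 0;
    for (i = 0; i < NUM_THREADS; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < NUM_THREADS; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return (int)handler_runs;
}
```

With the lock in place, run_locked_demo() should return NUM_THREADS * ITERATIONS, i.e. 100. Dropping the two pthread_mutex_*() calls reintroduces the window in which one thread restores the old action before another has signalled itself, so do not expect the unlocked variant to survive.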

Max

IOW, it is trivial to make the ucontext impl work on MacOS simply by
adding

  #define _XOPEN_SOURCE 600

before including ucontext.h in coroutine-ucontext.c, and removing the
restrictions in configure:



diff --git a/configure b/configure
index 881af4b6be..a58bdf70f3 100755
--- a/configure
+++ b/configure
@@ -4822,8 +4822,9 @@ fi
  # specific one.
ucontext_works=no
-if test "$darwin" != "yes"; then
+
    cat > $TMPC << EOF
+#define _XOPEN_SOURCE 600
  #include <ucontext.h>
  #ifdef __stub_makecontext
  #error Ignoring glibc stub makecontext which will always fail
@@ -4833,7 +4834,6 @@ EOF
    if compile_prog "" "" ; then
      ucontext_works=yes
    fi
-fi
if test "$coroutine" = ""; then
    if test "$mingw32" = "yes"; then
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 904b375192..9c0a2cf85c 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -22,6 +22,7 @@
  #ifdef _FORTIFY_SOURCE
  #undef _FORTIFY_SOURCE
  #endif
+#define _XOPEN_SOURCE 600
  #include "qemu/osdep.h"
  #include <ucontext.h>
  #include "qemu/coroutine_int.h"
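
As a standalone illustration of what the define is meant to enable (this is not coroutine-ucontext.c, and all names are invented for the example), a minimal makecontext()/swapcontext() round trip might look like the following. The #ifndef guard and the 64 KiB stack size are assumptions of the sketch, not part of the patch above.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600 /* must come before <ucontext.h> is included */
#endif
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024) /* illustrative size */

static ucontext_t main_ctx, co_ctx;
static int steps;

/* Entry point of the toy coroutine: run, yield once, run again, finish. */
static void co_entry(void)
{
    steps++;                         /* first half */
    swapcontext(&co_ctx, &main_ctx); /* yield back to the caller */
    steps++;                         /* second half */
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

/* Returns the number of steps the coroutine executed, or -1 on error. */
int run_ucontext_demo(void)
{
    void *stack = malloc(CO_STACK_SIZE);

    assert(stack != NULL);
    steps = 0;

    if (getcontext(&co_ctx) != 0) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = CO_STACK_SIZE;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_entry, 0);

    swapcontext(&main_ctx, &co_ctx); /* enter the coroutine */
    swapcontext(&main_ctx, &co_ctx); /* resume it after the yield */

    free(stack);
    return steps;
}
```

run_ucontext_demo() should return 2. Nothing in this path touches signal handlers, which is why it does not have the process-wide problem discussed above.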



Furthermore, for iOS there was a proposal to add support for using
libucontext, which provides a clean impl of ucontext APIs for x86
and aarch64 hosts.

Regards,
Daniel




