bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
From: Ludovic Courtès
Subject: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Thu, 05 Jul 2018 10:34:38 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Hello Mark,
Thanks for chiming in!
Mark H Weaver <address@hidden> skribis:
> Does libgc spawn threads that run concurrently with user threads? If
> so, that would be news to me. My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.
libgc launches mark threads as soon as it is initialized, I think.
> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called. The
> finalization thread grabs the GC allocation lock every time it calls
> 'GC_invoke_finalizers'. All ports backed by POSIX file descriptors
> (including pipes) register finalizers and therefore spawn the
> finalization thread and make work for it to do.
In 2.2 there’s scm_i_finalizer_pre_fork that takes care of shutting down
the finalization thread right before fork. So the finalization thread
cannot be blamed, AIUI.
> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function. See 'GC_do_blocking_inner' in pthread_support.c in
> libgc. You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.
That’s definitely a possibility: the signal thread could be allocating
stuff, and thereby taking the alloc lock just at that time.
>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.
Here’s a patch:
diff --git a/libguile/posix.c b/libguile/posix.c
index b0fcad5fd..088e75631 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1209,6 +1209,13 @@ SCM_DEFINE (scm_execle, "execle", 2, 0, 1,
#undef FUNC_NAME
#ifdef HAVE_FORK
+static void *
+do_fork (void *pidp)
+{
+ * (int *) pidp = fork ();
+ return NULL;
+}
+
SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
(),
"Creates a new \"child\" process by duplicating the current \"parent\" process.\n"
@@ -1236,7 +1243,13 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
" further behavior unspecified. See \"Processes\" in the\n"
" manual, for more information.\n"),
scm_current_warning_port ());
- pid = fork ();
+
+ /* Take the alloc lock to make sure it is released when the child
+ process starts. Failing to do that the child process could start
+ in a state where the alloc lock is taken and will never be
+ released. */
+ GC_call_with_alloc_lock (do_fork, &pid);
+
if (pid == -1)
SCM_SYSERROR;
return scm_from_int (pid);
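The idea behind the patch can be tried outside Guile with a plain pthread mutex. What follows is only a sketch under stated assumptions, not libgc's actual code: 'alloc_lock', 'call_with_lock', 'worker' and 'fork_with_lock_held' are hypothetical names standing in for the GC allocation lock, 'GC_call_with_alloc_lock', a thread that keeps taking the lock (like the signal delivery thread), and the patched 'primitive-fork'. Since the forking thread owns the lock across 'fork', no other thread can hold it at that instant, and the child (which contains only a copy of the forking thread) releases its own copy on the way out of 'call_with_lock'.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the GC allocation lock.  */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int stop_worker;

/* Hypothetical thread that repeatedly takes the lock, much like a
   thread that allocates while the main thread is about to fork.  */
static void *
worker (void *unused)
{
  (void) unused;
  while (!atomic_load (&stop_worker))
    {
      pthread_mutex_lock (&alloc_lock);
      pthread_mutex_unlock (&alloc_lock);
    }
  return NULL;
}

/* Hypothetical analogue of GC_call_with_alloc_lock: call FN (DATA)
   with the lock held and release it afterwards.  */
static void *
call_with_lock (void *(*fn) (void *), void *data)
{
  void *result;

  pthread_mutex_lock (&alloc_lock);
  result = fn (data);
  pthread_mutex_unlock (&alloc_lock);
  return result;
}

/* Same shape as 'do_fork' in the patch.  */
static void *
do_fork (void *pidp)
{
  * (pid_t *) pidp = fork ();
  return NULL;
}

/* Fork with the lock held.  Return 0 if the child was able to take the
   lock again, 1 if the child found it held, and -1 on error.  */
static int
fork_with_lock_held (void)
{
  pthread_t thread;
  pid_t pid = -1;
  int status = 0;

  atomic_store (&stop_worker, 0);
  if (pthread_create (&thread, NULL, worker, NULL) != 0)
    return -1;

  call_with_lock (do_fork, &pid);

  if (pid == 0)
    /* Child: only the forking thread exists here, and it already
       released its copy of the lock in 'call_with_lock'.  */
    _exit (pthread_mutex_trylock (&alloc_lock) == 0 ? 0 : 1);

  /* Parent: stop the worker, then collect the child's verdict.  */
  atomic_store (&stop_worker, 1);
  pthread_join (thread, NULL);

  if (pid == -1 || waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

Calling 'fork_with_lock_held ()' from a small 'main' and checking that it returns 0 exercises the locked path. Replacing 'call_with_lock (do_fork, &pid)' with a bare 'pid = fork ()' gives the unpatched behavior, where the child may occasionally inherit a lock held by 'worker' that nothing will ever release.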
Thoughts?
Unfortunately my ‘call-with-decompressed-port’ reproducer doesn’t seem
to reproduce much today, so I can’t tell if this helps (I let it run for
more than 5 minutes with the supposedly buggy Guile and nothing happened…).
Thanks,
Ludo’.