bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27

From: Ludovic Courtès
Subject: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Thu, 05 Jul 2018 10:34:38 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Hello Mark,

Thanks for chiming in!

Mark H Weaver <address@hidden> skribis:

> Does libgc spawn threads that run concurrently with user threads?  If
> so, that would be news to me.  My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.

libgc launches mark threads as soon as it is initialized, I think.

> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called.  The
> finalization thread grabs the GC allocation lock every time it calls
> 'GC_invoke_finalizers'.  All ports backed by POSIX file descriptors
> (including pipes) register finalizers and therefore spawn the
> finalization thread and make work for it to do.

In 2.2 there’s scm_i_finalizer_pre_fork that takes care of shutting down
the finalization thread right before fork.  So the finalization thread
cannot be blamed, AIUI.

> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
> libgc.  You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

That’s definitely a possibility: the signal thread could be allocating
stuff, and thereby taking the alloc lock just at that time.

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>> How does that sound?
> Sure, sounds good to me.

Here’s a patch:

diff --git a/libguile/posix.c b/libguile/posix.c
index b0fcad5fd..088e75631 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1209,6 +1209,13 @@ SCM_DEFINE (scm_execle, "execle", 2, 0, 1,
 #undef FUNC_NAME
 
 #ifdef HAVE_FORK
+static void *
+do_fork (void *pidp)
+{
+  * (int *) pidp = fork ();
+  return NULL;
+}
+
 SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
             (),
             "Creates a new \"child\" process by duplicating the current \"parent\" process.\n"
@@ -1236,7 +1243,13 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
         "         further behavior unspecified.  See \"Processes\" in the\n"
         "         manual, for more information.\n"),
        scm_current_warning_port ());
-  pid = fork ();
+
+  /* Take the alloc lock to make sure it is released when the child
+     process starts.  Failing to do that the child process could start
+     in a state where the alloc lock is taken and will never be
+     released.  */
+  GC_call_with_alloc_lock (do_fork, &pid);
+
   if (pid == -1)
     SCM_SYSERROR;
   return scm_from_int (pid);

Unfortunately my ‘call-with-decompressed-port’ reproducer doesn’t seem
to reproduce much today, so I can’t tell if this helps (I let it run for
more than 5 minutes with the supposedly-buggy Guile and nothing happened…).
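
To illustrate the idea behind the patch outside of Guile, here is a
minimal sketch that uses a plain pthread mutex as a stand-in for libgc's
alloc lock.  All the names in it (alloc_lock, busy_allocator,
fork_holding_lock, child_can_lock_after_fork) are made up for this
example and are not part of Guile or libgc; it assumes a default
(non-robust, non-error-checking) mutex.  The helper thread plays the
role of a thread that may be allocating when 'fork' is called.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for libgc's allocation lock.  */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int keep_running;
static long fake_allocations;

/* Hypothetical helper thread: repeatedly takes and releases the lock,
   as a thread that allocates memory would.  */
static void *
busy_allocator (void *unused)
{
  (void) unused;
  while (atomic_load (&keep_running))
    {
      pthread_mutex_lock (&alloc_lock);
      fake_allocations++;
      pthread_mutex_unlock (&alloc_lock);
      usleep (50);
    }
  return NULL;
}

/* Fork while the calling thread owns the lock.  No other thread can
   own it at that point, so the copy of the lock in the child belongs
   to the only thread that exists there, which releases it.  */
static pid_t
fork_holding_lock (void)
{
  pid_t pid;

  pthread_mutex_lock (&alloc_lock);
  pid = fork ();
  /* Reached in the parent, in the child, and on failure.  */
  pthread_mutex_unlock (&alloc_lock);
  return pid;
}

/* Return 0 if the child finds the lock free, 1 if it finds it taken,
   and -1 if thread creation, fork, or waitpid fails.  */
static int
child_can_lock_after_fork (void)
{
  pthread_t helper;
  int status = -1;
  pid_t pid;

  atomic_store (&keep_running, 1);
  if (pthread_create (&helper, NULL, busy_allocator, NULL) != 0)
    return -1;

  pid = fork_holding_lock ();
  if (pid == 0)
    /* Child: only the forking thread exists here.  */
    _exit (pthread_mutex_trylock (&alloc_lock) == 0 ? 0 : 1);

  atomic_store (&keep_running, 0);
  pthread_join (helper, NULL);

  if (pid == -1 || waitpid (pid, &status, 0) == -1)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

With a bare fork () in place of fork_holding_lock (), the child could
occasionally start with the lock held by a thread that does not exist
in the child, which is the kind of hang suspected here.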

