[bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.

From: Ludovic Courtès
Subject: [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
Date: Thu, 29 Mar 2018 22:07:05 +0200

Hi Carlo,

Carlo Zancanaro <address@hidden> writes:

> When working on the Shepherd, I found that in the build containers
> processes don't get reaped by pid 1. See
> This caused
> (and will cause) the Shepherd's tests to fail on some systems.
> Our guile-builder script should handle SIGCHLD and then use waitpid to
> reap the child processes. Here's my attempt at a patch to do that.

I would rather install the handler as a phase in gnu-build-system: this
leaves ‘build-expression->derivation’ generic, and also gives us more
flexibility (e.g., we can disable that phase without doing a full
rebuild if needed.)  See the patch below.
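
To illustrate the flexibility point: assuming the patch below is applied,
so that ‘install-SIGCHLD-handler’ is part of ‘%standard-phases’, a single
package could presumably opt out along these lines.  The package name
‘hello-without-reaper’ is made up for the example, and ‘hello’ is only a
convenient package to inherit from.

```scheme
(use-modules (guix packages)
             (gnu packages base))

;; Hypothetical variant of 'hello' that skips the new phase.
(package
  (inherit hello)
  (name "hello-without-reaper")          ;made-up name
  (arguments
   '(#:phases
     (modify-phases %standard-phases
       ;; Assumes the phase exists, i.e., that the patch is applied.
       (delete 'install-SIGCHLD-handler)))))
```

Saved to a file, this can be built with ‘guix build -f FILE’; only this
package and its dependents are affected.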


My first attempt, with:

  ./pre-inst-env guix build -e '(@@ (gnu packages commencement) 

quickly failed:

--8<---------------cut here---------------start------------->8---
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... Backtrace:
In ice-9/boot-9.scm:
checking for working vfork... (cached) yes
checking for strcasecmp...  157: 13 [catch #t #<catch-closure c900a0> ...]
In unknown file:
   ?: 12 [apply-smob/1 #<catch-closure c900a0>]
In ice-9/boot-9.scm:
  63: 11 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 10 [eval # #]
In ice-9/boot-9.scm:
2320: 9 [save-module-excursion #<procedure cc1b80 at ice-9/boot-9.scm:3961:3 
3966: 8 [#<procedure cc1b80 at ice-9/boot-9.scm:3961:3 ()>]
1645: 7 [%start-stack load-stack #<procedure cbd2c0 at ice-9/boot-9.scm:3957:10 
1650: 6 [#<procedure cc3060 ()>]
In unknown file:
   ?: 5 [primitive-load 
In ice-9/eval.scm:
 387: 4 [eval # ()]
In srfi/srfi-1.scm:
 619: 3 [for-each #<procedure 1217560 at 
 (expr)> ...]
 819: 2 [#<procedure 1217560 at 
 (expr)> #]
 614: 1 [invoke 
"/gnu/store/g34swjqyw205d15pyra39j56qvyxq9w9-bootstrap-binaries-0/bin/bash" ...]
In unknown file:
   ?: 0 [system* 
"/gnu/store/g34swjqyw205d15pyra39j56qvyxq9w9-bootstrap-binaries-0/bin/bash" ...]

ERROR: In procedure system*:
ERROR: In procedure system*: Interrupted system call
builder for `/gnu/store/hc96d5dcshbdgavpp0j01qnsjf0yf9z5-make-boot0-4.2.1.drv' 
failed with exit code 1
--8<---------------cut here---------------end--------------->8---

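The ‘Interrupted system call’ error above is EINTR: in Guile <= 2.0.9,
syscalls such as ‘system*’ can throw it when a signal like SIGCHLD arrives.
This is why ‘install-SIGCHLD-handler’ in the patch does nothing on Guile
<= 2.0.9.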
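
For reference, here is a small sketch of the version test used by the patch,
pulled out into a helper so it can be tried on arbitrary version strings.
The name ‘sigchld-handler-safe?’ is hypothetical; the patch inlines the test
and calls ‘effective-version’ and ‘micro-version’ directly.

```scheme
;; Hypothetical helper: same test as in the patch, but taking the version
;; strings as arguments so that it can be tried on arbitrary versions.
(define (sigchld-handler-safe? effective micro)
  (or (not (string=? effective "2.0"))
      (> (string->number micro) 9)))

(sigchld-handler-safe? "2.0" "9")    ;=> #f
(sigchld-handler-safe? "2.0" "14")   ;=> #t
(sigchld-handler-safe? "2.2" "3")    ;=> #t

;; What the patch effectively evaluates in the running Guile:
(sigchld-handler-safe? (effective-version) (micro-version))
```

Note that this assumes a purely numeric micro version, as in released
Guile versions.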

Now, we’d need to test it for real with Guile 2.2.  I suppose one way to
test without rebuilding it all would be to add this phase explicitly in
a package and try building it with --rounds=10 or something.  Would you
like to try that?
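
Something like the following might do for such a test.  It is only a sketch:
the package name ‘hello-sigchld-test’ is made up, ‘hello’ is an arbitrary
choice, and the handler is written inline (without ‘match’) so that it does
not depend on the patch or on extra build-side modules.

```scheme
(use-modules (guix packages)
             (gnu packages base))

;; Hypothetical test package: 'hello' plus an explicit reaper phase.
(package
  (inherit hello)
  (name "hello-sigchld-test")            ;made-up name
  (arguments
   '(#:phases
     (modify-phases %standard-phases
       (add-before 'unpack 'install-SIGCHLD-handler
         (lambda _
           ;; Same idea as the patch: reap every child that has exited.
           (sigaction SIGCHLD
             (lambda _
               (let loop ()
                 (let ((pid (catch 'system-error
                              (lambda ()
                                (car (waitpid WAIT_ANY WNOHANG)))
                              (lambda args 0)))) ;e.g., no children left
                   (unless (zero? pid)
                     (loop))))))
           #t))))))
```

With this saved to a file, ‘./pre-inst-env guix build -f FILE --rounds=10’
would build it repeatedly without touching the rest of the graph.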

Note that we have only a couple of days left before the ‘core-updates’


diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index be5ad78b9..2c6cb4ad2 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -51,6 +51,28 @@
    (define time-monotonic time-tai))
   (else #t))
+(define* (install-SIGCHLD-handler #:rest _)
+  "Handle SIGCHLD signals.  Since this code is usually running as PID 1 in the
+build daemon, it has to reap dead processes, hence this procedure."
+  ;; In Guile <= 2.0.9, syscalls could throw EINTR.  With these versions,
+  ;; installing a SIGCHLD handler is not safe because we could have uncaught
+  ;; 'system-error' exceptions at any time.
+  (when (or (not (string=? (effective-version) "2.0"))
+            (> (string->number (micro-version)) 9))
+    (format #t "installing SIGCHLD handler in PID ~a\n" (getpid))
+    (sigaction SIGCHLD
+      (lambda _
+        (let loop ()
+          (match (catch 'system-error
+                   (lambda ()
+                     (waitpid WAIT_ANY WNOHANG))
+                   (lambda args
+                     '(0 . -)))
+            ((0 . _) #f)
+            ((pid . _) (loop)))))))
+  #t)
 (define* (set-SOURCE-DATE-EPOCH #:rest _)
   "Set the 'SOURCE_DATE_EPOCH' environment variable.  This is used by tools
 that incorporate timestamps as a way to tell them to use a fixed timestamp.
@@ -758,7 +780,8 @@ which cannot be found~%"
   ;; Standard build phases, as a list of symbol/procedure pairs.
   (let-syntax ((phases (syntax-rules ()
                          ((_ p ...) `((p . ,p) ...)))))
-    (phases set-SOURCE-DATE-EPOCH set-paths install-locale unpack
+    (phases install-SIGCHLD-handler
+            set-SOURCE-DATE-EPOCH set-paths install-locale unpack
             patch-source-shebangs configure patch-generated-file-shebangs

