bug#49613: On WSL1/2, guix install: error: cannot kill processes for uid

From: Sarah Morgensen
Subject: bug#49613: On WSL1/2, guix install: error: cannot kill processes for uid `999': failed with exit code 1
Date: Sat, 17 Jul 2021 21:20:25 -0700

Hello Guix,

Anadon <joshua.r.marshall.1991@gmail.com> writes:

> Talking with iskarian on IRC, we've confirmed that guix successfully
> installs, almost successfully sets up (the init.d isn't set up to
> actually daemonize), but for `guix build`, `guix install` and `guix
> pull` all fail with "guix <SUB_CMD>: error: cannot kill processes for
> uid `998': failed with exit code 1" when using WSL1/2.

I've investigated this a bit, and it seems to be an issue with the return
value of `kill` when the user owns no other processes to kill. If I keep a
process continually respawning under uid 998, I no longer encounter the
above error. Likewise, if there is a zombie process under that uid, the
error does not appear until I manually kill it. This is on WSL1; I do not
know whether this workaround also applies to WSL2.
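
One way to compare WSL with a regular Linux kernel without killing anything is to probe `kill(-1, 0)`. Signal 0 only performs the existence and permission checks, so nothing is delivered. The helper below is a hypothetical sketch (`probeKillAll` and `KillProbe` are illustrative names, not part of Guix), and it assumes that WSL applies the same checks for signal 0 as for `SIGKILL`, which has not been verified.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>

// Outcome of a single kill() call.
struct KillProbe {
    int ret;  // return value of kill(): 0 or -1
    int err;  // errno right after the call, or 0 if kill() succeeded
};

// Hypothetical probe, not Guix code. Signal 0 makes kill() run its
// checks without sending anything, so unlike kill(-1, SIGKILL) this
// is safe to call from any process.
KillProbe probeKillAll() {
    errno = 0;
    int ret = kill(-1, 0);
    int err = (ret == -1) ? errno : 0;
    return KillProbe{ret, err};
}
```

Wrapped in a small `main` that prints `ret` and `strerror(err)`, and run as the build user (for instance with `sudo -u '#998' ./probe`), this could show whether WSL1, WSL2 and bare-metal Linux disagree when that uid owns no other processes.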

For reference, the relevant portion of `nix/libutil/util.cc`:

--8<---------------cut here---------------start------------->8---
    Pid pid = startProcess([&]() {

        if (setuid(uid) == -1)
            throw SysError("setting uid");

        while (true) {
#ifdef __APPLE__
            /* OSX's kill syscall takes a third parameter that, among
               other things, determines if kill(-1, signo) affects the
               calling process. In the OSX libc, it's set to true,
               which means "follow POSIX", which we don't want here
                 */
            if (syscall(SYS_kill, -1, SIGKILL, false) == 0) break;
#elif __GNU__
            /* Killing all a user's processes using PID=-1 does currently
               not work on the Hurd.  */
            if (kill(getpid(), SIGKILL) == 0) break;
#else
            if (kill(-1, SIGKILL) == 0) break;
#endif
            if (errno == ESRCH) break; /* no more processes */
            if (errno != EINTR)
                throw SysError(format("cannot kill processes for uid `%1%'") % uid);
        }

        _exit(0);
    }, options);

    int status = pid.wait(true);
#if __GNU__
    /* When the child killed itself, status = SIGKILL.  */
    if (status == SIGKILL) return;
#endif
    if (status != 0)
        throw Error(format("cannot kill processes for uid `%1%': %2%") % uid % statusToString(status));
--8<---------------cut here---------------end--------------->8---
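
The retry loop in the excerpt above comes down to a three-way decision on each `kill()` result. The sketch below restates it as a pure function (`KillStep` and `classifyKill` are illustrative names, not Guix's), so individual outcomes can be checked without sending any signals. Any errno other than `ESRCH` or `EINTR`, `EPERM` for example, ends in the "cannot kill processes" exception. If WSL's `kill(-1, SIGKILL)` reported such an errno when the uid owns no other processes, that would fit the behavior described above, though this is only a guess.

```cpp
#include <cassert>
#include <cerrno>

// What the loop does after one kill() attempt.
enum class KillStep {
    Done,   // kill() returned 0, or errno == ESRCH: break out of the loop
    Retry,  // errno == EINTR: interrupted, go around the loop again
    Fail    // any other errno: throw SysError("cannot kill processes ...")
};

// Illustrative restatement of the loop body: 'ret' is kill()'s return
// value and 'err' is the errno observed right after the call.
KillStep classifyKill(int ret, int err) {
    if (ret == 0) return KillStep::Done;
    if (err == ESRCH) return KillStep::Done;   // no more processes
    if (err == EINTR) return KillStep::Retry;
    return KillStep::Fail;
}
```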

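Perhaps WSL handles the return value of `kill` differently than expected?
On a cursory inspection, though, the relevant parts of WSL2's
kernel/signal.c seem to be the same as in the vanilla Linux kernel. That
would not explain WSL1, however, which implements Linux system calls in
its own translation layer rather than running a Linux kernel.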
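
For comparison, here is a user-space model of the `pid == -1` branch of `kill_something_info()` in mainline `kernel/signal.c`, written from memory of the kernel source and worth checking against the actual file. `linuxKillAllResult` is a hypothetical name. Under this model every process except init and the caller's own thread group counts as a candidate, `-EPERM` failures are ignored, and `-ESRCH` is returned only when there are no candidates at all. So on ordinary Linux a uid that owns no other processes still gets 0, because other users' processes count as candidates even though signalling them fails with `EPERM`.

```cpp
#include <cassert>
#include <cerrno>
#include <vector>

// Simplified, hypothetical model of the pid == -1 branch of
// kill_something_info(). Each element of 'perProcessErrors' is the
// result (0 or a negative errno) of signalling one candidate process.
// The real syscall wrapper turns a negative result into -1 plus errno.
int linuxKillAllResult(const std::vector<int> &perProcessErrors) {
    int retval = 0;
    int count = 0;
    for (int err : perProcessErrors) {
        ++count;
        if (err != -EPERM)  // permission failures are deliberately ignored
            retval = err;
    }
    // Only an empty candidate set yields ESRCH.
    return count ? retval : -ESRCH;
}
```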

Or perhaps, for some reason, `kill(-1, SIGKILL)` under WSL attempts to
kill the calling process as well (why?) and fails, and therefore returns
an error.

Note that the exit code 1 reported by Guix does not seem to be the
actual errno set by `kill`.
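
In the excerpt above, the kill loop runs inside the child created by `startProcess`, and the parent only sees that child's wait status through `pid.wait(true)`. The `%2%` in the final error message describes that status. Assuming the child exits with status 1 whenever its body throws (the code that does this is not part of the excerpt), the errno from `kill` never reaches the parent, which would account for the mismatch. The sketch below uses a hypothetical helper, `runChildThatExits`, which forks a child that exits with a given code and decodes the status with `WIFEXITED`/`WEXITSTATUS`.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that exits with 'code' (standing in
// for a child whose kill loop threw), wait for it, and describe the
// resulting wait status in the same wording as the Guix error message.
std::string runChildThatExits(int code) {
    pid_t pid = fork();
    if (pid == -1) return "fork failed";
    if (pid == 0) _exit(code);  // child: only the exit code reaches the parent

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return "waitpid failed";
    if (WIFEXITED(status)) {
        if (WEXITSTATUS(status) == 0) return "succeeded";
        return "failed with exit code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status))
        return "killed by signal " + std::to_string(WTERMSIG(status));
    return "unknown status";
}
```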

I've seen Guix working on WSL2 in the wild before [0], so this is a
really odd error. I'm stumped.



