bug#37762: 'guix offload' sets too short a timeout

From: Ludovic Courtès
Date: Tue, 15 Oct 2019 12:22:04 +0200

Hello Guix,

In (guix scripts offload) the SSH session is created like this:

       (make-session #:user (build-machine-user machine)
                     #:host (build-machine-name machine)
                     #:port (build-machine-port machine)
                     #:timeout 10       ;seconds
                     ;; …

This means that any connect(2), read(2), or write(2) call on the
underlying file descriptors that takes more than 10 seconds is
interpreted as EOF (at least on the Scheme side, when reading from a
channel port; on the C side we might be able to distinguish the two).
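
To make the ambiguity concrete, here is a toy model in plain Guile.  It
does not use Guile-SSH at all: the event list, the reader procedures,
and 'drain' are all hypothetical names made up for illustration.  It
only shows why a reader that reports a timeout as EOF makes a silent
build look like a finished one.

```scheme
;; Toy model only: this is not Guile-SSH code.  A "channel" is modelled as a
;; list of events: a string is data that arrived, 'silent means nothing
;; arrived before the timeout expired, and 'closed means the peer is done.
(use-modules ((rnrs io ports) #:select (eof-object)))

(define (read-event/ambiguous event)
  "Report both a timeout and a closed channel as the EOF object."
  (if (string? event)
      event
      (eof-object)))

(define (read-event/explicit event)
  "Report a timeout as the symbol 'timeout and only real closure as EOF."
  (cond ((string? event) event)
        ((eq? event 'silent) 'timeout)
        (else (eof-object))))

(define (drain read-event events)
  "Return the data strings read from EVENTS until READ-EVENT yields EOF."
  (let loop ((events events)
             (result '()))
    (if (null? events)
        (reverse result)
        (let ((value (read-event (car events))))
          (cond ((eof-object? value) (reverse result))
                ((string? value) (loop (cdr events) (cons value result)))
                (else (loop (cdr events) result)))))))   ;timeout: keep waiting

(define events '("configure" silent "make" closed))

(display (drain read-event/ambiguous events))   ;stops at the silent period
(newline)
(display (drain read-event/explicit events))    ;reads up to the real close
(newline)
```

With the ambiguous reader the caller never sees "make", because the
silent period is indistinguishable from the end of the stream.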

This was fine with libssh < 0.9.0 because that timeout was not honored
when reading from a channel due to a bug they fixed in libssh commit

libssh 0.9.0, added in Guix commit
44941fd7dbc77a7bf84a9be63a309eca3ffdc1c2, contains this bug fix, meaning
that the 10s session timeout is actually honored now.
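
Since the same session timeout now applies both to connecting and to
reading from a channel, one conceivable direction is to keep a short
timeout for connect(2) and raise it once the session is established.
The sketch below is only an illustration under that assumption, not
the fix itself: the procedure name and the CONNECT-TIMEOUT and
READ-TIMEOUT parameters (and their defaults) are hypothetical, and
server authentication and user authentication are left out.

```scheme
;; Sketch only; requires Guile-SSH.  The procedure name and the
;; CONNECT-TIMEOUT and READ-TIMEOUT parameters are hypothetical.
(use-modules (ssh session))

(define* (open-session/two-timeouts user host port
                                    #:key
                                    (connect-timeout 10)
                                    (read-timeout 3600))
  "Connect to HOST with a short timeout, then allow READ-TIMEOUT seconds of
silence on later reads and writes.  Return the session."
  (let ((session (make-session #:user user
                               #:host host
                               #:port port
                               #:timeout connect-timeout))) ;seconds
    (case (connect! session)
      ((ok)
       ;; Authentication would normally happen here (omitted).
       ;; From now on, a quiet build is given READ-TIMEOUT seconds.
       (session-set! session 'timeout read-timeout)
       session)
      (else
       (error "failed to connect to build machine" host)))))
```

Whether a fixed READ-TIMEOUT is appropriate, or whether it should
follow the build's own silence limit, is left open here.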

So in practice, if you offload a build process and that process remains
silent for 10s (which is not that much!), then ‘guix offload’ thinks
it’s done and (confusingly) goes on to fetch the result from the build
machine, which is of course unavailable.  The end result is an equally
confusing error message like this (the last two lines):

--8<---------------cut here---------------start------------->8---
starting phase `bootstrap'
running './autogen.sh'
patch-shebang: ./autogen.sh: changing `/bin/sh' to 
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal -I config/m4
/gnu/store/iql3p5zvz0nwcsckdpywdkqxccx95ygx-bash-minimal-5.0.7/bin/sh: git: command not found
guix offload: error: corrupt input while restoring archive from #<input-output: channel (open) 7fc227fbc180>
guix build: error: build of `/gnu/store/dpz058x83sc7y1krpkdn84b45vl5p9cz-ucx-1.6.1.drv' failed
--8<---------------cut here---------------end--------------->8---

Working on a bug fix…


