bug#34157: Hydra: mozjs-60 builds on x86_64 and i686 seemingly get stuck

From: Mark H Weaver
Subject: bug#34157: Hydra: mozjs-60 builds on x86_64 and i686 seemingly get stuck
Date: Mon, 21 Jan 2019 21:54:43 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Efraim Flashner <address@hidden> writes:

> On Mon, Jan 21, 2019 at 10:31:46AM -0500, Mark H Weaver wrote:
>> Yesterday on Hydra, I found both Intel mozjs-60 builds seemingly stuck
>> while exporting the source checkout to hydra.gnunet.org.  One had been
>> going for ~22.5 hours, and the other for ~12 hours.  I forcefully killed
>> them and restarted them.  Now I see the same thing has happened on the
>> second attempt.  Both builds have been seemingly stuck like this for
>> about 19 hours:
>>   https://hydra.gnu.org/build/3342528
>>   https://hydra.gnu.org/build/3343511
>> In both cases, the build logs are empty, and the hydra log ends with:
>>   sending 1 store item to 'hydra.gnunet.org'...
>>   exporting path `/gnu/store/j2sz7dg35vkcz38sim71jll2ix1nk554-mozjs-60.2.3-2-checkout'
>> Of course, it's possible that they're not really stuck, but that they're
>> merely taking a ridiculously long time to send the source checkout to
>> the build slave.  My personal checkout of the mozilla-esr60 branch,
>> without the .hg directory, is about 2.1 gigabytes.
>> What do you think?
>>       Mark
> 12 hours is far too long for it to tie up a build slave, sending code or
> not.

Those two builds are still occupying build slots.  As I write this,
they've been running for over 30 hours.

I was curious whether the transfers were actually happening, even if
slowly, so I looked at 'netstat' output:

--8<---------------cut here---------------start------------->8---
address@hidden:~# netstat --inet --program | grep net.in.tum
tcp        0      0 20121227-hydra.gn:58007 hydra.net.in.tum.de:ssh ESTABLISHED 
tcp        0      0 20121227-hydra.gn:42586 hydra.net.in.tum.de:ssh ESTABLISHED 
tcp        0      0 20121227-hydra.gn:56413 hydra.net.in.tum.de:ssh ESTABLISHED 
--8<---------------cut here---------------end--------------->8---

There are currently three builds allocated to hydra.gnunet.org
(a.k.a. hydra.net.in.tum.de), so it appears that all three ssh
connections are still active.  However, even after repeating this
command many times, I've never seen a non-zero "Send-Q" value (the third
column in the output above).  This suggests that no data is actually
being sent, and that the transfers are stuck waiting for something.
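
For anyone who wants to repeat the check without eyeballing the output
each time, here is a minimal sketch in Python.  It assumes the usual
net-tools column order (Proto, Recv-Q, Send-Q, Local Address, Foreign
Address, State); the names parse_netstat and all_send_queues_empty are
made up for illustration.  Note that an empty Send-Q in every sample is
only a hint: a transfer that drains its queue between samples would look
the same.

```python
from typing import List, NamedTuple


class Conn(NamedTuple):
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse_netstat(text: str, peer_substring: str) -> List[Conn]:
    """Return the TCP connections whose foreign address contains peer_substring.

    Assumes net-tools style lines:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
    """
    conns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header line, blank line, or non-TCP socket
        if peer_substring not in fields[4]:
            continue  # connection to some other host
        conns.append(
            Conn(int(fields[1]), int(fields[2]), fields[3], fields[4], fields[5])
        )
    return conns


def all_send_queues_empty(samples: List[List[Conn]]) -> bool:
    """True if no connection in any sample had unsent data queued."""
    return all(c.send_q == 0 for sample in samples for c in sample)


# The three lines from the netstat output quoted above.
SAMPLE = """\
tcp        0      0 20121227-hydra.gn:58007 hydra.net.in.tum.de:ssh ESTABLISHED
tcp        0      0 20121227-hydra.gn:42586 hydra.net.in.tum.de:ssh ESTABLISHED
tcp        0      0 20121227-hydra.gn:56413 hydra.net.in.tum.de:ssh ESTABLISHED
"""

if __name__ == "__main__":
    conns = parse_netstat(SAMPLE, "net.in.tum")
    print(len(conns), "connections; send queues empty:",
          all_send_queues_empty([conns]))
```

In practice one would collect several samples a few seconds apart (for
example by capturing the output of the netstat command above each time)
and pass the whole list to all_send_queues_empty.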

I'll leave these builds alone for now, in case Ludovic wants to
investigate further.

> Being silent that long doesn't trigger the auto-kill?

I guess that the usual timeouts do not apply to file transfers performed
before the actual build takes place.


