Re: Improving CI throughput
From: Mathieu Othacehe
Subject: Re: Improving CI throughput
Date: Tue, 25 Aug 2020 15:32:50 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Hey,
> Yeah, this is a ridiculous situation. We should do a hackathon to get
> better monitoring of useful metrics (machine load,
> time-of-push-to-time-to-build-completion, etc.), to clearly identify the
> bottlenecks (crashes? inefficient protocol? scheduling issues? Cuirass
> or offload or guix-daemon issue?), and to address as many of them as we
> can.
>
> Any volunteers? :-)
I'd really like to improve the situation! A hackathon seems like a
nice idea.
As a matter of fact, I already spent some time improving the stability
of the Cuirass web interface[1].
Now I can see multiple topics that could be approached in parallel:
* Add metrics to Cuirass as you suggested. There's an open ticket about
  that here[2].
* Investigate offloading issues[3].
* Fix database contention[4].
* Fix guix-daemon deadlocking[5].
* Monitor closely what's happening on Berlin and decide if it is
  opportune to add a build scheduler mechanism somewhere. See what Hydra
  is doing[6] and what Chris is proposing[7].
As most of the issues are only observed on the Berlin machines, to which
access is restricted, we will also have to find a way to reproduce them locally.
Anyway, if some people are motivated, we could try to plan a day or a
weekend to work on those topics :).
Thanks,
Mathieu
[1]: https://issues.guix.gnu.org/42548
[2]: https://issues.guix.gnu.org/32548
[3]: https://issues.guix.gnu.org/34033
[4]: https://issues.guix.gnu.org/42001
[5]: https://issues.guix.gnu.org/31785
[6]: https://github.com/NixOS/hydra/blob/master/src/hydra-queue-runner/dispatcher.cc
[7]: https://lists.gnu.org/archive/html/guix-devel/2020-04/msg00323.html