Substitute timeouts

From: Mathieu Othacehe
Subject: Substitute timeouts
Date: Mon, 09 Aug 2021 12:28:39 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)


I have been investigating a problem that is visible both on the main
guix publish server at[1] and on the Cuirass
build farm[2].

The timeout error occurs because the publish server does not accept
the "guix substitute" connection requests within the %fetch-timeout
duration of 5 seconds.

The main guix publish server is using a cache. If a requested narinfo is
not in the cache, it will be baked and the client receives a 404
error. Since ecaa102a58ad3ab0b42e04a3d10d7c761c05ec98 and the
introduction of the bypass mechanism, small store items are directly

This means that the "narinfo-string" procedure can be called directly in
the main publish thread. Running perf on the main publish server reveals
that this procedure can be really expensive under IO pressure (GC
running for example) because it opens a lot of files. I have observed
that the "read-derivation-from-file" call can take up to 600 ms.

If multiple clients were to request the narinfos of several items that
are not yet cached while the server is under IO pressure, I think that
the publish server could become unresponsive and cause the timeout
errors.
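
To give a rough idea of the orders of magnitude, here is a
back-of-the-envelope sketch in Python. It assumes, as a simplification
that is not measured on the real server, that requests are handled
strictly one after another and that each one costs the worst-case
600 ms mentioned above. The function name is made up for illustration.

```python
import math


def first_request_to_time_out(service_time_s, timeout_s):
    """Return the 1-based position, in a strictly serial queue, of the
    first request whose waiting time exceeds the client timeout.

    The k-th request waits for its (k - 1) predecessors, so it times out
    as soon as (k - 1) * service_time_s > timeout_s.
    """
    return math.floor(timeout_s / service_time_s) + 2


# Assumed worst case: 600 ms per narinfo, 5 s %fetch-timeout.
position = first_request_to_time_out(0.6, 5)
print(position)
```

Under those assumptions, around ten uncached narinfo requests arriving
together would be enough for the last one to exceed the 5 second limit.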

The fact that Cuirass triggers the baking of successfully built
derivations probably doesn't help here.

Now regarding the timeout errors that are much more frequent on the
Cuirass build farm, the cause varies a bit. The Cuirass publish server
running on Berlin does not use a cache. This means that the
"narinfo-string" procedure is called for each request, in the main
publish thread.

To fix those issues, one solution could be to run "narinfo-string" in
a separate thread, but that would make the publish server code even
harder to understand. My proposal would be to get rid of the bypass
mechanism and instead implement a retry when some substitutes are
reported as being baked, as proposed by Miguel[3].
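
The retry idea can be sketched as follows. This is only an illustration
in Python, not the attached patch and not the actual "guix substitute"
code: the function name, the number of attempts, the delay, and the
assumption that a 404 response means "being baked" are all hypothetical.

```python
import time


def fetch_narinfo_with_retry(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Call FETCH until it yields a narinfo or ATTEMPTS are exhausted.

    FETCH is a hypothetical callable returning (status, body).  A 404 is
    treated here as "the item is being baked, try again later"; how the
    real server would signal baking is an assumption of this sketch.
    """
    for attempt in range(attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected HTTP status {status}")
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    # Still not available: the caller can fall back to a local build.
    return None
```

With such a loop on the client side, a substitute that is still being
baked only delays the client for a few attempts instead of immediately
triggering a local build.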

I think this is the most reasonable solution. This way, users won't
receive 404 errors and start building substitutes that are being
baked.

It will also allow the Cuirass build farm to use the main guix publish
server directly, simplifying the current CI setup.

There's a proposed patch attached, WDYT?




Attachment: patch
Description: Binary data