guix-devel

Re: Substitute retention


From: Ludovic Courtès
Subject: Re: Substitute retention
Date: Fri, 15 Oct 2021 11:27:17 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

>>> missed by both build farms using 2 different strategies to collect the
>>> things to build (fetch every 5 minutes or read from guix-commits).  It
>>> is a quick back-of-the-envelope estimate, so take it with a grain of salt. :-)
>>
>> OK.
>
> To make #1 explicit: I was talking about the “modular” Guix, i.e.,
> running “guix pull” or “guix time-machine” leads to building the
> derivations module-import.drv, guix-<hash>.drv, guix-command.drv,
> guix-module-union.drv, guix-<hash>-modules.drv,
> guix-packages-modules.drv, guix-system-tests-modules.drv,
> guix-packages-base-modules.drv, etc.  On slow machines, this can be
> unpleasant, not to say impractical.  Even for recent commits.

Ah I see.  Yeah, this can be kinda annoying, and amplified by the fact
that CI only builds at each push, not at each commit.

That said, this is mitigated by the fact that one typically travels to a
previously-fetched commit, which is a commit that has been built by CI
rather than a commit in between two pushes.

> Basically, commit 59d10c3112 is from March 14, 2020 and it takes ~29min
> on my slow laptop.  And to compare apples to apples, let’s take a
> commit from one year later, March 14, 2021, e.g., commit 7327295462.
> It takes ~5min on the same machine.

Yeah, OK.

> To be on the same wavelength,
>
> $ git log --format="%h %cd" --after=2021-03-14 --reverse | head -n16
> [...]
> 2babf7d831 Sun Mar 14 19:16:55 2021 +0100
> b15720182e Sun Mar 14 13:24:21 2021 -0500
> 207aa62e6b Sun Mar 14 13:24:21 2021 -0500
> 30f5381487 Sun Mar 14 13:24:21 2021 -0500
> af25357b7d Sun Mar 14 13:24:21 2021 -0500
> 7164d2105a Sun Mar 14 13:24:21 2021 -0500
> 078f3288e2 Sun Mar 14 13:24:21 2021 -0500
> 5a31eb7d35 Sun Mar 14 13:24:21 2021 -0500
> 620206b680 Sun Mar 14 13:24:22 2021 -0500
> b76762a9b7 Sun Mar 14 13:24:22 2021 -0500
> cbfcbb79df Sun Mar 14 19:43:35 2021 +0100
>
> and Cuirass builds only one of b15720182e, 207aa62e6b, 30f5381487,
> af25357b7d, 7164d2105a, 078f3288e2, 5a31eb7d35, 620206b680 or
> b76762a9b7.
>
> Considering the Build Coordinator, it uses guix-commits and from my
> understanding it reads:
>
> <https://lists.gnu.org/archive/html/guix-commits/2021-03/msg01201.html>
>
> therefore, b15720182e would be missed but not b76762a9b7, which would
> in turn be missed by Cuirass.
>
> Neither Cuirass nor the Build Coordinator builds both of the commits
> b15720182e and b76762a9b7.
>
> Cuirass checks every 5 minutes and the Build Coordinator reads its
> “state” from guix-commits.  In other words, neither of them builds
> these “modular” derivations for all the commits, even recent ones.
>
> The rough estimate is that half of the commits are missed by both
> build farms.  Therefore, using “guix time-machine” with a random
> commit, one gets a 1/2 probability of having to build something just
> to get the inferior, the TTL policy aside.
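The arithmetic behind that 1/2 estimate can be sketched as follows; note that the per-push commit count below is an illustrative assumption, not a measured figure:

```python
# Back-of-the-envelope sketch of the coverage estimate above.
# ASSUMPTION (illustrative): each push batches several commits, and each
# of the two build farms builds exactly one commit per push.

def built_fraction(commits_per_push, farms=2):
    """Upper bound on the fraction of commits whose "modular"
    derivations get built by at least one build farm."""
    return min(1.0, farms / commits_per_push)

# With pushes averaging 4 commits, at most half the commits are covered,
# matching the rough "1/2 probability" estimate:
print(built_fraction(4))  # 0.5
```

This is an upper bound because the two farms may well pick the same commit from a push, as in the b15720182e/b76762a9b7 example above.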

Right.  Not every derivation produced by (guix self) needs to be rebuilt
in between two commits, but anything that depends on *package-modules*
typically has to be rebuilt.

We can reduce the amount of rebuilding like I did in commit
abd38dcee16f0ac71191527c38dcd3659111e2ba, but you’ll always have the big
(gnu packages …) derivation.

>> So what can we do to address this issue?  I *think* we could use a
>> higher TTL on berlin, and we can try that right away (9 months to
>> begin with?).
>
> I *think* the issue for question #1 is not the TTL.  :-) Rather, it is
> that the two build farms do not build these “modular” derivations for
> all the commits.  Here, I am focused on x86_64-linux, which is the
> case of interest for such a topic (scientific context), IMHO.
>
> Building every commit for all architectures is not affordable.
>
> I agree that increasing the TTL will help for question #2 about
> long-term support of substitutes.

Understood!

>> However, there is an upper bound anyway.  To make informed decisions on
>> the retention policy, we should monitor storage space on berlin/bayfront
>> to better estimate what can be done.  We have Zabbix but it’s not
>> accessible from the outside; maybe we could graph storage space
>> somewhere so people can grab the data and work on those estimates?
>
> Based on the size of these derivations for one commit, we could do a
> back-of-the-envelope extrapolation.  Well, question #1 seems doable
> storage-wise.
>
> The issue with #1 is building these derivations for all the commits,
> IMHO.
>
> About #2, yeah if some data are available, I can try to make some
> estimates.
>
>
> Well, #1 seems actionable.  However, #2 raises…
>
>> What if we decide that we need to provide substitutes for 2y old
>> commits?  In that case, we need a plan to scale up.  That could be
>> renting storage space somewhere.  That’s largely non-technical work that
>> needs attention.
>
> …a strong question. :-) What do “we” do for what “we” build?
>
> Indeed, numbers are missing to make informed decisions on long-term
> storage of substitutes.  What is Nix doing?

Nix, AFAIK, is doing like everyone else: pouring money on Amazon.  Last
I heard they’d retain substitutes basically indefinitely on Amazon S3
(incidentally, one motivation for them to work with Software Heritage,
AIUI, is that it would allow them to store less data on the storage they
pay for :-)).

For the record, berlin (aka ci.guix.gnu.org; it was donated by the Max
Delbrück Center, MDC, and is generously hosted by them) has a 37 TiB
disk for /gnu/store and “baked” substitutes.  That’s a lot.

Technically though, a lot of it is used by less important substitutes
such as disk images or intermediate ‘core-updates’ substitutes.

In the end we seem to be filling it more quickly than you’d think!
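To make the back-of-the-envelope extrapolation zimoun mentions concrete, here is a sketch; only the 37 TiB figure comes from above, while the per-commit size and commit rate are assumptions for illustration:

```python
# Hypothetical fill-rate estimate for berlin's 37 TiB disk.
# The per-commit size and commit rate are ASSUMPTIONS, not measurements.
TIB = 1024 ** 4
MIB = 1024 ** 2

disk_bytes = 37 * TIB            # /gnu/store + "baked" substitutes (from above)
bytes_per_commit = 350 * MIB     # assumed size of "modular" derivation outputs
commits_per_day = 40             # assumed average commit rate

days_to_fill = disk_bytes / (bytes_per_commit * commits_per_day)
print(round(days_to_fill / 365, 1))  # years, if we kept this for EVERY commit
```

Under these made-up rates the “modular” derivations alone would take years to fill the disk, which is why the real pressure comes from full substitutes, disk images, and world rebuilds.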

Perhaps we need a better strategy with a low TTL for, say, intermediate
‘core-updates’ substitutes (no need to keep them more than a few weeks
if we know we’re doing a world rebuild right after).  It cannot be done
as things are though because ‘guix publish’ doesn’t distinguish between
store items.
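For reference, the TTL knob that exists today is global; roughly, as a usage sketch (not a recommendation, and the paths and durations are placeholders):

```shell
# 'guix publish' takes a single --ttl covering everything it serves;
# there is currently no per-store-item policy, hence the limitation above.
guix publish --port=8080 --cache=/var/cache/guix/publish --ttl=180d
```

A per-item policy would need ‘guix publish’ (or whatever fronts it) to classify store items, e.g. distinguishing release substitutes from intermediate ‘core-updates’ ones.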

Or we could restart the Amazon front-end that Chris Marusich had set up
right before 1.0 was released.  Or we could build our own front-end for
substitute delivery as a proxy to berlin, thereby distributing the
burden.

Thoughts?

> I think that having 2 build farms building in parallel is a strength.
> So let’s exploit it. :-) What one could have in mind is to challenge
> the outputs; if they are identical, let’s keep only one version
> “somewhere” and remove the other from “elsewhere”.
>
> For instance, we (I? with help) could resume this discussion:
>
> <https://lists.gnu.org/archive/html/guix-devel/2020-10/msg00181.html>

I hadn’t seen this message, interesting!

Note however that bordeaux.guix has a tenth of the storage space of
berlin (3.6 TiB), so right now we probably can’t count on it for
long-term substitute storage.
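Incidentally, challenging the two farms’ outputs as zimoun suggests is essentially what ‘guix challenge’ does from the client side; for instance, as a sketch using today’s substitute URLs and an arbitrary example package:

```shell
# Check whether ci.guix.gnu.org and bordeaux.guix.gnu.org agree on the
# output of a given package (here 'hello', purely as an example).
guix challenge hello \
  --substitute-urls="https://ci.guix.gnu.org https://bordeaux.guix.gnu.org"
```

Doing this server-side, systematically, and deduplicating matching outputs across the farms is the part that remains to be designed.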

> Or maybe, for the identical outputs, one could imagine (dream of?) a
> cooking service for missing outputs.  Well, I do not know how
> actionable this is. :-)

Well, if we keep .drv around, we could arrange so that ‘guix publish’
rebuilds on-demand, after all.  I’m not sure how practical that would
be, though.

Ludo’.


