
From: Giovanni Biscuolo
Subject: Re: plz is there a roadmap for a more resilient substitutes infrastructure?
Date: Sun, 11 Nov 2018 19:56:45 +0100


sorry for my late reply

I confess I still haven't read the whole Guix/GuixSD Reference Manual,
so my apologies if I'm asking something already documented :-S

address@hidden (Ludovic Courtès) writes:


> We Guix developers don’t have control over the physical hardware behind
>; for this machine, we rely on the work of the FSF
> sysadmins for all things hardware/networking.

OK, thanks for this info

> Unfortunately in this case, this maintenance period was rather
> unprepared: it wasn’t supposed to last a whole week, rather a few hours
> or a day at most.  Most of the time it took was about copying data to a
> new disk (!).

are the minimum hardware and disk requirements for a complete GuixSD
distribution build server published somewhere?

> Had this been prepared, we could have arranged to keep
> up until the replacement was ready.  We Guix developers
> didn’t have much visibility over what was going on though, and we just
> didn’t anticipate this.

sorry about that; I'm a sysadmin myself and I know how much my work
impacts others :-)

> It is clear that this prolonged downtime was harmful to many users and
> to the project’s reputation.

GuixSD does not deserve this kind of harm :-(

> What to do from here?

I once saw [1], which you pointed me to (below), but did not read the
entire tree

now I see we have

should we add a new "super" task named "resilience of substitutes"?

looking at
it seems that some degree of resilience for is already in
place, but it does not seem to act as a distributed source of
substitutes, "just" as a way to offload build jobs to the defined list
of build servers

could servers in "machines.scm" also be used as substitutes servers?
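To illustrate the difference I mean: offloading distributes *build jobs*, while a distributed substitute source would let a client fall back to another server when one is unreachable (Guix clients can already be given several servers via `--substitute-urls`). Below is a minimal Python sketch of that client-side fallback; the `fetch` callable and the server URLs are hypothetical, and this is not how Guix itself implements it:

```python
from typing import Callable, Optional, Sequence, Tuple


class SubstituteUnavailable(Exception):
    """Raised when no server in the list could provide the item."""


def fetch_with_fallback(
    servers: Sequence[str],
    item: str,
    fetch: Callable[[str, str], Optional[bytes]],
) -> Tuple[str, bytes]:
    """Try each substitute server in order and return the first hit.

    `fetch(server, item)` is a hypothetical callable: it returns the
    item's bytes, returns None when the server does not have the item,
    or raises OSError when the server is unreachable.
    """
    errors = []
    for server in servers:
        try:
            data = fetch(server, item)
        except OSError as exc:  # server down: move on to the next one
            errors.append(f"{server}: {exc}")
            continue
        if data is not None:
            return server, data
        errors.append(f"{server}: not found")
    raise SubstituteUnavailable("; ".join(errors))


# Usage: the (hypothetical) primary server is down, the mirror answers.
def fake_fetch(server: str, item: str) -> Optional[bytes]:
    if server == "https://primary.example":
        raise OSError("connection refused")
    return b"nar-bytes" if item == "abc.narinfo" else None


print(fetch_with_fallback(
    ["https://primary.example", "https://mirror.example"],
    "abc.narinfo",
    fake_fetch,
))  # ('https://mirror.example', b'nar-bytes')
```

The point is only that every server in the list must be able to *serve* substitutes on its own, which is different from being a build slave.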

> Our main focus is on making the primary build farm of
> the project.  It has the advantage that one Guix dev has physical access
> to it (Ricardo); it’s also much more powerful than and the
> associated build machines.

OK, I see it

more details could help fix related issues

IMHO a public Sysadmin Manual should be on the roadmap (as a
MAYBE): it could help the core team's job, show the community how the
job is done *and* help others build on our best practices

Guix/GuixSD is *the* perfect tool for IaC (infrastructure as code); it
could be *very* interesting to develop a "Literate GuixSD IaC package"
as a meta-project :-)

maybe we could (slowly) build a reproducible, literate IaC devops
document based on org-mode babel, so we'd have both tangled code and
exported documentation

> Yet, there’s more work to do: berlin has just 1T of disk space.  Ricardo
> started looking on growing it but was stuck on software issues IIRC.  I
> think fixing this should be a priority, so I think we should help
> Ricardo fix the software issues as much as we can.

I realize I'm pretty new to this community and you can't trust me since
we don't even know each other... but I could help if needed; just tell
me (in private if more appropriate) what the hardware issue is

> That alone doesn’t fix the resilience issue: could go
> down at some point for some time.
> To address that, a possibility that was discussed recently on
> guix-sysadmin is use has a separate build farm

guess you meant "use *as* a separate build farm"

> and/or mirror of berlin.


>> given the prolonged issue, please also consider writing an *official*
>> blog post explaining the current situation and steps adopted to prevent
>> similar issues in the future
> We set up the info-guix mailing list with that in mind (but too late for
> this incident).  Posting blog posts is also a good idea; we should have
> done that, with instructions on how to switch to

given the impact on the project's reputation, please consider a
"post-mortem" blog post on what happened: something along the lines of
Ludo's reply to me

not all interested users and observers read the archives of this (and
other) mailing lists

>> 1. is there a method to "replicate the whole store of an official server
>> (e.g. once healed)" so we can just "guix publish" a
>> *complete* mirror? In this case a ready to use official
>> mirror-config.scm could be useful
> is a simple nginx proxy to  You can
> find its config here:

OK, so it's a caching proxy
I'll see if and how I can build a similar one

sorry, but I still don't understand why failed to serve
substitutes during a 0.15 installation started from the install
CD: was it a cache size problem?
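My guess (not a diagnosis of what actually happened) is that a pure caching proxy can only serve what it has already stored: when the upstream server is down, anything never requested before, or evicted because the cache hit its size limit, cannot be served. Below is a minimal Python sketch of that behavior; the `upstream` callable and the `max_items` limit are hypothetical stand-ins, and this is not the nginx configuration actually used:

```python
from collections import OrderedDict
from typing import Callable


class CachingProxy:
    """Read-through cache in front of an upstream server, with LRU eviction."""

    def __init__(self, upstream: Callable[[str], bytes], max_items: int):
        self.upstream = upstream    # hypothetical: raises OSError if down
        self.max_items = max_items  # stand-in for a cache size limit
        self.cache = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        data = self.upstream(key)        # cache miss: upstream is required
        self.cache[key] = data
        if len(self.cache) > self.max_items:
            self.cache.popitem(last=False)  # evict least recently used
        return data


# Usage: fill a tiny cache, then take the upstream down.
upstream_up = True


def upstream(key: str) -> bytes:
    if not upstream_up:
        raise OSError("upstream unreachable")
    return f"nar for {key}".encode()


proxy = CachingProxy(upstream, max_items=2)
for k in ("a", "b", "c"):  # "a" gets evicted when "c" arrives
    proxy.get(k)
upstream_up = False
print(proxy.get("c"))      # b'nar for c' (still served from cache)
try:
    proxy.get("a")         # evicted earlier and upstream is down
except OSError as exc:
    print("miss:", exc)    # miss: upstream unreachable
```

If this is roughly what happened, a bigger cache only helps partially: a real mirror would need a full copy of the store, not just a cache.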

> In the past a few people set up their own mirrors using a similar
> configuration.

we should build a network of organizations and individuals for this

>> 2. is there an official mirrors directory users can look at when needed?
> No.

I volunteer to keep such a list and coordinate the "volunteers network",
if you want

>> 3. is there a plan to build a service similar to
>> (I looked on the web but did not find any
>> reference to such plan)
> Like I wrote, there’s no concrete plan at this point, which means it’s
> an opportunity for you and anyone else to chime in and give a hand!

I have no experience building such a service, but it definitely fits
into my professional development plan; I'm not yet able to lead such
a project, but I can help


Giovanni Biscuolo

Xelera IT Infrastructures
