Re: Postmortem of service downtime
From: Maxim Cournoyer
Subject: Re: Postmortem of service downtime
Date: Fri, 24 May 2024 21:19:44 -0400
User-agent: Gnus/5.13 (Gnus v5.13)
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
> From Sunday May 19th to Tuesday May 21st, for about 36h,
> bayfront.guix.gnu.org, the machine behind many services, went down:
>
> https://lists.gnu.org/archive/html/info-guix/2024-05/msg00000.html
>
> Affected web sites and services included:
>
> guix.gnu.org
> bordeaux.guix.gnu.org
> logs.guix.gnu.org
> hpc.guix.info
> foundation.guix.info
> packages.guix.gnu.org
> qa.guix.gnu.org
>
[...]
> A large part of the slowness was due to ‘guix substitute’ reading
> all the 300K+ entries from /var/guix/substitute/cache and deleting
> them, one by one (this took several minutes). Chris had mentioned
> that performance issue in the past; it’s not much of a problem on
> one’s laptop with an SSD, but it’s clearly a problem here where
> there are more entries than usual. We should at least drastically
> reduce the TTL of cache entries.
Interesting!
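To make the TTL point concrete, here is a minimal Python sketch of
age-based cache cleanup. It is only an illustration, not how 'guix
substitute' does it: the function name, the use of file modification
times as the age, and the assumption that entries are plain files under
one directory are all mine, not taken from the Guix code.

```python
import os
import time


def remove_expired_entries(cache_dir, ttl_seconds, now=None):
    """Delete regular files under CACHE_DIR older than TTL_SECONDS.

    Hypothetical helper for illustration only.  Age is taken from the
    file's modification time.  Returns the number of files removed.
    """
    now = time.time() if now is None else now
    removed = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                # One stat plus one unlink per expired entry: the cost
                # grows linearly with the number of cached entries.
                if now - os.lstat(path).st_mtime > ttl_seconds:
                    os.unlink(path)
                    removed += 1
            except FileNotFoundError:
                # Another process removed the entry first; skip it.
                continue
    return removed
```

Since the work is proportional to the number of entries, a shorter TTL
run regularly should keep the directory small and each pass short,
instead of letting 300K+ entries pile up for a single slow sweep.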
> • qa-frontpage failed to build when we first reconfigured the machine,
> so we commented it out. This is now fixed:
>
>
> https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=3fecb1e8fdea65a7440fec403c1c52da197b5dfe
>
> • guix-packages-website (the server behind packages.guix.gnu.org)
> still refuses to start with an Artanis error:
>
> https://issues.guix.gnu.org/71138
>
> Ludo’, on behalf of the emergency rescue^W^W sysadmin team.
Phew! Thanks for the detailed write-up, for the fixes, and for the
thankless work of getting the machine back up and running.
--
Maxim