Re: Disarchive update
From: zimoun
Subject: Re: Disarchive update
Date: Tue, 12 Oct 2021 11:19:18 +0200
Hi Ludo,
On Sat, 09 Oct 2021 at 12:05, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> If you run:
>
> guix build
> /gnu/store/nnl67m8c2x9rwqbnych1agc6p7g5473g-disarchive-collection.drv
Oh, cool!
> and if you’re patient :-), you eventually get a 579 MB directory
> containing Disarchive metadata for 8,413 tarballs out of 9,113 (the
> missing tarballs are those that “disarchive disassemble” fails to
> handle, for instance because it couldn’t guess what compression method
> is being used.)
Timothy made this table months ago:
tar+gz        9090   52.0%
git           5294   30.3%
tar+xz        1184   06.8%
tar+bz2        775   04.4%
tar            393   02.2%
zip            273   01.6%
svn-multi      175   01.0%
svn            125   00.7%
file            51   00.3%
computed        38   00.2%
hg              36   00.2%
unknown-uri     20   00.1%
tar+gz?         15   00.1%
tar+lz          13   00.1%
tar+Z            4   00.0%
cvs              3   00.0%
bzr              3   00.0%
tar+lzma         2   00.0%
total        17494  100.0%
What is really missing is XZ and Bzip2 support in Disarchive, I guess.
> Where to go from here? Timothy Sample had already set up a Disarchive
> database at <https://disarchive.ngyro.com>, which (guix download) uses
> as a fallback; I’m not sure exactly how it’s populated. The goal here
> would be for the Guix project to set up infrastructure populating a
> database automatically and creating backups, possibly via SWH (we’ll
> have to discuss it with them).
Timothy was working on feeding the database at each release. You can
have a look at:
<https://git.ngyro.com/preservation-of-guix>
Then something along these lines:
$ sqlite3 /tmp/pog.db < schema.sql
$ guix repl -L . <(echo '
(use-modules (pog))
(ingest "6298c3ffd9654d3231a6f25390b056483e8f407c"
"/tmp/pog.db")
')
where the commit hash corresponds to v1.0.0. I do not know whether that
would be equivalent to running:
guix time-machine --commit=6298c3ffd9654d3231a6f25390b056483e8f407c \
-- build -m etc/disarchive-manifest.scm
> A plan we can already deploy would be:
>
> 1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
> 2. On berlin, add an mcron job that periodically copies the output of
> the latest “disarchive-collection” build to a directory, say
> /srv/disarchive. Thus, the database would accumulate tarball
> metadata over time.
>
> 3. Add an nginx route so that /srv/disarchive is served at
> https://disarchive.guix.gnu.org.
>
> 4. Add disarchive.guix.gnu.org to (guix download).
To replace (or add to) the current ’%disarchive-mirrors’, right?
Going down this road (using Cuirass), why not generate sources.json the
same way, instead of the current hack using the website builder?
On my side, I will try to resume what I started months ago: determining
the SWH coverage. For instance, of these ~92% of tarballs, how many are
currently stored in SWH? Well, do not hold your breath, and I would be
happy if someone beats me to it. ;-)
Cheers,
simon
Follow-ups:
Re: Disarchive update, Timothy Sample, 2021/10/13
Re: Disarchive update, Ludovic Courtès, 2021/10/14