guix-devel

Preservation of Guix Report


From: Timothy Sample
Subject: Preservation of Guix Report
Date: Wed, 20 Oct 2021 15:48:07 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi everyone!

Early this summer I did a bunch of work trying to figure out which Guix
sources are preserved by the SWH archive.  I’m finally ready to share
some preliminary results!

    https://ngyro.com/pog-reports/2021-10-20/

This report is already quite outdated, though.  It only covers commits
up to the end of May, and the sources were checked against the SWH
archive sometime in June.  I’m sharing it now to avoid any further
delays.

What’s cool is that the report is automated.  Next on my list is to
update the database and generate a new report.  Then, we can compare the
results and see if we are improving.  (My read on the results so far is
that improving “sources.json” will yield big improvements, but we might
not be able to get to that before the next report.)

The report itself only provides a very high level overview.  If you want
to check on specifics, you will have to download the database.  There’s
a link at the bottom of the report as well as a link to a detailed
schema definition.  Anyone interested in making some sense of the 5,043
known missing sources is encouraged to look there.  However, I can say
from my own investigation that a lot of them are kinda boring.  For
instance, 3,435 are from crates.io, CRAN, Hackage, Bioconductor, and
CPAN:

    select count(*)
    from fods
        join fod_references using (fod_id)
    where not is_in_swh
        and (reference like '%crates.io%' or
             reference like '%/cran/%' or
             reference like '%hackage%' or
             reference like '%/bioconductor.%' or
             reference like '%/cpan/%');
    => 3435
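
For anyone who wants the same number split up per origin, here is a
minimal sketch.  It assumes the downloaded database can be opened with
Python's sqlite3 module, and it only relies on the table and column
names used in the query above.  The toy schema, the example URLs, and
the helper names (make_toy_db, missing_by_origin) are made up for
illustration; check the schema definition linked from the report for the
real layout.

```python
import sqlite3

# LIKE patterns copied from the query above; the labels are display names only.
PATTERNS = {
    "crates.io": "%crates.io%",
    "CRAN": "%/cran/%",
    "Hackage": "%hackage%",
    "Bioconductor": "%/bioconductor.%",
    "CPAN": "%/cpan/%",
}


def missing_by_origin(conn):
    """Count reference rows of sources that are not in SWH, per pattern."""
    counts = {}
    for label, pattern in PATTERNS.items():
        row = conn.execute(
            "select count(*) "
            "from fods join fod_references using (fod_id) "
            "where not is_in_swh and reference like ?",
            (pattern,),
        ).fetchone()
        counts[label] = row[0]
    return counts


def make_toy_db():
    """Build a tiny in-memory stand-in (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table fods (fod_id integer primary key, is_in_swh integer)")
    conn.execute("create table fod_references (fod_id integer, reference text)")
    conn.executemany(
        "insert into fods values (?, ?)",
        [(1, 0), (2, 0), (3, 1), (4, 0)],
    )
    conn.executemany(
        "insert into fod_references values (?, ?)",
        [
            (1, "https://crates.io/example-1.0.crate"),
            (2, "mirror://cpan/example.tar.gz"),
            (3, "https://hackage.haskell.org/example.tar.gz"),  # already in SWH
            (4, "https://example.org/cran/example.tar.gz"),
        ],
    )
    return conn


if __name__ == "__main__":
    for label, count in missing_by_origin(make_toy_db()).items():
        print(f"{label}: {count}")
```

To run it against the real data, pass a connection to the downloaded
file instead of the toy one.  Note that a reference matching two
patterns would be counted under both, so the per-origin counts can add
up to more than the combined query reports.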

It’s surprising to me that SWH is not already getting these from
“sources.json”.  I picked an arbitrary one, “rust-quote-0.6”, and it’s
simply not in “sources.json”.  On the other hand, I bet SWH would like a
crates.io (and CRAN, etc.) loader, too.

Another, possibly more interesting, approach is to look at Git sources:

    select count(*)
    from fods
        join fod_references using (fod_id)
    where not is_in_swh
        and reference like '(git-reference%';
    => 336

There are fewer, but they might be more interesting.  Just be sure to
check that they haven’t made it into the SWH archive since June.  For
instance, I just checked “asciidoc@9.1.0” and learned that the database
has “NOT is_in_swh”, but it is now in the SWH archive.  So, caveat
emptor, I guess.  Maybe it would be wise to wait for a more recent
report before diving in.
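
If you do want to go through the Git sources by hand, listing the
references is more useful than counting them.  Here is a rough sketch
along the lines of the query above, again assuming the database opens
with Python's sqlite3 module.  The helper name and the toy rows are
hypothetical, and the result is only a list of candidates to recheck
against the SWH archive, since the database reflects its state as of
June.

```python
import sqlite3


def missing_git_references(conn):
    """Return Git references that the database marks as not in SWH."""
    rows = conn.execute(
        "select reference "
        "from fods join fod_references using (fod_id) "
        "where not is_in_swh and reference like '(git-reference%' "
        "order by reference"
    ).fetchall()
    return [reference for (reference,) in rows]


if __name__ == "__main__":
    # Tiny in-memory stand-in; schema and rows are made up for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table fods (fod_id integer primary key, is_in_swh integer)")
    conn.execute("create table fod_references (fod_id integer, reference text)")
    conn.executemany("insert into fods values (?, ?)", [(1, 0), (2, 1)])
    conn.executemany(
        "insert into fod_references values (?, ?)",
        [
            (1, '(git-reference (url "https://example.org/a.git"))'),
            (2, '(git-reference (url "https://example.org/b.git"))'),
        ],
    )
    for reference in missing_git_references(conn):
        print(reference)
```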

One other way to help would be to suggest improvements to the report.  I
don’t want to fiddle with it too much, but if there is some simple graph
or table or list that should be there, I’m happy to give it a go.


-- Tim


