[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#42162: Recovering source tarballs

From: zimoun
Subject: bug#42162: Recovering source tarballs
Date: Mon, 20 Jul 2020 17:52:09 +0200


On Mon, 20 Jul 2020 at 10:39, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:
> > On Sat, 11 Jul 2020 at 17:50, Ludovic Courtès <ludo@gnu.org> wrote:

> There are many many comments in your message, so I took the liberty to
> reply only to the essence of it.  :-)

Many comments because many open topics. ;-)

> However, the two examples above are good ideas as to the way forward: we
> could start a url-fetch-to-git-fetch migration in these two cases, and
> perhaps more.

Well, to be honest, I have tried to probe such migration when I opened
this thread:


and I have tried to summarized the pros/cons arguments here:


> > What about in addition push to IPFS?  Feasible?  Lookup issue?
> Lookup issue.  :-)  The hash in a CID is not just a raw blob hash.
> Files are typically chunked beforehand, assembled as a Merkle tree, and
> the CID is roughly the hash to the tree root.  So it would seem we can’t
> use IPFS as-is for tarballs.

Using the Git-repo map/table, then it becomes an option, right?
Well, SWH would be a backend and IPFS could be another one.  Or any
"cloudy" storage system that could appear in the future, right?

> >>   • If we no longer deal with tarballs but upstreams keep signing
> >>     tarballs (not raw directory hashes), how can we authenticate our
> >>     code after the fact?
> >
> > Does Guix automatically authenticate code using signed tarballs?
> Not automatically; packagers are supposed to authenticate code when they
> add a package (‘guix refresh -u’ does that automatically).

So I miss the point of having this authentication information in the
future where upstream has disappeared.
The authentication is done at packaging time.  So once it is done,
merged into master and then pushed to SWH, being able to authenticate
again does not really matter.

And if it matters, all should be updated each time vulnerabilities are
discovered and so I am not sure SWH makes sense for this use-case.

> But today, we store tarball hashes, not directory hashes.

We store what "guix hash" returns. ;-)
So it is easy to migrate from tarball hashes to whatever else. :-)
I mean, it is "(sha256 (base32" and it is easy to have also
"(sha256-tree (base32" or something like that.

In the case where the integrity is also used as lookup key.

> > The format of metadata (disassemble) that you propose is schemish
> > (obviously! :-)) but we could propose something more JSON-like.
> Sure, if that helps get other people on-board, why not (though sexps
> have lived much longer than JSON and XML together :-)).

Lived much longer and still less less less used than JSON or XML alone. ;-)

I have not done yet the clear back-to-envelop computations.  Roughly,
there are ~23 commits on average per day updating packages, so say 70%
of them are url-fetch, it is ~16 new tarballs per day, on average.
How the model using a Git-repo will scale?  Because, naively the
output of "disassemble-archive" in full text (pretty-print format) for
the hello-2.10.tar is 120KB and so 16*365*120K = ~700Mb per year
without considering all the Git internals.  Obviously, it depends on
the number of files and I do not know if hello is a representative

And I do not know how Git operates on binary files if the disassembled
tarball is stored as .go file, or any other.

All the best,

Just if someone wants to check from where I estimate the numbers.

--8<---------------cut here---------------start------------->8---
for ci in $(git log --after=v1.0.0 --oneline \
                | grep "gnu:" | grep -E "(Add|Update)" \
                | cut -f1 -d' ')
    git --no-pager log -1 $ci --format="%cs"
done | uniq -c > /tmp/commits

guix environment --ad-hoc r-minimal \
     -- R -e 'summary(read.table("/tmp/commits"))'

gzip -dc < $(guix build -S hello) > /tmp/hello.tar
guix repl -L /tmp/tar/

scheme@(guix-user)> (call-with-input-file "hello.tar"
          (lambda (port)
                 (disassemble-archive port)))
--8<---------------cut here---------------end--------------->8---

reply via email to

[Prev in Thread] Current Thread [Next in Thread]