bug#42162: Recovering source tarballs

From: Timothy Sample
Subject: bug#42162: Recovering source tarballs
Date: Wed, 26 Aug 2020 17:11:50 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hi zimoun,

zimoun <zimon.toutoune@gmail.com> writes:

> One question is how this database scales.
> For example, a quick back-of-the-envelope estimate leads to ~1.2GB of
> metadata for ~14k packages and then an increase of ~700MB per year, both
> with Ludo’s code [1].
> [1] <http://issues.guix.gnu.org/issue/42162#11>

It’s a good question.  A good part of the size comes from the
representation rather than the data, so compression helps a lot here.  I
have a database of 3,912 packages.  It’s 295M uncompressed (which is a
little better than your estimate).  If I pass each file through Lzip,
it shrinks down to 60M.  That’s more like 15.5K per package, which is
almost an order of magnitude smaller than the estimate you used
(120K).  I think that makes the numbers rather pleasant, but it comes at
the expense of easy storage in Git.
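
For anyone who wants to redo the arithmetic, here is a minimal sketch in
Python.  It assumes that “M” means MiB and “K” means KiB, and that the
per-package average of this 3,912-package sample carries over to the
~14k packages in the estimate, which may not hold.  The function names
are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in this message.
# Assumption: "M" is MiB and "K" is KiB; the message does not say which.
KIB_PER_MIB = 1024

SAMPLE_PACKAGES = 3912   # packages in the sample database
UNCOMPRESSED_MIB = 295   # sample database size before compression
LZIP_MIB = 60            # sample database size after Lzip


def per_package_kib(total_mib: float, packages: int) -> float:
    """Average metadata size per package, in KiB."""
    return total_mib * KIB_PER_MIB / packages


def projected_mib(per_pkg_kib: float, packages: int) -> float:
    """Total size in MiB if every package costs per_pkg_kib on average."""
    return per_pkg_kib * packages / KIB_PER_MIB


uncompressed_kib = per_package_kib(UNCOMPRESSED_MIB, SAMPLE_PACKAGES)
compressed_kib = per_package_kib(LZIP_MIB, SAMPLE_PACKAGES)
ratio = UNCOMPRESSED_MIB / LZIP_MIB

print(f"uncompressed: {uncompressed_kib:.1f} KiB per package")
print(f"lzip:         {compressed_kib:.1f} KiB per package")
print(f"compression ratio: {ratio:.1f}x")
# Hypothetical extrapolation to the ~14k packages from the estimate.
print(f"14k packages, lzip: {projected_mib(compressed_kib, 14_000):.0f} MiB")
```

The per-package figure it prints differs slightly from the 15.5K above
because the sizes quoted here are rounded.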

> As mentioned [2], should this service be part of SWH (download cooking
> task)?  Or project side?
> [2] <https://forge.softwareheritage.org/T2430#47486>

It would be interesting to just have SWH absorb the project.  Since
other distros already know how to produce a “sources.json” and how to
query the SWH archive, they would benefit for free (and so would Guix,
for that matter).  I’m open to that, but right now having the freedom
to experiment is important.

-- Tim
