Re: Packaging big generated data files?

From: Csepp
Subject: Re: Packaging big generated data files?
Date: Thu, 08 Dec 2022 14:46:51 +0100

Denis 'GNUtoo' Carikli writes:

> Hi,
> Are there any policies or past decisions of the Guix project on
> packaging big generated data files?
> I've added packages for software like kiwix-tools and navit that both
> work offline but that also need data files to be useful.
> Navit is (car) navigation software that needs maps. The maps can be
> generated from OpenStreetMap dumps with a tool available in the Navit
> source code (maptool)[1], which is not packaged yet. Binary map files can
> also be downloaded directly from various sources.
> Right now the biggest file possible for such maps is about 47 GiB
> (for the whole planet).
> As for kiwix-tools, it can serve offline versions of websites like
> Wikipedia, and there too it needs files to work. The biggest file seems
> to be the complete version of English Wikipedia with scaled-down
> pictures[2], and it takes about 89 GiB. I haven't looked yet at how these
> files are generated, but I guess that they can somehow be generated from
> Wikipedia dumps.
> Packaging the binary files (without generating them) can be useful
> because it simplifies maintenance a lot: one can just update the package
> version and checksum to update them. It also makes it possible to keep
> the information (download URL, checksum, license) in one place, and it
> enables easy reuse by Guix services and/or configuration files.
> If these files were generated in packages, it would also be possible to
> tweak the data, for instance by adding height data to Navit maps. As
> for kiwix-compatible files, it would probably make it possible to decide
> when to take the snapshots, or to package additional wikis
> (like the Libreplanet Wiki) or websites.
> The issue here is probably the size of the generated files: they are
> huge, so if they are packaged, they will most likely take significant
> resources in the Guix infrastructure.
> So what would be the way to go here? Would Guix accept patches to add
> packages for these files in Guix proper?
> If so, does it need to be done like with the ZFS (kernel module)
> package where "#:substitutable? #f" is used to avoid redistributing
> package builds? Or are other ways better for such use cases?
> Note that so far I've only packaged kiwix-compatible files for various
> wikis locally, by just downloading already prepared files. I haven't
> looked yet into Navit maps or into generating all these files, so
> I might be missing some details about generating them.
> Denis.

Could ZIM files be downloaded over BitTorrent as fixed-output
derivations?  They can be pretty huge.  Also, if the system started
seeding them as well, that would be pretty cool.

