guix-devel

March update on data.guix.gnu.org (Guix Data Service)


From: Christopher Baines
Subject: March update on data.guix.gnu.org (Guix Data Service)
Date: Sun, 29 Mar 2020 21:45:30 +0100
User-agent: mu4e 1.2.0; emacs 26.3

Hey,

This follows on from the email I sent back in February [1].

1: https://lists.gnu.org/archive/html/guix-devel/2020-02/msg00268.html

As it turns out, quite a lot has happened over the last month and a bit!
In summary, this email talks about:

 - Providing database dumps, and how this works
 - Loading new revisions should now be much faster
 - A performance issue with the links of the package reproducibility
   page has been fixed
 - Data about builds and substitutes is more up to date
 - The Guix Data Service now runs on Guile 3
 - System test derivations are now computed for multiple systems
 - You can view package history by "output" now, as well as version and
   derivation
 - I'm no longer the only person making code changes!

There's now a page [2] that lists dumps of the database; previously this
was just nginx's listing of the /var/lib/guix-data-service/dumps
directory. Creating new dumps was a manual process, but there's now a
mcron job on the machine that takes care of this, so new data should
appear daily.

2: http://data.guix.gnu.org/dumps/

The Hetzner server on which data.guix.gnu.org is hosted only has 150GB
of disk space for the database and store, with 73GB currently being
taken up by the database. This didn't leave any space to store dumps,
let alone generate the small dumps, which require restoring a copy of
the database so that it can be modified.

I added a 100GB volume to the server, which acts as temporary space for
the dumps to be stored, and the small dumps to be created. For actually
storing the dumps longer term, I'm using a combination of git-annex [3]
and a file storage service called Wasabi [4]. I didn't want to write
backup code that only worked with Wasabi, so the idea of using git-annex
as well is that it deals with the details of how to move the files
around. I picked Wasabi because the storage is quite cheap, and it
doesn't charge for serving the files. As with the server, I'm currently
paying for this myself.

3: https://git-annex.branchable.com/
4: https://wasabi.com/

This should mean that backups are regularly available, which is
convenient. The small backup has also been improved over the last
month: it's now small again (down from ~10GB for 2020-03-13 to 0.7GB
for 2020-03-28), and it now includes data for system tests and channel
instances.

I didn't test if Guile 3 had any impact on performance, but there have
been some data loading performance improvements over the last month. The
channel instance locking was improved, so more can be done in
parallel. Building on some changes in Guix for the derivation linter,
the Guix Data Service can now pass in a store connection to be used,
which also makes loading new revisions a little faster. I also looked
into the very slow loading of package metadata [5]. This could take ~30
minutes previously, but I've now seen it happen in as little as 3
seconds!

5: Look for "debug: Finished querying the temp_package_metadata" in the
job output

Separately from the performance of loading data for new revisions, I
looked into why the links on the package reproducibility page for a
revision ([6] for example) would time out. This turned out to be an easy
fix: just adding a database index in the right place. While the lack of
data about builds is still a limiting factor, this page [6] should be a
bit more useful and usable.

6: http://data.guix.gnu.org/revision/8f83699ba00743d258b497e0e5285989996ee559/package-reproducibility
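
To illustrate why a single index can turn a query that times out into a
fast one, here is a minimal sketch. It is not the actual fix: the Guix
Data Service uses PostgreSQL, whereas this uses Python's built-in
sqlite3 module so that it is self-contained, and the "builds" table,
its columns and the index name are all hypothetical.

```python
import sqlite3

# Hypothetical schema, purely for illustration; the real Guix Data
# Service schema (in PostgreSQL) is different.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE builds ("
    "id INTEGER PRIMARY KEY, derivation_output TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO builds (derivation_output, status) VALUES (?, ?)",
    [(f"output-{i % 1000}", "succeeded") for i in range(10000)],
)

QUERY = "SELECT status FROM builds WHERE derivation_output = ?"


def query_plan(connection):
    """Return SQLite's description of how it would run QUERY."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("output-42",)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[3] for row in rows)


# Without an index, every row has to be scanned to find the matches.
plan_before = query_plan(conn)

# With an index on the filtered column, matching rows are looked up directly.
conn.execute(
    "CREATE INDEX builds_derivation_output_idx ON builds (derivation_output)"
)
plan_after = query_plan(conn)

print(plan_before)
print(plan_after)
```

The first plan describes a scan of the whole table, while the second
refers to the new index. On a large table that is the difference
between reading every row and reading only the matching ones.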

I also spent some time debugging why the script for querying build
servers would hang or break when run for long periods. I think this was
resolved with some tweaks to http-get-multiple [7], and I've now been
able to leave the script running. This should have two positive effects.
First, the build and narinfo information on data.guix.gnu.org should be
more up to date than it was previously. Second, because the Guix Data
Service is regularly querying for narinfo files, including for new
derivations, this'll prompt guix-publish to bake nar files for these
outputs, hopefully soon after the output has been generated.

7: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39873
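
For anyone unfamiliar with what "querying for narinfo files" involves,
a substitute server serves the narinfo for a store item at
/<hash>.narinfo, where <hash> is the 32 character hash part of the
store file name. The sketch below only builds that URL; the function
name is mine, the store path is a made-up placeholder, and the real
script does considerably more (batching requests, parsing responses).

```python
import posixpath


def narinfo_url(substitute_server, store_path):
    """Build the narinfo URL for STORE_PATH on SUBSTITUTE_SERVER.

    Illustrative helper only, not code from the Guix Data Service.
    """
    basename = posixpath.basename(store_path.rstrip("/"))
    # Store file names look like "<32 character hash>-<name>".
    hash_part, separator, _name = basename.partition("-")
    if not separator or len(hash_part) != 32:
        raise ValueError(f"not a store path: {store_path!r}")
    return f"{substitute_server.rstrip('/')}/{hash_part}.narinfo"


# Placeholder store path, not a real output.
example = narinfo_url(
    "https://ci.guix.gnu.org",
    "/gnu/store/0123456789abcdfghijklmnpqrsvwxyz-hello-2.10",
)
print(example)
```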

The Guix Data Service now works with Guile 3, and the Guix package has
been changed to use Guile 3.

System test derivations are now generated for multiple systems [8].

8: http://data.guix.gnu.org/revision/8b87d095b39dee91056b88f96b374faa8c3a8891/system-tests

Previously you could view the version or derivation history for a
package on a branch, but now you can view the "output" history as well
[9]. This is in some ways more useful than the derivation history, which
has more entries due to changes in fixed output generations.

9: http://data.guix.gnu.org/repository/1/branch/master/package/libreoffice/output-history
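
As a rough illustration of why the output history can be shorter than
the derivation history, the sketch below collapses a made-up list of
revisions by either field. The data and the "collapse" helper are
invented for this example and are not how the Guix Data Service
computes these pages. It assumes a case where the derivation changes
between revisions but the resulting output does not.

```python
from itertools import groupby
from operator import itemgetter

# Invented data: one row per revision, oldest first.
history = [
    {"revision": "r1", "derivation": "drv-a", "output": "out-x"},
    {"revision": "r2", "derivation": "drv-b", "output": "out-x"},
    {"revision": "r3", "derivation": "drv-c", "output": "out-y"},
]


def collapse(rows, key):
    """Merge consecutive rows sharing KEY into (value, first, last) entries."""
    entries = []
    for value, group in groupby(rows, key=itemgetter(key)):
        group = list(group)
        entries.append((value, group[0]["revision"], group[-1]["revision"]))
    return entries


derivation_history = collapse(history, "derivation")
output_history = collapse(history, "output")

print(len(derivation_history), "derivation history entries")
print(len(output_history), "output history entries")
```

Here the derivation changes in every revision, giving three entries,
while the output only changes once, giving two.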

I'm also now not the only one to have worked on the Guix Data Service
[10]. This is a positive sign for the "Improve internationalization
support for the Guix Data Service" Outreachy project. Providing there's
a successful applicant, I believe that'll be announced on the 27th of
April.

10: https://git.savannah.gnu.org/cgit/guix/data-service.git/commit/?id=f980b6c2acd4388627b5abb30bdf98fcbb18fb7f

Looking forward, I'd still like to see loading data be faster. One thing
I might try is parallelising parts like computing the channel instance
derivations and running the lint checkers. I'd also like to make some
sort of sitemap to make the pages more discoverable. Hopefully though,
it's getting towards the point where the Guix Data Service can start
being used as something to build upon, which is the way I've been
thinking about it. By making data about Guix available in this format,
it should be easier to build new and exciting tools and services.

Just let me know if you have any comments or questions!

Chris
