Re: More progress with the Guix Data Service

From: Christopher Baines
Subject: Re: More progress with the Guix Data Service
Date: Mon, 20 May 2019 21:14:52 +0100
Ludovic Courtès <address@hidden> writes:

>> As well as listening to the Guix Commits mailing list for emails about
>> new revisions, more of the information in these emails is now stored, in
>> particular, the time they were sent, and the branch the email applies
>> to. This can be seen on the new Branches page [4].
>> 4:
> This is really nice.
> This information could also be gathered directly from the repo though,
> right?
> I would expect only patch submission info, and possibly commit
> notifications, to be grabbed from email, while the rest would be
> extracted from the repo, thereby hopefully limiting the risk of
> misinterpreting email.  WDYT?

So, currently the branch name, commit hash and date are taken from the
email. As far as I know, git branches are just pointers to commits, and
don't have any date/time associated with them. The commit date, or
author date in the commits could be stored and used, but I think these
are less interesting, and often misleading. The author date is often
quite different from the time a commit is pushed, and the commit date is
often different by some amount as well.
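
To illustrate the difference, here is a minimal Python sketch (not
something the Guix Data Service does) that reads both dates for a
commit using git's %aI and %cI format placeholders. The function names
and the repository path argument are made up for the example. Note that
neither date tells you when the commit was pushed to a branch.

```python
import subprocess
from datetime import datetime


def parse_commit_dates(output):
    """Parse the three lines printed by
    `git log -1 --format=%H%n%aI%n%cI`: hash, author date, commit date."""
    commit_hash, author_date, commit_date = output.strip().splitlines()

    def parse(value):
        # Some git versions print "Z" for UTC, which older Pythons reject.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))

    return {
        "commit": commit_hash,
        "author_date": parse(author_date),
        "commit_date": parse(commit_date),
    }


def commit_dates(repo_path, revision="HEAD"):
    """Ask git for the author and commit dates of a single revision."""
    output = subprocess.run(
        ["git", "-C", repo_path, "log", "-1",
         "--format=%H%n%aI%n%cI", revision],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_commit_dates(output)
```

Calling commit_dates on a local clone returns both timestamps, and the
difference between them shows how far apart the two can be.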

Currently, if you actually want to know what the state of a particular
branch in the Guix git repository on Savannah was at a particular time,
I think the most reliable way of checking would probably be to check
the guix-commits mailing list.

As the branch name, and commit hash both relate to the date, I don't see
that much problem with storing them.

One thing I've also been thinking about is loading in the guix-commits
mailing list archives. That would backfill the branch information, which
might be useful/interesting...

I did consider trying to access the clone of the Git repository that's
managed by the (guix inferiors) module, but I couldn't see an easy way
to do it, and as above, I'm not sure the date/time information is as
useful as what you can get from the mailing list.

>> There's now a basic search function on the packages page [5], and the
>> location and the licenses for packages are now being stored (which can
>> be seen on the page for a package, for example [6]).
>> 5: 
>> 6: 
> Nice!
> One thing that would be great is a page similar to
> <>,
> but keyed by package, where you get a list of the recent package
> versions (and/or derivations) and map them to specific commits.

Interesting, yeah. Were you thinking of filtering that data for a
specific branch (like master or staging), or showing data for all
branches?

>> The URL is a bit long, but I think that is now close to being possible
>> with the Guix data service. I haven't got something working yet to
>> easily access data for the latest revision, but for a particular
>> revision, you can request a JSON file containing all the information I
>> think Repology currently gets about all packages. For example:
> Awesome.  (I advise passing “limit_results=900” though, because the URL
> above gives a pretty big result.  ;-))

Well, not that big? Icecat tells me it's 12MB. Also, I've recently added
an "All results" checkbox/query parameter, so you no longer have to make
up a large number. I wanted to make it possible to get all the data as a
single file, as that could simplify processing it, but there's also some
support for pagination.
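
As a rough sketch of the two ways of asking for results, the snippet
below builds the query string in Python. The limit_results parameter is
the one mentioned above; the all_results name and its "on" value are
assumptions made for the example, and the base URL is left to the
caller.

```python
from urllib.parse import urlencode


def packages_url(base_url, limit_results=None, all_results=False):
    """Build a packages URL with either a result limit or all results."""
    params = {}
    if all_results:
        # Assumed parameter name and value, for illustration only.
        params["all_results"] = "on"
    elif limit_results is not None:
        params["limit_results"] = str(limit_results)
    return base_url + ("?" + urlencode(params) if params else "")
```

With all_results set, the limit is ignored, so there's no need to pick
an arbitrarily large number.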

The "All results" option is especially important as I've now done some
work on caching. That page should be served with a max-age of a day. It
could probably be even longer, as the only thing that will change the
contents is a change to the software. nginx is also now caching
responses, and you can see what it's doing by looking at the
X-Cache-Status header in the response.
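
For anyone wanting to check the caching, here is a small Python sketch
that pulls the max-age and the X-Cache-Status value out of a set of
response headers. The function names are made up for the example, and
the URL to fetch is whatever page you're interested in.

```python
import re
import urllib.request


def summarize_cache_headers(headers):
    """Extract the max-age and nginx cache status from response headers."""
    # Header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    match = re.search(r"max-age=(\d+)", normalized.get("cache-control", ""))
    return {
        "max_age_seconds": int(match.group(1)) if match else None,
        "cache_status": normalized.get("x-cache-status"),
    }


def fetch_cache_summary(url):
    """Fetch a URL and summarise its caching headers."""
    with urllib.request.urlopen(url) as response:
        return summarize_cache_headers(response.headers)
```

A max-age of a day would show up as 86400 seconds, and the cache status
would typically be something like MISS on the first request and HIT on
a later one.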

>> This is just the software side of the problem though. If this was to be
>> used by Repology, it would have to be a more permanent thing, similar to
>> the Cuirass and Mumi services that are currently set up around Guix. Does
>> anyone have any thoughts on this?
> I’d suggest having a Guix service for the whole thing, and making a
> branch in guix-maintenance.git such that bayfront (say) can run the
> service.
> Then we’ll have to reach consensus on guix-sysadmin as to which machine
> to use depending on the resources it needs, but if you have the config,
> I’d argue that we can happily run it on bayfront or perhaps berlin.  And
> we can give you access to the machine so you can reconfigure once in a
> while.

That all sounds really good :D

A package and service have been on my list of things to do, and I'll
hopefully sort that out in the next few weeks.

Currently I'm running it with Guix + support for isolated inferiors [1],
but I think that's something that can be made optional in the Guix Data
Service code, as initially I'd just be thinking about processing
revisions in the Guix git repository.




