GSoC NPM

Hello Guix,

In the last hours of GSoC, the time has come to report on my progress, challenges and ideas regarding my project. To reiterate, the goals of this project were:

- The ability to parse npm version data
- An npm backend for ~guix import~
- Npm modules in guix
- An actual build system for npm packages

When the project started, there was some code written by David Thompson that was
exactly what I needed to start on a node build system. For the importer side of
things, I started by looking at the gem importer; it seemed simple enough to
grok, while still offering the basic functionality I needed to get a running
start.

To start off with something that did not work out as well as I had hoped:
getting a popular build system (e.g. Gulp, Grunt, Broccoli and others) packaged.
As mentioned in my earlier mails, the list of transitive dependencies of any of
these suffers from at least the following:
- It is a list with more than 4000 packages on it
- It is a list that at some point has the package itself on it
As a compromise I wanted to get a testing framework packaged instead,
because everyone likes testing. While looking at the dependencies of a testing
framework, I noticed that I had a need for CoffeeScript. Having a passing
familiarity with the CoffeeScript dialect, I researched how one could achieve
this. At the time of writing, I have packaged CoffeeScript v1.0.0, to be
found in my git repo. What I neither accounted for nor foresaw was that
bootstrapping CoffeeScript would take some effort. My earlier, optimistic
estimates were based on a flawed approach that lacked the isolation needed for
a proper reproducible build. Anyway, if any of you want to play around with CoffeeScript
v1.0.0 or any of the 50+(!) preceding versions, be my guest.

I also took both Ludovic's and Catonano's detailed feedback on the initial
draft of the recursive importer into account when rewriting it. It should now
only visit each node in the dependency graph once, and be a whole lot more
efficient as well. It is still based on the multi-valued return values that
drove Ricardo's initial work on the CRAN recursive importer.
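
The "visit each node only once" property can be sketched as follows. This is a
minimal Python illustration of the idea, not the Guile code from my branch;
`fetch_dependencies' is a hypothetical stand-in for a registry lookup, and the
toy graph is made up.

```python
def import_recursively(root, fetch_dependencies):
    """Return the package names reachable from `root`, each looked up once."""
    seen = set()
    order = []
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already imported; this also breaks dependency cycles
        seen.add(name)
        order.append(name)
        # One lookup per package, no matter how many packages depend on it.
        stack.extend(fetch_dependencies(name))
    return order


# Hypothetical toy graph with a cycle (a -> b -> a) and a shared dependency (c).
TOY_GRAPH = {"a": ["b", "c"], "b": ["a", "c"], "c": []}
lookups = []


def toy_fetch(name):
    lookups.append(name)  # record every simulated registry request
    return TOY_GRAPH[name]


result = import_recursively("a", toy_fetch)
```

The `seen' set is what keeps both the cycle and the shared dependency from
triggering a second lookup.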

The number of npm packages and the complexity of how they depend on one another
is enormous. As discussed on the guix-devel ML[0], it would be useful to gain
some insight into which packages would be worthwhile to get into guix. As
Catonano noted, the problem is not on Guix' side; we have the (elementary)
building blocks with which to do the graph processing. The issue here is how
to implement something akin to `fold-packages' for npm packages in order to
traverse the dependency graph. After rewriting the recursive importer to be more
sane, I scrawled some notes on my notepad that basically boil down to the
following:
1. We should only look up each npm package once, if possible
2. We should have a list of all npm package names.
3. We should be able to specify the maximum traversal depth

For (1.), a simplified version of the recursive npm importer can be used. For
(2.), once one has installed node (with npm) and executed some `npm search'
commands, there should be a file in
`$HOME/.npm/registry.npmjs.org/-/all/.cache.json' that contains, among other
things, a listing of all package names. npm can be configured to update this
cache quite often (or almost never). It does weigh in at a hefty 160MB. What is
left is wiring all this together, which was not my priority these months.
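
To make the three points above concrete, here is a rough Python sketch of how
the wiring could look. Everything in it is an assumption for illustration: the
function name, the `lookup' callback (a stand-in for a registry request) and
the toy registry are made up, and loading the names from the npm cache file is
left out because its layout is not described here.

```python
from collections import deque
from functools import lru_cache


def dependency_closure_sizes(names, lookup, max_depth):
    """Map each name to the number of distinct packages within `max_depth`."""
    cached_lookup = lru_cache(maxsize=None)(lookup)  # (1) one lookup per package
    sizes = {}
    for root in names:  # (2) iterate over the list of all package names
        seen = {root}
        queue = deque([(root, 0)])
        while queue:
            name, depth = queue.popleft()
            if depth == max_depth:  # (3) stop at the maximum traversal depth
                continue
            for dep in cached_lookup(name):
                if dep not in seen:
                    seen.add(dep)
                    queue.append((dep, depth + 1))
        sizes[root] = len(seen) - 1  # do not count the root itself
    return sizes


# Hypothetical registry: name -> direct dependencies ("loop" depends on itself).
REGISTRY = {
    "left": (),
    "mid": ("left",),
    "top": ("mid", "left"),
    "loop": ("loop", "top"),
}
calls = []


def toy_lookup(name):
    calls.append(name)  # record every simulated registry request
    return REGISTRY[name]


sizes = dependency_closure_sizes(sorted(REGISTRY), toy_lookup, max_depth=2)
```

Packages with a small closure would then be the obvious first candidates for
packaging.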

Regarding `guix refresh', one has to re-import an npm package in order to get an
up-to-date package representation usable by guix. Originally I had thought that
supporting this would be about as difficult as for the other importers. However,
because we only use the npm registry [1] to retrieve metadata and the location
of the actual source archive, we have no way of knowing whether a particular
guix package originated from the npm registry.

An easy-yet-inelegant solution would be to include the package name as used
within the npm registry as metadata via an argument to the node-build-system.
Think an `#:npm-name' key in the `arguments' field of the guix package
definition.

The importer should be able to handle most of the valid (and invalid) source
URIs you can find in the wild, especially GitHub-related URLs and shorthands.
See [3] for a list of packages that might need some changes to their
package.json/npm registry metadata, or might necessitate a change to the
importer logic.
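
As an illustration of the kind of normalisation meant here, the following
Python sketch maps a few forms commonly seen in package.json `repository'
fields onto one canonical GitHub URL. It is not the importer's actual logic;
the function name and the set of accepted forms are my own assumptions, and
`#ref' fragments are deliberately not handled.

```python
import re

_GITHUB_URL = re.compile(
    r"""^(?:git\+)?                 # optional git+ prefix
        (?:(?:https?|git|ssh)://)?  # optional URL scheme
        (?:git@)?                   # optional SSH user
        github\.com[/:]             # host, then a slash or an scp-style colon
        (?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?)
        (?:\.git)?/?$""",
    re.VERBOSE,
)
# Shorthands: github:owner/repo, or a bare owner/repo.
_GITHUB_SHORTHAND = re.compile(r"^(?:github:)?(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)$")


def normalize_github(source):
    """Return a canonical https URL for a GitHub source, or None if unknown."""
    source = source.strip()
    match = _GITHUB_URL.match(source) or _GITHUB_SHORTHAND.match(source)
    if match is None:
        return None  # not something this sketch recognises
    return "https://github.com/{owner}/{repo}".format(**match.groupdict())


examples = [
    "git+https://github.com/jashkenas/coffeescript.git",
    "git@github.com:gulpjs/gulp.git",
    "github:gulpjs/gulp",
    "gulpjs/gulp",
    "https://example.org/foo.tar.gz",
]
normalized = [normalize_github(example) for example in examples]
```

Anything that comes back as None would end up on a list like [3] for manual
inspection.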

The current version of the importer only looks at the latest version of
packages. It should be easy to fix this by handling a `@VERSION' suffix like
the hackage importer does. This could be useful to break some of the dependency
cycles that exist between npm packages. For this to work, a scheme different
from the current NODE_PATH will have to be considered. The first module with a
certain name found in NODE_PATH will be loaded at runtime, so in the current
implementation it is not possible to have multiple versions of a package with
the same name loaded at one moment.
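
The NODE_PATH limitation can be shown with a toy model of the lookup. This
Python sketch only mimics the "first match wins" rule described above; the
directory names and the `resolve' helper are made up for illustration and do
not correspond to real store paths.

```python
def resolve(name, node_path, available):
    """Return the first directory on `node_path` that provides module `name`."""
    for directory in node_path.split(":"):
        if name in available.get(directory, set()):
            return directory  # first match wins; later entries are never seen
    return None


# Hypothetical directories, each providing a set of module names.
AVAILABLE = {
    "/store/foo-1.0/lib/node_modules": {"foo"},
    "/store/foo-2.0/lib/node_modules": {"foo"},
    "/store/bar-0.3/lib/node_modules": {"bar"},
}
NODE_PATH = ":".join([
    "/store/foo-1.0/lib/node_modules",
    "/store/foo-2.0/lib/node_modules",
    "/store/bar-0.3/lib/node_modules",
])

winner = resolve("foo", NODE_PATH, AVAILABLE)
```

Here foo 2.0 is on the path but can never be loaded, because foo 1.0 shadows
it for every package that asks for `foo'.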

Ricardo's idea of a recursive importer is pretty nice, imho. It should be doable
to implement some more of them in a fashion similar to what has been done for
CRAN and npm.

While I hope nobody (including myself) has to package so many variants of the
same package again, it would be nice to somehow download _only_ the revision you
are interested in. AFAIK, there is no proper way for git to do this for the
general 'give me this commit' case. What I eventually did to alleviate the
~3 minute checkout times for each iteration of CS was the following hack[2]. It
basically puts a recent-enough copy of the CS git repo in my store, and then
makes a shallow copy from that when using git-fetch. This took my build times
down to less than 10 seconds per iteration.

If you are interested in my work, have a look at:
https://github.com/wordempire/guix/commits/gsoc-final
, or just
`git clone https://github.com/wordempire/guix.git`
`git checkout gsoc-final`.

I will be trickling a patch series onto the ML over the next few days.

I guess that is enough text from me again. I would still like to express my
gratitude to my mentors David Thompson and Christopher Allan Webber, as well as
the rest of #guix and guix-devel (and some folks at GHM too) for dealing with
my ramblings and questions, and for helping me keep this project fun. Special
thanks to Catonano for having a close look at my code.

With just some tweaks to the importer, we should be able to at least package a huge subset of all the packages that require zero to few dependencies, once we are able to identify them.

I probably forgot quite a few important and unimportant details, so if you have
any questions, tips or just want to blame me for getting more messy
JavaScript into guix-land, send me a mail ;-).

- Jelle Licht

[0] https://lists.gnu.org/archive/html/guix-devel/2016-07/msg01726.html
[1] https://www.npmjs.com/
[2] http://paste.lisp.org/display/323999 <- beware, here be dragons etc
[3] http://paste.lisp.org/display/324007

From:	Jelle Licht
Subject:	GSoC NPM
Date:	Tue, 23 Aug 2016 11:07:22 +0200