Re: leaky pipelines and Guix
From: Ricardo Wurmus
Subject: Re: leaky pipelines and Guix
Date: Sat, 5 Mar 2016 12:05:28 +0100
User-agent: mu4e 0.9.13; emacs 24.5.1

Ludovic Courtès <address@hidden> writes:
> Ricardo Wurmus <address@hidden> skribis:
>
>> So, how could I package something like that? Is packaging the wrong
>> approach here and should I really just be using “guix environment” to
>> prepare a suitable environment, run the pipeline, and then exit?
>
> Maybe packages are the wrong abstraction here?
>
> IIUC, a pipeline is really a function that takes inputs and produces
> output(s). So it can definitely be modeled as a derivation.

This may be true, and the basic abstraction you propose seems correct and
useful, but I was talking about existing pipelines. They have already
been implemented using snakemake or make to keep track of individual
steps, etc. My primary concern is making these pipelines work, not
rewriting them.

For a particularly nasty pipeline I’m using a separate profile that holds
only the pipeline dependencies. Users build the pipeline glue code
themselves by whatever means they deem appropriate and then load the
profile in a subshell:

    bash
    source /path/to/pipeline-profile/etc/profile
    # run the pipeline here
    exit

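
The same steps can also be wrapped so that no interactive shell is left
open. This is only a sketch: `run_in_profile` is a hypothetical helper
name, and the profile path is the same placeholder as above.

```shell
# Sketch only: "run_in_profile" is a hypothetical helper, not part of Guix.
run_in_profile() {
    profile=$1   # directory of the profile, e.g. /path/to/pipeline-profile
    shift        # the remaining arguments are the command to run

    (
        # The parentheses start a subshell, so every variable set by the
        # profile disappears again as soon as the command has finished.
        . "$profile/etc/profile"
        "$@"
    )
}

# Usage, with the placeholder path standing in for a real profile:
# run_in_profile /path/to/pipeline-profile make
```

The return status of the helper is that of the command it ran, so a failing
pipeline step is still visible to the caller.
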
I think that these existing bio pipelines should really be treated more
like configurable packages. For a pipeline that we’re currently working
on I’m involved in making sure that it can be packaged and installed.
We chose to use autoconf to substitute tool placeholders at configure
time. This allows us to install the pipeline easily with Guix as we can
treat tools just as regular runtime dependencies. At configure time the
actual full paths to the needed tools are injected into the sources, so
we don’t need to propagate anything and make assumptions about PATH.
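
To illustrate the idea, here is a minimal shell sketch of what such a
configure-time substitution amounts to. It is not the actual build code of
the pipeline mentioned above; it only imitates, in plain shell, roughly what
Autoconf's AC_PATH_PROG does when configure generates a file from a
template. The helper name `substitute_tool`, the `@SORT@` placeholder, and
the file names are made up for the example.

```shell
# Sketch only: emulates the effect of a configure-time tool substitution.
# "substitute_tool" and the @SORT@ placeholder are hypothetical names.
substitute_tool() {
    placeholder=$1   # e.g. SORT, matching @SORT@ in the template
    tool=$2          # e.g. sort, looked up once at configure time
    template=$3      # e.g. pipeline.sh.in
    output=$4        # e.g. pipeline.sh

    # Resolve the absolute file name now, so PATH is irrelevant at run time.
    tool_path=$(command -v "$tool") || {
        echo "error: required tool '$tool' not found" >&2
        return 1
    }

    # Replace every @PLACEHOLDER@ with the absolute file name.
    # (Assumes the file name contains neither '|' nor '&'.)
    sed "s|@${placeholder}@|${tool_path}|g" "$template" > "$output"
}

# Usage: the template refers to the tool only through its placeholder.
workdir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'exec @SORT@ "$@"' > "$workdir/pipeline.sh.in"
substitute_tool SORT sort "$workdir/pipeline.sh.in" "$workdir/pipeline.sh"
cat "$workdir/pipeline.sh"
```

When the lookup happens inside a Guix build, the resolved file name points
into the store, which is why nothing has to be propagated.
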

Many problems with bio pipelines stem from the fact that they are not
treated as first-class applications, so they often have neither a wrapper
script nor a configuration or installation step. I think the easiest
way to fix this is to encourage the design of pipelines as real software
packages rather than distributing bland Makefiles/snakefiles and
assuming that the user will arrange for a suitable environment.

~~ Ricardo