Re: leaky pipelines and Guix
From: Ludovic Courtès
Subject: Re: leaky pipelines and Guix
Date: Mon, 07 Mar 2016 10:54:45 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Ricardo Wurmus <address@hidden> writes:
> Ludovic Courtès <address@hidden> writes:
>
>> Ricardo Wurmus <address@hidden> writes:
>>
>>> So, how could I package something like that? Is packaging the wrong
>>> approach here and should I really just be using “guix environment” to
>>> prepare a suitable environment, run the pipeline, and then exit?
>>
>> Maybe packages are the wrong abstraction here?
>>
>> IIUC, a pipeline is really a function that takes inputs and produces
>> output(s). So it can definitely be modeled as a derivation.
>
> This may be true and the basic abstraction you propose seems correct and
> useful, but I was talking about existing pipelines. They have already
> been implemented using snakemake or make to keep track of individual
> steps, etc. My primary concern is with making these pipelines work, not
> with rewriting them.
Oh, got it.
> For a particularly nasty pipeline I’m using a separate profile just for
> the pipeline dependencies. Users build the pipeline glue code
> themselves by whatever means they deem appropriate and then load the
> profile in a subshell:
>
> bash
> source /path/to/pipeline-profile/etc/profile
> # run the pipeline here
> exit
>
> I think that these existing bio pipelines should really be treated more
> like configurable packages. For a pipeline that we’re currently working
> on I’m involved in making sure that it can be packaged and installed.
> We chose to use autoconf to substitute tool placeholders at configure
> time. This allows us to install the pipeline easily with Guix as we can
> treat tools just as regular runtime dependencies. At configure time the
> actual full paths to the needed tools are injected into the sources, so
> we don’t need to propagate anything and make assumptions about PATH.
>
> Many problems with bio pipelines stem from the fact that they are not
> treated as first-class applications, so they often don’t have a wrapper
> script, nor a configuration or installation step. I think the easiest
> way to fix this is to encourage the design of pipelines as real software
> packages rather than distributing bland Makefiles/snakefiles and
> assuming that the user will arrange for a suitable environment.
Indeed. If existing pipelines are shell scripts or small programs, then I
think it makes sense to treat them as packages.
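
To make the configure-time substitution described above concrete, here is a
minimal sketch in plain shell. It is not the actual pipeline's build system:
the template name run-pipeline.in, the @GREP@ and @SED@ placeholders, and the
toy "count non-empty lines" step are all hypothetical, and sed stands in for
what AC_PATH_PROG and AC_SUBST would presumably do in a real autoconf setup.

```shell
#!/bin/sh
# Sketch only: resolve each tool to an absolute path once, at "configure"
# time, and bake that path into the installed script.
set -eu

workdir=$(mktemp -d)

# Hypothetical template, in the style of an autoconf *.in file.
cat > "$workdir/run-pipeline.in" <<'EOF'
#!/bin/sh
# Generated file: tools are called by absolute path, PATH is not consulted.
@GREP@ -c . "$1" | @SED@ 's/^/non-empty lines: /'
EOF

# "Configure" step: look each tool up exactly once.
grep_path=$(command -v grep)
sed_path=$(command -v sed)

# Substitute the placeholders, producing the script that would be installed.
sed -e "s|@GREP@|$grep_path|g" -e "s|@SED@|$sed_path|g" \
    "$workdir/run-pipeline.in" > "$workdir/run-pipeline"
chmod +x "$workdir/run-pipeline"

# Run with an empty PATH to check that nothing is looked up at run time.
printf 'a\n\nb\n' > "$workdir/input.txt"
result=$(PATH= "$workdir/run-pipeline" "$workdir/input.txt")
echo "$result"
```

With Guix, the paths found at configure time would be store paths of the
package's inputs, so the generated script keeps working without propagating
those inputs into the user's profile.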
Ludo’.