
Re: Use guix to distribute data & reproducible (data) science


From: Konrad Hinsen
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Fri, 9 Feb 2018 20:15:28 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

Hi,

On 09/02/2018 18:13, Ludovic Courtès wrote:

Amirouche Boubekki <address@hidden> skribis:

tl;dr: Distribution of data and software seems similar.
        Data is more and more important in software and reproducible
        science. The data science ecosystem lacks resource sharing.
        I think Guix can help.

Now, whether Guix is the right tool to distribute data, I don’t know.
Distributing large amounts of data is a job in itself, and the store
isn’t designed for that.  It could quickly become a bottleneck.  That’s
one of the reasons why the Guix Workflow Language (GWL) does not store
scientific data in the store itself.

I'd say it depends on the data and how it is used inside and outside of a workflow. Some data could very well be stored in the store, and then distributed via standard channels (Zenodo, ...) after export with "guix pack". For big datasets, some other mechanism is required.
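
To make this concrete, here is a minimal, untested sketch of a dataset-as-package, with a made-up name, URL, and placeholder hash, using the trivial build system to copy the file into the store:

(use-modules (guix packages)
             (guix download)
             (guix build-system trivial)
             ((guix licenses) #:prefix license:))

(define-public example-dataset
  (package
    (name "example-dataset")
    (version "1.0")
    ;; Placeholder URL and hash; a real package would point at the
    ;; published dataset and carry its actual sha256.
    (source (origin
              (method url-fetch)
              (uri "https://example.org/example-dataset-1.0.csv")
              (sha256
               (base32
                "0000000000000000000000000000000000000000000000000000"))))
    (build-system trivial-build-system)
    (arguments
     `(#:modules ((guix build utils))
       #:builder
       (begin
         (use-modules (guix build utils))
         ;; The "build" merely copies the data file into the store.
         (let ((dir (string-append (assoc-ref %outputs "out")
                                   "/share/data")))
           (mkdir-p dir)
           (copy-file (assoc-ref %build-inputs "source")
                      (string-append dir "/example-dataset.csv"))
           #t))))
    (synopsis "Example dataset packaged like software")
    (description "A plain data file kept in the store so that it can be
referenced by other packages and exported, e.g. with 'guix pack'.")
    (home-page "https://example.org")
    (license license:cc-by4.0)))

"guix build example-dataset" would then put the data in the store, and "guix pack example-dataset" would produce a self-contained tarball that could be uploaded to Zenodo or a similar archive.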

I think it's worth thinking carefully about how to exploit Guix for reproducible computations. As Lispers know very well, code is data and data is code. Building a package is a computation like any other. Scientific workflows could be handled by a specific build system. In fact, as long as no big datasets or multiple processors are involved, we can do this right now, using standard package declarations.
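
As an illustration of what I mean, and building on the hypothetical example-dataset above, a computation step can already be expressed as a package whose builder does the work; the names here are made up, but the mechanism is today's Guix:

(define-public example-analysis
  (package
    (name "example-analysis")
    (version "0.1")
    (source #f)                        ;everything comes from the inputs
    (build-system trivial-build-system)
    (inputs `(("example-dataset" ,example-dataset)))
    (arguments
     `(#:modules ((guix build utils))
       #:builder
       (begin
         (use-modules (guix build utils) (ice-9 rdelim))
         (let ((data (string-append (assoc-ref %build-inputs
                                               "example-dataset")
                                    "/share/data/example-dataset.csv"))
               (out  (assoc-ref %outputs "out")))
           (mkdir-p out)
           ;; The "build" is the computation; here it merely counts the
           ;; records of the dataset and stores the result.  A real
           ;; workflow step would run its analysis program instead.
           (call-with-input-file data
             (lambda (in)
               (let loop ((count 0))
                 (if (eof-object? (read-line in))
                     (call-with-output-file
                         (string-append out "/record-count.txt")
                       (lambda (port)
                         (display count port)
                         (newline port)))
                     (loop (+ count 1))))))
           #t))))
    (synopsis "A computation expressed as a package build")
    (description "The result of the computation is the package output,
stored and referenced like any other build result.")
    (home-page #f)
    (license license:cc-by4.0)))

The result lives in the store, can be substituted from a build farm, and is rebuilt only when the data or the analysis changes, exactly like a package.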

It would be nice if big datasets could conceptually be handled in the same way while being stored elsewhere - a bit like git-annex does for git. And for parallel computing, we could have special build daemons.

Konrad.


