guix-science

Re: Conda environments and reproducibility


From: Thibault Lestang
Subject: Re: Conda environments and reproducibility
Date: Tue, 29 Nov 2022 10:41:37 +0000
User-agent: mu4e 1.6.10; emacs 28.1

Simon Tournier <zimon.toutoune@gmail.com> writes:

> On Mon, 28 Nov 2022 at 17:28, Thibault Lestang <t.lestang@imperial.ac.uk> 
> wrote:
>> -----
>> @luispedrocoelho
>> Me, 6 months ago: I am going to save this conda
>> environment with all the versions of all the packages so it can be
>> recreated later; this is Reproducible Science!
>>
>> conda, today: these versions don't work together, lol.
>> -----
>>
>> I simply can't explain how such a behavior can happen.
>
> One thing is the link rot.  I do not know if it is currently estimated,
> but for sure, we always underestimate it.

How far back do package versions go in Anaconda's archives? Are there
any guarantees that old builds stay available? Good question.

>> I understand that conda ships pre-compiled binaries. I see how that's
>> bad for reproducibility and provenance tracking since it's not
>> straightforward to know how these binaries and dependencies were
>> compiled. I'm assuming that, when conda saves an environment, it records
>> version tags and "everything else required" to pull the same binaries
>> later. Okay - I see how binaries could /technically/ be modified at a
>> later stage whilst maintaining the same version tag (provenance tracking
>> issue).
>
> Aside, you are assuming the availability of such binaries. :-)

Yes, I am. I guess that's linked to your point about link rot?

>
> Another thing, from the old days when I used Conda, and I may be wrong,
> is, I guess, the SAT solver [1].  Well, 6 months ago, you described
> your environment, for instance saying:
>
>     1.0 <= foo
>     2.0 <= bar <= 3.0
>     baz <= 4.0
>
> then foo@1.1, foo@1.2 and foo@2.0 had been released in these past 6
> months.  But baz <= 4.0 only works with 0.9 <= foo <= 1.2 and the
> constraint on bar implies other constraints on foo and/or baz.
>
> The worst-case complexity of SAT solving is exponential, IIRC, or in
> any case really bad.  I do not know the state of the art, but I guess
> the problem to solve gets worse and worse as time goes by.
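
To make the version-range example above concrete, here is a minimal
sketch in Python. It is not how conda resolves environments (conda uses
a real SAT solver); it simply enumerates every combination, which also
shows why the search space grows multiplicatively with each new release.
The package indexes, the bar and baz version lists, and the helper names
are all hypothetical; only the three user constraints and the
"baz <= 4.0 needs 0.9 <= foo <= 1.2" rule come from the example.

```python
from itertools import product

# Hypothetical package indexes. Versions are tuples so they compare correctly.
INDEX_THEN = {
    "foo": [(1, 0)],
    "bar": [(2, 0), (2, 5), (3, 0)],
    "baz": [(3, 0), (4, 0)],
}
# Six months later: foo 1.1, 1.2 and 2.0 have been released.
INDEX_NOW = dict(INDEX_THEN, foo=[(1, 0), (1, 1), (1, 2), (2, 0)])

# The loose constraints written down by the user.
USER = {
    "foo": lambda v: (1, 0) <= v,
    "bar": lambda v: (2, 0) <= v <= (3, 0),
    "baz": lambda v: v <= (4, 0),
}


def compatible(env):
    """Cross-package rule: baz <= 4.0 only works with 0.9 <= foo <= 1.2."""
    if env["baz"] <= (4, 0):
        return (0, 9) <= env["foo"] <= (1, 2)
    return True


def solve(index):
    """Enumerate every combination and keep those satisfying all constraints."""
    names = sorted(index)
    solutions = []
    for combo in product(*(index[name] for name in names)):
        env = dict(zip(names, combo))
        if all(USER[name](env[name]) for name in names) and compatible(env):
            solutions.append(env)
    return solutions


def newest(solutions):
    """Pick the solution with the highest versions, preferring foo first."""
    if not solutions:
        return None
    return max(solutions, key=lambda env: (env["foo"], env["bar"], env["baz"]))


if __name__ == "__main__":
    print("then:", newest(solve(INDEX_THEN)))
    print("now: ", newest(solve(INDEX_NOW)))
```

With these made-up indexes the same loose description picks foo 1.0
six months ago and foo 1.2 today, so the description alone does not pin
down one environment.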
>
> From my experience, you have only one solution to fight against the
> time: freeze.  The question is then how or what to freeze. :-)
>
> One way for freezing is the binary container.  Another way for freezing
> is to have a “summary” capturing the whole (fixed) graph of
> dependencies.  This is (usually named) the channels.scm file (guix
> describe).  Then, the assumptions become:
>
>  1. solve the link rot; tackled by Software Heritage,
>  2. Linux kernel API backward compatibility,
>  3. hardware compatibility,

I think the tweet above is about reproducing an environment after
effectively freezing its packages and their dependencies, as you
describe. They probably used something like

conda env export

which outputs something like this (trimmed):

name: justnumpy
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - blas=1.0=mkl
  - libuuid=1.41.5=h5eee18b_0
  - mkl=2021.4.0=h06a4308_640
  - mkl-service=2.4.0=py310h7f8727e_0
  - mkl_fft=1.3.1=py310hd6ae3a3_0
  - mkl_random=1.2.2=py310h00e6091_0
  - ncurses=6.3=h5eee18b_3
  - numpy=1.23.4=py310hd5efca6_0
  - numpy-base=1.23.4=py310h8e6c178_0
  - ...
  - ...
prefix: /home/thibault/miniconda3/envs/justnumpy
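
Each dependency line above is a name=version=build triple, so in
principle nothing is left for the solver to choose. The following
Python sketch parses such lines and reports which pins are absent from
a set of available builds. The `Pin` type, the function names and the
`available` set are hypothetical and for illustration only; this does
not query any real channel.

```python
from typing import Iterable, List, NamedTuple, Set


class Pin(NamedTuple):
    name: str
    version: str
    build: str


def parse_pin(spec: str) -> Pin:
    """Parse one 'name=version=build' entry, with or without the YAML '- ' prefix."""
    spec = spec.strip()
    if spec.startswith("- "):
        spec = spec[2:]
    parts = spec.split("=")
    if len(parts) != 3:
        raise ValueError(f"not a name=version=build pin: {spec!r}")
    return Pin(*parts)


def missing_pins(pins: Iterable[Pin], available: Set[Pin]) -> List[Pin]:
    """Return the pins that are not in the (hypothetical) set of available builds."""
    return [pin for pin in pins if pin not in available]


if __name__ == "__main__":
    exported = [
        "  - numpy=1.23.4=py310hd5efca6_0",
        "  - mkl=2021.4.0=h06a4308_640",
    ]
    pins = [parse_pin(line) for line in exported]
    # Pretend that only the numpy build is still hosted.
    available = {pins[0]}
    for pin in missing_pins(pins, available):
        print(f"missing: {pin.name} {pin.version} ({pin.build})")
```

If even one exact build has disappeared or been replaced, the frozen
file can no longer be recreated as written.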

> If I may, here is some stuff: :-)
>
> https://www.nature.com/articles/s41597-022-01720-9
> https://simon.tournier.info/posts/2022-11-08-bluehats.html
> https://simon.tournier.info/posts/2022-04-15-cafe-guix-long-term.html

Great stuff - thank you. Congratulations on the paper!

-- Thibault
