guix-science

Re: Conda environments and reproducibility


From: Simon Tournier
Subject: Re: Conda environments and reproducibility
Date: Fri, 02 Dec 2022 14:59:42 +0100

Hi,

On Fri, 02 Dec 2022 at 12:05, Ludovic Courtès <ludo@gnu.org> wrote:
> Hugo Buddelmeijer <hugo@buddelmeijer.nl> skribis:
>
>> That is, "conda env export" should contain entries like
>> "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
>> dependencies 'that matter', like which compiler is used. What goes into the
>> hash seems rather complicated, and grows over time.
>
> I think one source of many problems here is to think that there are
> dependencies that do not matter.  Another one, which those hashes appear
> to address, is to think that a name/version pair is enough to
> unambiguously designate a software artifact.
>
> This hash is a hash of the build result, not a hash of the input, is
> that correct?

Well, the official Conda documentation is quite explicit on this, IMHO.
For instance,

https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html#matchspec-vs-packagerecord

From my understanding, if you go via MatchSpec then the SAT solver is
invoked.  The SAT solver tries to satisfy all the constraints and the
solution depends on the state of the index (the upstream repository).

Aside from the fact that the SAT solver can take a very long time, and
can even fail if the constraints are too hard, there is no guarantee
that it will find the exact same combination of packages to install.
Having an equality (numpy=1.23) or something else does not really change
this point.
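
To make this concrete, here is a minimal sketch in Python.  It is a toy
model and not Conda's actual solver: the PackageRecord class, the
resolve() function, the two index snapshots and the build strings are
all made up for illustration.  It assumes that a single "=" behaves as
a prefix match on the version, which is my understanding of how Conda
treats numpy=1.23.

```python
from typing import NamedTuple


class PackageRecord(NamedTuple):
    """Toy stand-in for one concrete package in a channel index."""
    name: str
    version: str
    build: str          # hypothetical build string
    build_number: int


def version_key(version: str) -> tuple:
    """Turn '1.23.4' into (1, 23, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(spec: str, index: list) -> PackageRecord:
    """Pick the newest record matching a 'name=version' spec.

    Assumption: 'numpy=1.23' matches 1.23 and any 1.23.x.
    """
    name, _, wanted = spec.partition("=")
    candidates = [
        rec for rec in index
        if rec.name == name
        and (rec.version == wanted or rec.version.startswith(wanted + "."))
    ]
    if not candidates:
        raise LookupError(f"no package matches {spec!r}")
    # Newest version wins; the build number breaks ties.
    return max(candidates,
               key=lambda rec: (version_key(rec.version), rec.build_number))


# Two snapshots of the same (made-up) channel at different dates.
index_october = [
    PackageRecord("numpy", "1.23.3", "py39_demo_a", 0),
    PackageRecord("numpy", "1.23.4", "py39_demo_b", 0),
]
index_december = index_october + [
    PackageRecord("numpy", "1.23.5", "py39_demo_c", 0),
]

# Same spec, different index state, different concrete package.
print(resolve("numpy=1.23", index_october))
print(resolve("numpy=1.23", index_december))
```

The spec is identical in both calls; only the state of the index has
changed, and that alone is enough to select a different package.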

Conda offers the option to be “explicit”.  In that case, the solver is
not invoked.  In a sense, it is a way to deal directly with
PackageRecord.  However, the Conda documentation comes with these
warnings:

        * Explicit package installs

        Since  the  solver is  not  involved,  the dependencies  of  the
        explicit package(s) are not processed at all. This can leave the
        environment  in an  inconsistent state,  which can  be fixed  by
        running conda update --all, for example.

        * Cloning an environment

        It essentially takes the  source environment, generates the URLs
        for  each installed  packages  (filtering  conda, conda-env  and
        their   dependencies)   and  passes   the   list   of  URLs   to
        explicit(). If the source tarballs are not in the cache anymore,
        it will  query the  index for  the best  possible match  for the
        current channels. As  such, there’s a slim chance  that the copy
        is not exactly a clone of the original environment.

        
https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html#early-exit-tasks


Therefore, the official Conda documentation itself explains that there
is no guarantee of reproducing an environment exactly.
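
One way to check whether a copy really is a clone is to compare the
explicit package lists of both environments, for instance as produced
by "conda list --explicit".  The following Python sketch does that
under stated assumptions: the helper names parse_explicit() and
diff_environments() are mine, and the example.org URLs and the "_2"
build are hypothetical.

```python
def parse_explicit(text: str) -> set:
    """Return the set of package URLs found in an explicit list.

    Blank lines, '#' comments and the '@EXPLICIT' marker are skipped.
    """
    urls = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        urls.add(line)
    return urls


def diff_environments(original: str, clone: str) -> dict:
    """Report the URLs present in only one of the two environments."""
    a, b = parse_explicit(original), parse_explicit(clone)
    return {
        "only_in_original": sorted(a - b),
        "only_in_clone": sorted(b - a),
    }


# Hypothetical explicit lists; the channel URL is a placeholder.
ORIGINAL = """\
# platform: linux-64
@EXPLICIT
https://example.org/channel/linux-64/numpy-1.23.4-py39_demo_b.tar.bz2
https://example.org/channel/linux-64/scipy-1.8.0-py39hee8e79c_1.tar.bz2
"""

CLONE = """\
# platform: linux-64
@EXPLICIT
https://example.org/channel/linux-64/numpy-1.23.4-py39_demo_b.tar.bz2
https://example.org/channel/linux-64/scipy-1.8.0-py39hee8e79c_2.tar.bz2
"""

for side, urls in diff_environments(ORIGINAL, CLONE).items():
    for url in urls:
        print(side, url)
```

An empty difference only says that both environments point to the same
package files; it says nothing about how those files were built.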


> I think it would be great to have a blog post that walks through
> shortcomings and concrete issues one may encounter when trying to
> reproduce a software environment with Conda, contrasting it with how
> Guix does thing.  This would probably make more sense for people who use
> Conda everyday than a high-level overview of Guix.

From my understanding, the main issue is that Conda works perfectly well
within a short time window (2-3 months, say!).  Within this range,
people can often reproduce an environment.  It becomes more complicated
outside this range, so the problem is hard to demonstrate. :-)

For sure, a blog post by people fluent in both Conda and Guix would be
very welcome.  Aside from the discussion about reproducibility, even
just a Rosetta Stone comparing how to do the same tasks with Conda vs
Guix would help.  It would smooth the migration and at least encourage
people to give Guix a try. :-)


Cheers,
simon


