Re: Conda environments and reproducibility

From: Hugo Buddelmeijer
Subject: Re: Conda environments and reproducibility
Date: Fri, 2 Dec 2022 15:06:20 +0100

Hi Ludovic,

On Fri, 2 Dec 2022 at 12:05, Ludovic Courtès <> wrote:

> I read this thread with interest: great to have first-hand feedback from
> Conda users and packagers who also understand Guix!
>
> Hugo Buddelmeijer <> writes:
>
>> That is, "conda env export" should contain entries like
>> "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
>> dependencies 'that matter', like which compiler is used. What goes into the
>> hash seems rather complicated, and grows over time.

> I think one source of many problems here is to think that there are
> dependencies that do not matter.

In the Python world, most dependencies are runtime dependencies. Those do not affect the build or the build result, and are therefore arguably 'dependencies that do not matter'. (I disagree with that view, because what matters is whether the software runs and produces the right results.)

> Another one, which those hashes appear
> to address, is to think that a name/version pair is enough to
> unambiguously designate a software artifact.

> This hash is a hash of the build result, not a hash of the input, is
> that correct?

No, this conda build hash is used to identify the build environment, not to identify a particular package build.

The easiest way to explain this is with an example. Here is a small part of the "conda env export" output for one of my environments:
  - pybind11-abi=4=hd8ed1ab_3
  - pycodestyle=2.8.0=pyhd8ed1ab_0
  - pycosat=0.6.3=py39h3811e60_1009
  - pycparser=2.21=pyhd8ed1ab_0
  - pydocstyle=6.1.1=pyhd8ed1ab_0
  - pyerfa=
  - pyflakes=2.4.0=pyhd8ed1ab_0
  - pygments=2.11.2=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd8ed1ab_0
  - pyqt=5.12.3=py39hf3d152e_8
  - pyqt-impl=5.12.3=py39hde8b62d_8
  - pyqt5-sip=4.19.18=py39he80948d_8
  - pyqtchart=5.12=py39h0fcd23e_8
  - pyqtwebengine=5.12.1=py39h0fcd23e_8

As you can see, many packages share the "hd8ed1ab" build hash, two Qt-related packages share "h0fcd23e", and several others each have their own. "hd8ed1ab" is by far the most common hash in this environment. The "hd8ed1ab" packages are mostly independent of each other (separate maintainers, etc.), but they are probably all from conda-forge and were probably all built in the 'default' conda build environment.

(The trailing number, after the underscore, is the build number. That all the Qt packages have build number 8 suggests they were actually built together, even though their build hashes differ.)
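To make the grouping above concrete, here is a small Python sketch that splits each exported entry into name, version, build hash and build number, and groups packages by hash. It assumes the build string has the form `<prefix><hash>_<number>`, where the hash is 'h' plus seven hex digits and the optional prefix (such as "py39" or "py") is ignored; that matches the entries above but is an assumption, not a documented format. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Complete entries copied from the export above (the truncated
# "pyerfa" line is left out).
EXPORT = """\
  - pybind11-abi=4=hd8ed1ab_3
  - pycodestyle=2.8.0=pyhd8ed1ab_0
  - pycosat=0.6.3=py39h3811e60_1009
  - pycparser=2.21=pyhd8ed1ab_0
  - pyqt=5.12.3=py39hf3d152e_8
  - pyqtchart=5.12=py39h0fcd23e_8
  - pyqtwebengine=5.12.1=py39h0fcd23e_8
"""

# Assumed format: 'h' plus seven hex digits at the end of the build
# string, just before the "_<build number>" suffix.
HASH_RE = re.compile(r"h[0-9a-f]{7}$")


def parse_entry(line):
    """Return (name, version, build_hash, build_number) for one entry."""
    name, version, build = line.strip().lstrip("- ").split("=")
    prefix, _, number = build.rpartition("_")
    match = HASH_RE.search(prefix)
    build_hash = match.group(0) if match else None
    return name, version, build_hash, int(number)


def group_by_hash(text):
    """Map each build hash to the names of the packages that carry it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            name, _version, build_hash, _number = parse_entry(line)
            groups[build_hash].append(name)
    return dict(groups)


if __name__ == "__main__":
    for build_hash, names in group_by_hash(EXPORT).items():
        print(build_hash, names)
```

Running it prints one line per hash; for example the two Qt add-ons end up together as `h0fcd23e ['pyqtchart', 'pyqtwebengine']`, while `pyqt` sits alone under `hf3d152e` despite sharing build number 8.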

> I don't really understand what goes into the hash. It is described on

The goal of these hashes is to capture which package builds work together: two package builds with the same build hash should have been built in the same environment, and should therefore work together.
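Roughly, the idea can be sketched as hashing the build settings a recipe depends on and keeping a short tag. This is not conda-build's actual algorithm; the `build_hash` function and the setting names and values below are hypothetical. In this sketch only build settings go into the tag, so runtime-only dependencies do not affect it.

```python
import hashlib
import json


def build_hash(build_settings):
    """Hypothetical: a short 'h' + 7-hex-digit tag for a set of build settings."""
    # Sorting the keys makes the serialization, and thus the tag,
    # independent of the order in which the settings were listed.
    blob = json.dumps(build_settings, sort_keys=True).encode("utf-8")
    return "h" + hashlib.sha1(blob).hexdigest()[:7]


# Made-up settings: two recipes built the same way, and one built with a
# different compiler version.
recipe_a = {"c_compiler": "gcc", "c_compiler_version": "10", "python": "3.9"}
recipe_b = {"python": "3.9", "c_compiler_version": "10", "c_compiler": "gcc"}
recipe_c = {"c_compiler": "gcc", "c_compiler_version": "11", "python": "3.9"}

if __name__ == "__main__":
    # Same settings in a different order: identical tags.
    print(build_hash(recipe_a), build_hash(recipe_b))
    # A different compiler version yields a different tag.
    print(build_hash(recipe_c))
```

Independently maintained recipes that happen to use the same settings end up with the same tag, which is one way many unrelated packages could share something like "hd8ed1ab".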

I'm not sure how it works when the hashes differ. Maybe they are Merkle trees, so that one could determine whether one hash is a 'superset' of another? Probably not.
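For comparison, a Merkle-style scheme (roughly the idea behind Guix store items, whose hashes cover the hashes of their inputs) could look like the sketch below. The package names, versions and hashing details are made up for illustration. Even here a single hash cannot tell you whether one environment is a 'superset' of another; you need the tree itself for that.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    inputs: tuple = ()  # dependencies, themselves Package instances

    def merkle_hash(self):
        """Hash this package's name/version together with its inputs' hashes."""
        h = hashlib.sha256(f"{self.name}-{self.version}".encode())
        # Recurse into the inputs; sorting makes their order irrelevant.
        for child in sorted(p.merkle_hash() for p in self.inputs):
            h.update(b"\0" + child.encode())
        return h.hexdigest()[:12]


# Hypothetical packages: the same scipy recipe built against two
# different BLAS versions.
blas_old = Package("openblas", "0.3.20")
blas_new = Package("openblas", "0.3.21")
scipy_old = Package(
    "scipy", "1.8.0", (Package("numpy", "1.22.3", (blas_old,)), blas_old)
)
scipy_new = Package(
    "scipy", "1.8.0", (Package("numpy", "1.22.3", (blas_new,)), blas_new)
)

if __name__ == "__main__":
    # A change deep in the graph propagates up to the top-level hash,
    # even though scipy's own name and version are identical.
    print(scipy_old.merkle_hash(), scipy_new.merkle_hash())
```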

> I think it would be great to have a blog post that walks through
> shortcomings and concrete issues one may encounter when trying to
> reproduce a software environment with Conda, contrasting it with how
> Guix does things.  This would probably make more sense for people who use
> Conda everyday than a high-level overview of Guix.

A key difference might be how to handle different combinations of versions.

E.g. you might want to use numpy 3.0 with scipy 18.0, while I want to use numpy 6.0 with scipy 15.0 (made-up numbers, deliberately chosen so that each of us has one version higher and one lower than the other). Conda and Guix solve this in fundamentally different ways.
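A toy model of that difference, under the simplifying assumption that a conda environment is just a name-to-version map and that a Guix-like store item is named after a hash of its name, version and its inputs' paths. The `/store/` paths and both functions are hypothetical; neither tool is implemented this way.

```python
import hashlib


def conda_like_env(pins):
    """Toy model: an environment maps each package name to exactly one version."""
    env = {}
    for name, version in pins:
        if env.get(name, version) != version:
            raise ValueError(f"{name}: {env[name]} conflicts with {version}")
        env[name] = version
    return env


def store_path(name, version, inputs=()):
    """Toy model of a Guix-like store item, named after a hash of its
    name, version and the store paths of its inputs."""
    h = hashlib.sha256(f"{name}-{version}".encode())
    for dep in sorted(inputs):
        h.update(b"\0" + dep.encode())
    return f"/store/{h.hexdigest()[:8]}-{name}-{version}"


yours = [("numpy", "3.0"), ("scipy", "18.0")]
mine = [("numpy", "6.0"), ("scipy", "15.0")]

if __name__ == "__main__":
    # Conda-like: each combination needs its own environment.
    print(conda_like_env(yours), conda_like_env(mine))
    try:
        conda_like_env(yours + mine)
    except ValueError as err:
        print("one environment cannot hold both:", err)

    # Guix-like: both combinations coexist in one store, because each
    # scipy item points at the exact numpy item it was built against.
    numpy3 = store_path("numpy", "3.0")
    numpy6 = store_path("numpy", "6.0")
    store = {
        numpy3, store_path("scipy", "18.0", [numpy3]),
        numpy6, store_path("scipy", "15.0", [numpy6]),
    }
    print(len(store), "items in one store")
```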

conda-forge (as a project) sits somewhere between plain conda and Guix, and can be seen as a Linux distribution in its own right (minus the kernel). conda-forge moves closer to Guix every year: it packages more and more dependencies, and has more and more shared rebuild-everything moments.
