Re: intrinsic vs extrinsic identifier: toward more robustness?

From: Simon Tournier
Subject: Re: intrinsic vs extrinsic identifier: toward more robustness?
Date: Thu, 06 Apr 2023 14:15:56 +0200


On Thu, 16 Mar 2023 at 18:45, Ludovic Courtès <> wrote:

>> For sure, we have to fix the holes and bugs. :-)  However, I am asking
>> what we could add to get more robustness in the long term.

> Sources (fixed-output derivations) are already content-addressed, by
> definition (I prefer “content addressing” over “intrinsic
> identification” because that’s a more widely recognized term).

This is the case when you consider that the result of the fixed-output
derivation is already inside the Guix “ecosystem”…

> In a way, as Maxime was saying, the URL/URI is just a hint; what
> matters is the content hash that appears in the origin.

…but otherwise the URL/URI is not just a “hint”.  Or could you explain
what you mean by a “hint”?

Maybe I misunderstand something, but from my understanding the URL/URI
is a “hint” only when a substitute is available; otherwise Guix relies
on the plain URL/URI to fetch the data.

--8<---------------cut here---------------start------------->8---
$ guix build hello -S --no-substitutes --check
The following derivation will be built:
building /gnu/store/3hxraqxb0zklq065zjrxcs199ynmvicy-hello-2.12.1.tar.gz.drv...

Starting download of 
following redirection to 
downloading from ...

warning: rewriting hashes in 
`/gnu/store/3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1.tar.gz'; cross fingers
--8<---------------cut here---------------end--------------->8---

In other words, when speaking about robustness (in the broad sense), I
think we cannot assume that the “content addressing” provided by the
derivation,

--8<---------------cut here---------------start------------->8---
   ,("impureEnvVars","http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS")
--8<---------------cut here---------------end--------------->8---

is still there; instead it means Guix has to rely on another system
(here ‘url’).  Somehow, I am proposing to optionally add more “content
addressing” than the current NAR+SHA256 (and URL/URI), to then be able
to exploit other content-addressing systems.
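As an illustration (a hypothetical sketch, not Guix code), the very
same bytes can be named by several content-addressing schemes at once,
e.g., plain SHA256 over the raw bytes (the “none” serializer used by
url-fetch origins) and the SHA1 that Git assigns to a blob:

```python
import hashlib

def none_sha256(data: bytes) -> str:
    """Plain SHA256 over the raw bytes (the "none" serializer)."""
    return hashlib.sha256(data).hexdigest()

def git_blob_sha1(data: bytes) -> str:
    """SHA1 the way Git names a blob: header "blob <size>\\0" + bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Two independent content-addressing identifiers for the same content.
data = b"Hello, GNU world!\n"
print(none_sha256(data))
print(git_blob_sha1(data))
```

Each scheme identifies the same content independently, so recording
both means the content stays resolvable even if one ecosystem (plain
tarball mirrors, Git forges, …) disappears.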

> So it seems to me that the basics are already in place.

Well, there are two possible choices: (1) rely on an external service
that would bridge the different content-addressing systems (such as
extending the Disarchive database, or hoping SWH will do it :-)), but
this other external service then needs to be always available; or
(2) extend the information in packages (optional fields, etc.).

Moreover, about (1), all third-party channels would have to be ingested
by this external service.  For SWH, that is possible.  For the
Disarchive database, it would mean registering these third-party
channels, or channels maintaining their own database.  Contrary to (2),
where the identifier would optionally be part of the package definition.

> What’s missing, both in SWH and in Guix, is the ability to store
> multiple hashes.  SWH could certainly store several hashes, computed
> using different serialization and hash algorithm combinations.

Please note that currently Guix relies on a “hint” when SWH is used as
a fallback.  For instance, in most git-fetch cases, Guix provides the
SWH API with the context (URL and Git tag) and lets SWH resolve it in
order to find the content-addressing identifier.  It works in many
cases, but it fails when history has been rewritten, e.g., when
upstream does in-place tag replacement.

And this strategy does not work with Subversion (svn-fetch), Mercurial
(hg-fetch), or others.  It would require more work on our side (parsing
the result of the query, extracting relevant information, etc.).
Nothing impossible, but far from done, IMHO. :-)

Well, I still have mixed feelings about the SWH fallback robustness. :-)

> This is what you suggested at
> <>; it was
> also discussed in the thread at
> <>.  It
> would be awesome if SWH would store Nar hashes; that would solve all our
> problems, as you explained.

Yeah that’s nice. :-)  The progress is tracked by,

and the first part for computing NAR is now merged, IIUC, with:

However, exposing this NAR via their API, and then bridging NAR ->
SWHID, is not planned on the SWH side yet, AFAIK.
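For the curious, the NAR serialization itself is quite simple; a
minimal Python sketch for a single non-executable regular file (the
real format also covers executables, symlinks and directories) could
look like:

```python
import hashlib

def nar_token(data: bytes) -> bytes:
    """One NAR token: 64-bit little-endian length, then the bytes,
    zero-padded to a multiple of 8."""
    pad = (8 - len(data) % 8) % 8
    return len(data).to_bytes(8, "little") + data + b"\x00" * pad

def nar_regular_file(contents: bytes) -> bytes:
    """NAR serialization of one non-executable regular file."""
    return b"".join(nar_token(t) for t in
                    [b"nix-archive-1", b"(", b"type", b"regular",
                     b"contents", contents, b")"])

def nar_sha256(contents: bytes) -> str:
    """The NAR+SHA256 content-addressing identifier for that file."""
    return hashlib.sha256(nar_regular_file(contents)).hexdigest()
```

The point is that the hash covers a canonical serialization of the
file tree, not the bytes of some particular tarball, which is exactly
what makes NAR+SHA256 a transport-independent identifier.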

> The other option—storing multiple hashes for each origin in Guix—doesn’t
> sound practical: I can’t imagine packages storing and updating more than
> one content hash per package.  That doesn’t sound reasonable.  Plus it
> would be a long-term solution and wouldn’t help today.

Storing a list of content-addressing identifiers (NAR+SHA256, Git+SHA1,
GNUnet, IPFS, etc.) would add robustness, IMHO.

In other words, it is not affordable to have a ‘gnunet-fetch’ method as
proposed in [1], but we could optionally have something like:

       (method url-fetch)
       (uri (string-append "mirror://gnu/hello/hello-" version ".tar.gz"))
       (identifiers
         (git+sha1 "swh:1:dir:013573086777370b558b1a9ecb6d0dca9bb8ea18")
         (none+sha1 "8f261739d33d31867ab9c5fa26f973c37da26ca5"))

And we could also have the Git commit hash (for packages using the
git-fetch method), etc.

Having an optional field ‘identifiers’ would help today for all fetch
methods other than url-fetch and git-fetch.

For sure, it is not straightforward.  For instance, how do we ensure
consistency?  Via “guix lint”?  Something else?
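Such a consistency check could recompute each declared identifier over
the fetched content and flag mismatches.  A rough sketch (all names
invented; only the trivial “none” serializer is shown):

```python
import hashlib

def check_identifiers(data: bytes, identifiers):
    """Recompute each (serializer, algo, expected) identifier over DATA
    and return the list of mismatches; [] means consistent."""
    mismatches = []
    for serializer, algo, expected in identifiers:
        assert serializer == "none"  # nar/git serializers omitted here
        actual = hashlib.new(algo, data).hexdigest()
        if actual != expected:
            mismatches.append((serializer, algo, expected, actual))
    return mismatches

data = b"hello\n"
ids = [("none", "sha256",
        "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"),
       ("none", "sha1",
        "f572d396fae9206628714fb2ce00f72e94f2258f")]
print(check_identifiers(data, ids))  # → [] when everything is consistent
```

Run once at package-update time (or by a lint checker), this would
catch an ‘identifiers’ list that drifted out of sync with the source.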

Well, on the other hand, sometimes I would like to have a list of
sources using different fetch methods: say, first try this url-fetch,
then this git-fetch, then the SWH fallback, etc.
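That ordered-fallback idea can be sketched as a simple loop over
candidate fetchers (the fetchers below are stubs standing in for
url-fetch, git-fetch and the SWH fallback; all names are invented):

```python
def fetch_with_fallbacks(fetchers):
    """Try each (name, thunk) in order; return the first success."""
    errors = []
    for name, thunk in fetchers:
        try:
            return name, thunk()
        except Exception as exc:
            errors.append((name, exc))
    raise RuntimeError(f"all fetchers failed: {errors}")

# Stubs: the first two sources are gone, the SWH fallback still works.
def url_fetch():  raise IOError("404: upstream tarball gone")
def git_fetch():  raise IOError("tag rewritten upstream")
def swh_fetch():  return b"source tarball bytes"

name, data = fetch_with_fallbacks([("url-fetch", url_fetch),
                                   ("git-fetch", git_fetch),
                                   ("swh", swh_fetch)])
print(name)  # → swh
```

The content hash would still be verified once, after whichever fetcher
succeeds, so the fallback order affects availability, not integrity.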

To me, the other viable option would be to extend the Disarchive
database and the services around it.



