Content addressable store

On IRC on May 14, 2019, we discussed the idea of a content-addressable store.

This is also discussed here:

page 143 (or 135, depending on the page numbering) of https://nixos.org/~eelco/pubs/phd-thesis.pdf, on the intensional model

and

https://github.com/NixOS/nix/issues/296.

Thanks for the links, roptat.

After reading these, I came up with an initial idea, which looks like this:

1. solve the content addressability problem as proposed in the thesis:

- build the derivation as we do now

- rewrite the self-references to a known constant

- compute the hash after the rewrite

- relocate the package to the store path indicated by the new hash, rewriting the constant to that hash

2. after the packager builds the package, the content address can be added to the package definition

3. fail the package build if it has a content address that does not match the produced artifact.

4. use flags to allow installing to the original path, to the content-addressed path, or to both.

I propose to set the defaults so that the package is installed to the original path if no content address is specified, and to the content-addressed path if one is specified.

(This might come in handy during the transitional period, as it lets us install the package to both locations.)
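The steps above can be sketched in Python as a toy model. This is not Guix or Nix code, and everything in it is an assumption made for illustration: a build output is a dict from relative file names to bytes, the store directory is taken to be `/gnu/store`, a "hash" is a truncated hex SHA-256 over a made-up serialization (not the nar format, nor Guix's base32 store hashes), and `PLACEHOLDER`, `content_address`, `relocate`, `install_plan` and the `to_original`/`to_content` flags are hypothetical names.

```python
from __future__ import annotations

import hashlib

STORE = "/gnu/store"  # assumed store directory
HASH_LEN = 32         # assumed hash length, in characters
# Hypothetical "known constant" for self-references; it has the same length
# as a real hash so that rewriting never shifts byte offsets.
PLACEHOLDER = b"0" * HASH_LEN


def store_path(digest: str, name: str) -> str:
    return f"{STORE}/{digest}-{name}"


def content_address(files: dict[str, bytes], own_hash: str) -> str:
    """Hash the output after rewriting its self-references to PLACEHOLDER."""
    h = hashlib.sha256()
    for path in sorted(files):
        data = files[path].replace(own_hash.encode(), PLACEHOLDER)
        p = path.encode()
        # Length-prefixed toy serialization (NOT the nar format).
        h.update(len(p).to_bytes(8, "big") + p)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()[:HASH_LEN]


def relocate(files: dict[str, bytes], old_hash: str, new_hash: str) -> dict[str, bytes]:
    """Rewrite every reference to the old store hash into the new one."""
    return {p: d.replace(old_hash.encode(), new_hash.encode()) for p, d in files.items()}


def install_plan(
    name: str,
    files: dict[str, bytes],
    own_hash: str,
    declared: str | None = None,
    to_original: bool | None = None,
    to_content: bool | None = None,
) -> dict[str, dict[str, bytes]]:
    """Map each store path the output should be installed to onto its contents."""
    ca = content_address(files, own_hash)
    # Step 3: a declared content address that does not match fails the build.
    if declared is not None and ca != declared:
        raise ValueError(f"{name}: declared content address {declared}, got {ca}")
    # Proposed defaults: original path without a declared content address,
    # content-addressed path with one; either flag can override this.
    if to_original is None:
        to_original = declared is None
    if to_content is None:
        to_content = declared is not None
    plan: dict[str, dict[str, bytes]] = {}
    if to_original:
        plan[store_path(own_hash, name)] = files
    if to_content:
        plan[store_path(ca, name)] = relocate(files, own_hash, ca)
    return plan


if __name__ == "__main__":
    name = "hello-2.10"
    own = "a" * HASH_LEN  # hypothetical input-addressed hash
    out = {"bin/hello": b"#!/bin/sh\necho " + store_path(own, name).encode() + b"\n"}
    print(list(install_plan(name, out, own)))
    print(list(install_plan(name, out, own, declared=content_address(out, own))))
```

Because the placeholder is substituted before hashing, the relocated copy hashes back to the same content address when its own (new) hash is treated as the self-reference, which is what makes the address stable across relocation.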

There are two issues with the approach:

1. only reproducible packages can be content addressed

2. when a package has a content address, dependents resolve it to that address, which opens up the possibility that the package points to the output of a derivation other than the one defined in the package. Since users of a channel already trust the channel code, we concluded in the discussion that malicious injection can be ignored. What can still happen is that a package is updated but its content address is not, so the dependents still resolve to the old content address and have no way of knowing that the package definition does not actually build. With proper workflow support this might be manageable.
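The first issue can be made concrete with a small, self-contained toy sketch (hypothetical names throughout; self-references are ignored here for brevity, and the hash is again a truncated SHA-256 over a made-up serialization). Building twice and comparing content addresses only gives a stable result when the build is deterministic; `nondeterministic_build` embeds random bytes as a stand-in for a timestamp or other build-to-build variation.

```python
from __future__ import annotations

import hashlib
import os
from typing import Callable


def content_address(files: dict[str, bytes]) -> str:
    """Truncated SHA-256 over a length-prefixed toy serialization."""
    h = hashlib.sha256()
    for path in sorted(files):
        p, d = path.encode(), files[path]
        h.update(len(p).to_bytes(8, "big") + p + len(d).to_bytes(8, "big") + d)
    return h.hexdigest()[:32]


def reproducible_build() -> dict[str, bytes]:
    return {"share/doc/README": b"hello\n"}


def nondeterministic_build() -> dict[str, bytes]:
    # Random bytes stand in for a timestamp or other build-to-build variation.
    return {"share/doc/README": b"built " + os.urandom(16).hex().encode() + b"\n"}


def has_stable_address(build: Callable[[], dict[str, bytes]], attempts: int = 2) -> bool:
    """True if repeated builds all produce the same content address."""
    return len({content_address(build()) for _ in range(attempts)}) == 1
```

Under the proposed defaults, a package that fails such a check has no content address that could be declared for it, so it would simply keep installing to its original path.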

Benefits of this approach:

- the content addresses do not need a centralized database

- the complications resulting from derivations building to different outputs are eliminated

- a very good reproducibility indicator is gained

- it can peacefully coexist with our current store.

Wdyt?

From:	Gábor Boskovits
Subject:	Content addressable store
Date:	Wed, 15 May 2019 10:33:18 +0200