[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Deep vs Shallow trace: Removing the tradeoff?
From: |
Julien Lepiller |
Subject: |
Re: Deep vs Shallow trace: Removing the tradeoff? |
Date: |
Sat, 27 Mar 2021 20:50:56 -0400 |
User-agent: |
K-9 Mail for Android |
Le 27 mars 2021 12:56:08 GMT-04:00, ilmu <ilmu@rishi.is> a écrit :
>Hi,
>
>I had this idea while reading about authenticated datastructures, I
>would be very thankful if you could tell me why it doesn't work.
>
>Bear in mind the following disclaimer as you read this:
>
>My experience with these things is mostly theoretical, I have never
>used Bazel and although I was a user of Nix and am now moving to Guix I
>have not contributed to nixpkgs and only written simple expressions.
>
>Without further ado.
>
>
>The premise I am assuming is the framework introduced in Build Systems
>a la Carte. They talk about Bazel and Nix as representatives of two
>different dependency tracing strategies:
>
>- Shallow tracing :: You hash immediate dependencies.
>- Deep tracing :: You hash the whole transitive closure.
>
>Now the tradeoff is basically the following:
>
>- In Nix when you put a comment in curl you need to rebuild every
>single package in nixpkgs because they more or less all depend on curl
>in one way or another and therefore the curl source is in the
>transitive closure for almost every package.
>- In Bazel when you put a comment in curl then the direct dependents
>need to be rebuilt but if they are the same as before after being
>rebuilt then the propagation is killed and nothing else needs to
>change.
>
>However, in Bazel you will need to traverse the whole dependency tree
>all of the time to verify that everything is as it should be.
What I understand from the above is that their is not so much of a difference
between Bazel and Nix/Guix. The "shallow" tracing *is* a deep tracing. It just
records something different.
In Nix/Guix, we recursively compute the hash of packages from their inputs,
receipe and other stuff. This hash is unrelated to the checksum of the result
and changes as soon as one element changes. It essentially records the whole
source dependency graph.
From what you write, Bazel records the checksum of inputs and receipe, so it
essentially encodes the whole binary dependency graph. It's not "shallow", as a
real change down the graph can have repercussions higher up.
Actually Eelco described something similar for Nix in his thesis, the
intensionnal model. As you noticed, different receipes or inputs can lead to
the same output (bitwise). The idea is to declare the expected checksum of the
result, and use this to compute the store item name. As long as you apply
changes that do not affect the output, different receipes (updates) can be used
to generate the same item. So if you update a package, it might affect its
output checksum, so you need to change it. Dependents might not be affected, in
which case you can block update propagation there.
The question being how to monitor that packages need to be rebuilt or have the
same hash?
>
>
>Now the idea I have is very simple:
>
>We use recursive zero knowledge proofs with shallow traces, the rzkp
>caches the traversal and provides the same guarantee as the deep traces
>do (transitive closure is verified to be as it should be). Now if
>someone puts a comment in curl there is a small amount of packages that
>need to be rebuilt and then we redo only the proofs all the way up.
>This way we save ourselves a potentially massive amount of compilation.
I don't really understand why it must be zero-knowledge. My understanding is
that to have a zero-knowledge proof, you first need a proof of the claim, then
find a clever way to proove you have the proof, without disclosing it. Not sure
it's helpful here. If you have a proof that you don't need to rebuild, why not
share that proof to everyone in the first place?
>
>As I said before I do not have much experience with the real
>implementations of these ideas so I am sure this is not as simple as it
>is in my head. However the distri experimental operating system (which
>implements a similar model to guix and nixos) does not put the hash in
>the store path but rather keeps a small metadata file for each path and
>then has a natural number suffix for the path of concurrent versions of
>the same package. This gives a better UX imho and is probably also
>easier to extend with more appropriate authenticated datastructures as
>they are discovered.
>
>
>I hope I am not a raving madman and that this actually makes at least a
>slight amount of sense. Very much looking forward to takedowns :)
>
>
>Kind regards,
>- Ilmu