From: Simon Tournier
Subject: 3 kinds of bootstrap (was Re: backdoor injection via release tarballs combined with binary artifacts)
Date: Tue, 07 May 2024 20:22:22 +0200
Hi,
I am late to the party…
On Wed., 10 April 2024 at 15:57, Ludovic Courtès <ludo@gnu.org> wrote:
>> That has happened to me too.
>> Why not use Git directly always?
>
> Because it create{s,d} a bootstrapping issue. The
> “builtin:git-download” method was added only recently to guix-daemon and
> cannot be assumed to be available yet:
>
> https://issues.guix.gnu.org/65866
[...]
> I think we should gradually move to building everything from
> source—i.e., fetching code from VCS and adding Autoconf & co. as inputs.
>
> This has been suggested several times before. The difficulty, as you
> point out, will lie in addressing bootstrapping issues with core
> packages: glibc, GCC, Binutils, Coreutils, etc. I’m not sure how to do
> that but…
[...]
> … live-bootstrap can probably be a good source of inspiration to find a
> way to build those core packages (or some of them) straight from a VCS
> checkout.
IMHO, we need to distinguish the cases, because there are different
types of issues and thus different potential workarounds. :-)
1. Bootstrap how to download source code.
2. Bootstrap how to build core packages.
3. Bootstrap the driver (say guix-daemon and helpers).
Well, having solutions for #1 and #3 would naturally provide a solution
for #2. Although the devil is in the details. ;-)
About #1
========
You cannot use the binary ’git’ in order to download the source code of
Git to build the binary ’git’. Yeah, circular dependency. :-)
Therefore, the Git source code is pulled using another method, say from
a tarball. That method also needs to be built from source, so it in turn
needs yet another method. The usual chicken-or-the-egg problem.
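To make the cycle concrete, here is a minimal sketch in Python (not Guix
code). The graph and its node names are illustrative assumptions: an edge
means "needs this in order to be built or fetched". A plain depth-first
search reports the loop, and the loop disappears once the source is
fetched by a method that has no build-time inputs of its own.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / in progress / done
    state = {node: WHITE for node in deps}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        stack.append(node)
        for dep in deps.get(node, []):
            if state.get(dep, WHITE) == GREY:
                # 'dep' is already on the stack: we closed a loop.
                return stack[stack.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        state[node] = BLACK
        return None

    for node in list(deps):
        if state[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical graph: fetching Git's source requires the 'git' binary.
CIRCULAR = {"git": ["git-source"], "git-source": ["git"]}

# Same graph, with the source fetched by a method that has no inputs.
CUT = {
    "git": ["git-source"],
    "git-source": ["builtin:download"],
    "builtin:download": [],
}

print(find_cycle(CIRCULAR))
print(find_cycle(CUT))
```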
The current workaround is to “hide” the problem and introduce a
“builtin:download” method: it’s an “opaque” binary that is hard to
inspect. Roughly, the workaround was introduced by [1] in Oct. 2016.
That is almost 8 years ago, so it works! :-)
The argument for accepting this “opaque” method is that it is a
fixed-output derivation. In other words, we know the SHA256 checksum
beforehand. Thus the claim is: being “opaque” does not matter because
the SHA256 checksum can be computed independently and all the source
code can be audited.
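The fixed-output argument can be sketched as follows. This is an
illustration in Python, not how guix-daemon implements it: the function
names are hypothetical, and the digest is compared in hexadecimal here,
whereas Guix package definitions usually write hashes in a base32
encoding. The point is only that acceptance depends on the bytes and the
checksum declared in advance, never on which tool produced the bytes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA256 digest of 'data'."""
    return hashlib.sha256(data).hexdigest()


def accept_fixed_output(data: bytes, expected_sha256: str) -> bool:
    """Accept the fetched bytes only if they match the declared checksum.

    How 'data' was obtained (opaque builtin, transparent builder, a
    mirror, a USB stick) plays no role in the decision.
    """
    return sha256_hex(data) == expected_sha256.lower()


# Well-known SHA256 of the empty byte string, used as the declared hash.
EMPTY_SHA256 = (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)

print(accept_fixed_output(b"", EMPTY_SHA256))          # matching content
print(accept_fixed_output(b"tampered", EMPTY_SHA256))  # rejected content
```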
For cutting another cycle, another “opaque” method has been introduced:
“builtin:git-download”. The same reasoning applies.
Do not get me wrong about “opaque”. I mean that the method depends on
the pair of user revision and daemon revision. In other words, it is not
straightforward to know when Alice and Bob are using the exact same
method for downloading source code. Since it is not fully transparent,
it is “opaque”. :-)
Somehow, we are applying to everything what we only need for cutting a
specific circular dependency. We have some packages named
’foo-bootstrap’ that are meant to solve a dependency problem between
packages, and we do not use them everywhere; we just use them for
cutting a circular dependency. I think a similar strategy should be
applied to the fetch methods.
We could have “git-fetch” relying on the initial Git method, i.e., a
transparent derivation where it’s straightforward to audit everything:
the dependencies and the builder.
And for some specific cases, we could have “git-fetch/bootstrap” relying
on “builtin:git-download”. That would make it easier to know which
packages deserve special care.
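One possible way to decide which packages would need the bootstrap
variant, sketched in Python under loud assumptions: the package graph
below is made up, and “git-fetch/bootstrap” is only the name proposed
above, not an existing Guix procedure. The idea is that only the fetcher
itself and its transitive inputs sit inside the cycle; everything else
could use the transparent method.

```python
from typing import Dict, List, Set


def transitive_inputs(deps: Dict[str, List[str]], root: str) -> Set[str]:
    """Return 'root' plus everything it depends on, directly or not."""
    seen: Set[str] = set()
    todo = [root]
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        todo.extend(deps.get(node, []))
    return seen


def choose_fetch_method(
    package: str, deps: Dict[str, List[str]], fetcher: str = "git"
) -> str:
    """Use the opaque method only for packages needed to build the fetcher."""
    if package in transitive_inputs(deps, fetcher):
        return "git-fetch/bootstrap"   # would rely on builtin:git-download
    return "git-fetch"                 # transparent, auditable derivation


# Hypothetical package graph; 'libfoo' and 'some-app' are placeholders.
DEPS = {
    "git": ["libfoo"],
    "libfoo": [],
    "some-app": ["libfoo", "git"],
}

for name in ("git", "libfoo", "some-app"):
    print(name, "->", choose_fetch_method(name, DEPS))
```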
I think that applying “builtin:download” and “builtin:git-download” to
all “url-fetch” and “git-fetch” origins lowers the overall level of
transparency in order to solve a very specific bootstrapping problem.
Last point about #1: please note that transparency does not come for
free and has drawbacks. When running, say, “guix time-machine -C
past.scm -- build -S”, all the dependencies for downloading would be the
ones of past.scm. In other words, for downloading today the source code
of a 5-year-old package, say using ’hg-fetch’, we would need Python and
Mercurial as they were 5 years ago, even though we do not expect any
difference in the content compared with the Python and Mercurial of
today.
About #3
========
That’s the very hard topic! The bootstrapping story is not fully done
yet.
Assuming trust for #1, the bootstrap of Guix starts with
’bootstrap-seeds’, roughly 232KiB [2]. Take a moment: that’s
impressive, right? :-)
Obviously, I am leaving aside Haskell, OCaml@5, etc.
Well, diving further: these 232KiB alone are not enough. The bootstrap
also requires helpers: tar (1.3MiB), bash (1.3MiB), mkdir (0.7MiB) and
xz (0.844MiB).
Moreover, it requires two drivers: a static Guile binary (14MiB) and
guix-daemon.
You get it: how can we trust these helpers? Two approaches: (a)
implement something directly in hex/assembler and/or (b) exploit the
Guile binary (à la Scheme on bare metal).
About guix-daemon, one solution is a daemon written directly in Guile
and compatible with that very Guile binary. Or, at least, a minimalist
daemon with just enough features for building up to guix-daemon.
Another option is “Extreme bootstrapping” [3], which is my understanding
of live-bootstrap. Somehow, remove guix-daemon from the picture and
convert the derivation (the one read by guix-daemon) to a minimal Guile
script that would be executed during startup. See the proof-of-concept
in the branch wip-system-bootstrap [4].
Just my lengthy opinion… Or maybe some ideas for GSoC. ;-)
1: https://issues.guix.gnu.org/22774#3
2: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down
3: https://guix.gnu.org/en/blog/2019/reproducible-builds-summit-5th-edition
4: https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-system-bootstrap
Cheers,
simon