git repositories vs. tarballs
From: Bruno Haible
Subject: git repositories vs. tarballs
Date: Mon, 15 Apr 2024 02:19:20 +0200
Hi Simon,
In the other thread [1][2][2a] (see also [3] and [4]), you are asking:
> Has this changed, so we should recommend maintainers
> to 'EXTRA_DIST = bootstrap bootstrap-funclib.sh bootstrap.conf' so this
> is even possible?
1) I think changing the contents of a tarball ad hoc, like this, will not
lead to satisfactory results, because too many packages will do things
differently.
Instead, we should ask the question "for which purposes is <an artifact>
going to be used?" or "which operations are supported on <an artifact>?".
Once there is agreement on this question, the contents of the artifact
will necessarily follow.
2) When considering
(A) git repositories (or tar.gz files containing their contents,
e.g. the "snapshot" on
https://git.savannah.gnu.org/gitweb/?p=PACKAGE.git
or the "Download ZIP" on https://github.com/TEAM/PACKAGE),
(C) a tarball as published on ftp.gnu.org,
it is also useful to consider
(E) a binary package .tar.gz / .rpm / .deb
because there is already a lot of experience with doing "reproducible
builds" from (C) to (E) [5][6].
3) So, what are the purposes of (A), (C), (E)?
So far, it has been
(A) is for users with developer skills, the preferred way to work
with the source code, including branching and merging of branches.
(C) is for users and distros, to apply relatively small modifications
and then build binaries of the package for one or more architectures,
without needing to fetch anything (other than build prerequisites)
from the network.
(E) is for users, to install the package on a specific machine, without
needing development tools.
4) What do the reproducible builds from (C) to (E) mean? The purpose of (E)
changes to
(E+) Like (E), plus:
A user _with_ development tools can determine whether (E) was
built with a published build recipe, without tampering.
Note that this requires
- formalizing the notion of a build environment [7],
- adding this build environment into (E) (not yet complete for Debian [8]).
5) There are two wishes that are not yet satisfied by (A) and (C):
(X) Many users without developer skills are turning to the git repository
and trying to build from there.
(Y) Some distros want to be able to verify the tarballs.[9] (I don't agree
with this. If you can't trust the release manager who produced the
tarballs (C), you cannot trust (A) either. If there is a mechanism
for verifying (C) from (A), criminals will commit their malware
entirely into (A).)
6) How could (X) be implemented?
The main differences between (A) and (C) are [10]:
- Tarballs contain source code from other packages.
- Tarballs contain generated files.
- Tarballs contain localizations.
I could imagine an intermediate step between (A) and (C):
(B) is for users with many packages installed and for distros, to apply
modifications (even to the set of gnulib modules) and then build
binaries of the package for one or more architectures, without
needing to fetch anything (other than build prerequisites) from the
network.
This is a different stage than (A), because most developers don't want
to commit source code from other packages into (A), due to size, nor
to commit generated files into (A), due to hassles with branches.
Going from (A) to (B) means pulling additional sources from the network.
It could be implemented
- by "git submodule update --init", or
- by 'npm' for JavaScript packages, or
- by 'cargo' for Rust packages [11]
and, for the localizations:
- essentially by a 'wget' command that fetches the *.po files.
The proposed name of a script that does this is 'autopull.sh'.
But I am equally open to a declarative YAML file instead of a shell script.
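To make the step from (A) to (B) concrete, here is a minimal sketch of what such an 'autopull.sh' could look like. It is only an illustration under stated assumptions: the PO_URL placeholder, the PO_DIR variable, and the DRY_RUN switch are hypothetical and not part of any existing convention.

```sh
#!/bin/sh
# Hypothetical autopull.sh sketch: go from (A) to (B) by fetching
# everything that needs the network.
# PO_URL and PO_DIR are illustrative assumptions, not real locations.
: "${PO_URL:=https://example.org/po/PACKAGE/}"
: "${PO_DIR:=po}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

autopull() {
  # Source code from other packages, here assumed to be git submodules.
  run git submodule update --init &&
  # Localizations: fetch the *.po files into PO_DIR.
  run wget --no-verbose --recursive --level=1 --no-parent \
      --no-directories --accept '*.po' \
      --directory-prefix "$PO_DIR" "$PO_URL"
}
```

A real script would end with a call to autopull; setting DRY_RUN=1 beforehand only prints the two commands, which is handy for reviewing what would be fetched.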
Going from (B) to (C) means generating files, through invocations of
gnulib-tool, bison, flex, ... for the code and groff, texinfo, doxygen, ...
for the documentation.
The proposed name of a script that does this is 'autogen.sh'.
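As a rough sketch of the step from (B) to (C), an 'autogen.sh' could chain the generators and must not touch the network. The file names (src/parse.y, src/scan.l, doc/PACKAGE.texi), the use of autoreconf, and the DRY_RUN switch are assumptions made for illustration; a real package would list its own generators.

```sh
#!/bin/sh
# Hypothetical autogen.sh sketch: go from (B) to (C) by generating files.
# No network access is needed here; all inputs are already present in (B).
# All file names below are placeholders.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

autogen() {
  # Code: gnulib modules, parser, scanner, then configure and Makefile.in.
  run gnulib-tool --update &&
  run bison --output=src/parse.c src/parse.y &&
  run flex --outfile=src/scan.c src/scan.l &&
  run autoreconf --install &&
  # Documentation: Info manual from Texinfo sources.
  run makeinfo --output=doc/PACKAGE.info doc/PACKAGE.texi
}
```

Chaining with && stops at the first generator that fails, so a partially generated tree is not mistaken for (C).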
7) How could (Y) be implemented?
Like in (E+), we would define:
(C+) Like (C), plus:
A user with all kinds of special tools can determine whether (C)
was built with a published build recipe, without tampering.
Again, this requires
- formalizing the notion of a build environment,
- adding this build environment into (C).
For example, we would need a way to specify a build dependency on a
particular version of groff or texinfo or doxygen (for the documentation),
a particular version of m4, autoconf, automake (for the configure script
and Makefile.ins).
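Assuming the build environment has been pinned this way, the check itself could be as simple as regenerating the tarball and comparing it bit for bit with the published one. The function name verify_tarball and the file names in the usage line are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: succeed only if a locally regenerated tarball is
# bit-for-bit identical to the published one.
verify_tarball() {
  published=$1
  rebuilt=$2
  if cmp -s "$published" "$rebuilt"; then
    echo "reproducible: $rebuilt matches $published"
  else
    echo "MISMATCH: $rebuilt differs from $published" >&2
    return 1
  fi
}
```

For example: verify_tarball PACKAGE-1.0.tar.gz rebuilt/PACKAGE-1.0.tar.gz. Any difference in tool versions, timestamps, or file ordering makes the comparison fail, which is why the build environment has to be specified so precisely.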
So far, some people have published their build environment in the form of
ad-hoc plain text ("This release was bootstrapped with the following tools")
inside release announcements. [12] Of course, that's the wrong place to
do so, because a user who receives (C) and wants to verify it does not
want to search for the release announcement in order to get the build
environment.
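One possible way to ship that information with (C) itself is to write the tool versions into a file inside the tarball at release time. The following is a minimal sketch; the function name and the BUILD-ENVIRONMENT file name are assumptions, and it only records the first line of each tool's --version output rather than a complete environment description.

```sh
#!/bin/sh
# Hypothetical sketch: print one "tool: version" line per build tool,
# suitable for redirecting into a file that is shipped inside (C).
record_tool_versions() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Most GNU tools print their name and version on the first line.
      version=$("$tool" --version 2>/dev/null | head -n 1)
      echo "$tool: ${version:-unknown}"
    else
      echo "$tool: not found"
    fi
  done
}
```

The release manager could run, for example, record_tool_versions m4 autoconf automake groff makeinfo doxygen > BUILD-ENVIRONMENT before creating the tarball, so that whoever receives (C) finds the build environment next to the sources.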
Some people are suggesting that (Y) could be implemented on top of (X) [9].
That is, the distro should start from (B), not (C). However, I think it
does not change much of the problem. The user's question "can I trust (C),
built by the package's release manager" is replaced with two questions
"can I trust (B), built by the package's release manager" and
"can I trust (C), built by the distro's build service".
Please respond with an appropriately set "Subject"! There are many topics here.
Bruno
[1] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00150.html
[2] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00163.html
[2a] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00164.html
[3] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00017.html
[4] https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
[5] https://reproducible-builds.org/
[6] https://wiki.debian.org/ReproducibleBuilds
[7] https://reproducible-builds.org/docs/recording/
[8] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763822
[9] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/YWMNOEJ34Q7QLBWQAB5TM6A2SVJFU4RV/
[10] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00136.html
[11] https://doc.rust-lang.org/stable/cargo/guide/why-cargo-exists.html
[12] https://lists.gnu.org/archive/html/info-gnu/2024-01/msg00015.html