bug-gnulib

git repositories vs. tarballs


From: Bruno Haible
Subject: git repositories vs. tarballs
Date: Mon, 15 Apr 2024 02:19:20 +0200

Hi Simon,

In the other thread [1][2][2a] (see also [3] and [4]), you asked:

> Has this changed, so we should recommend maintainers
> to 'EXTRA_DIST = bootstrap bootstrap-funclib.sh bootstrap.conf' so this
> is even possible?

1) I think changing the contents of a tarball ad hoc, like this, will not
   lead to satisfying results, because too many packages will do it
   differently.

   Instead, we should ask the question "for which purposes is <an artifact>
   going to be used?" or "which operations are supported on <an artifact>?".
   Once there is agreement on this question, the contents of the artifact
   will necessarily follow.

2) When considering
     (A) git repositories (or tar.gz files containing their contents,
         e.g. the "snapshot" on https://git.savannah.gnu.org/gitweb/?p=PACKAGE.git
         or the "Download ZIP" on https://github.com/TEAM/PACKAGE),
     (C) a tarball as published on ftp.gnu.org,
   it is also useful to consider
     (E) a binary package .tar.gz / .rpm / .deb
   because there is already a lot of experience with doing "reproducible
   builds" from (C) to (E) [5][6].

3) So, what are the purposes of (A), (C), (E)?

   So far, the purposes have been:
     (A) is for users with developer skills, the preferred way to work
         with the source code, including branching and merging of branches.
     (C) is for users and distros, to apply relatively small modifications
         and then build binaries of the package for one or more architectures,
         without needing to fetch anything (other than build prerequisites)
         from the network.
     (E) is for users, to install the package on a specific machine, without
         needing development tools.

4) What do reproducible builds from (C) to (E) mean? They change the
   purpose of (E) to
     (E+) Like (E), plus:
          A user _with_ development tools can determine whether (E) was
          built with a published build recipe, without tampering.
   Note that this requires
     - formalizing the notion of a build environment [7],
     - adding this build environment into (E) (not yet complete for Debian [8]).
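   The check that (E+) enables can be sketched in a few lines: the user
   rebuilds (E) from (C) in the recorded build environment, then compares
   the result with the published artifact. This is a minimal sketch,
   assuming both artifacts are available as bytes and that a SHA-256 digest
   of the published (E) is known; the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_published(rebuilt: bytes, published_digest: str) -> bool:
    """True if a locally rebuilt artifact is bit-for-bit identical to the
    published one, identified by its published SHA-256 digest.
    (Hypothetical helper, for illustration only.)"""
    return sha256_hex(rebuilt) == published_digest.strip().lower()


if __name__ == "__main__":
    published = b"example package contents"  # stand-in for a published (E)
    digest = sha256_hex(published)
    print(matches_published(b"example package contents", digest))
    print(matches_published(b"tampered package contents", digest))
```

   A digest comparison only succeeds if the rebuild is bit-for-bit
   identical, which is why the build environment has to be formalized and
   shipped along with (E).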

5) There are two wishes that are not yet satisfied by (A) and (C):
   (X) Many users without developer skills are turning to the git repository
       and trying to build from there.
   (Y) Some distros want to be able to verify the tarballs.[9] (I don't agree
       with this. If you can't trust the release manager who produced the
       tarballs (C), you cannot trust (A) either. If there is a mechanism
       for verifying (C) from (A), criminals will commit their malware
       entirely into (A).)

6) How could (X) be implemented?

   The main differences between (A) and (C) are [10]:
     - Tarballs contain source code from other packages.
     - Tarballs contain generated files.
     - Tarballs contain localizations.

   I could imagine an intermediate step between (A) and (C):

     (B) is for users with many packages installed and for distros, to apply
         modifications (even to the set of gnulib modules) and then build
         binaries of the package for one or more architectures, without
         needing to fetch anything (other than build prerequisites) from the
         network.

   This is a different stage from (A), because most developers want neither
   to commit source code from other packages into (A), because of its size,
   nor to commit generated files into (A), because of the hassles with branches.

   Going from (A) to (B) means pulling additional sources from the network.
   It could be implemented
     - by "git submodule update --init", or
     - by 'npm' for JavaScript packages, or
     - by 'cargo' for Rust packages [11]
   and, for the localizations:
     - essentially by a 'wget' command that fetches the *.po files.

   The proposed name of a script that does this is 'autopull.sh'.
   But I am equally open to a declarative YAML file instead of a shell script.
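   To illustrate the declarative variant, here is a minimal Python sketch in
   which a plain dict stands in for the YAML file (loading real YAML would
   need a third-party parser such as PyYAML, omitted here). The spec keys,
   the function name, and the base URL for the *.po files are all
   hypothetical placeholders; the sketch only computes the commands that an
   'autopull.sh'-like driver would run, without running them.

```python
from typing import Any, Dict, List

# Hypothetical declarative description of what (A) -> (B) must pull.
# Keys and URL are placeholders, not an existing format.
PULL_SPEC: Dict[str, Any] = {
    "git_submodules": True,
    "po_base_url": "https://example.org/po/PACKAGE",  # placeholder URL
    "languages": ["de", "fr"],
}


def pull_commands(spec: Dict[str, Any]) -> List[List[str]]:
    """Translate the declarative spec into the argv lists a driver
    script would execute, in order. Nothing is executed here."""
    commands: List[List[str]] = []
    if spec.get("git_submodules"):
        # Fetch source code from other packages (e.g. gnulib) as submodules.
        commands.append(["git", "submodule", "update", "--init"])
    base = spec.get("po_base_url")
    for lang in spec.get("languages", []):
        # One download per translation, saved as po/LANG.po.
        commands.append(["wget", "-O", f"po/{lang}.po", f"{base}/{lang}.po"])
    return commands


if __name__ == "__main__":
    for argv in pull_commands(PULL_SPEC):
        print(" ".join(argv))
```

   Returning the commands instead of executing them keeps the step
   inspectable: a distro can review exactly what will be fetched from the
   network before going from (A) to (B).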

   Going from (B) to (C) means generating files, through invocations of
   gnulib-tool, bison, flex, ... for the code and groff, texinfo, doxygen, ...
   for the documentation.

   The proposed name of a script that does this is 'autogen.sh'.
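   Unlike (C), stage (B) requires the generators to be installed. One thing
   such a script might do first is fail early when a generator is missing.
   The following is a minimal Python sketch, assuming the tools are looked
   up on PATH; the grouping and the function name are hypothetical, and
   'makeinfo' is used as the program provided by texinfo. A real package
   needs only the subset of these tools that it actually uses.

```python
import shutil
from typing import Dict, List

# Generators mentioned for going from (B) to (C). Illustrative grouping only.
GENERATORS: Dict[str, List[str]] = {
    "code": ["gnulib-tool", "bison", "flex"],
    "build system": ["m4", "autoconf", "automake"],
    "documentation": ["groff", "makeinfo", "doxygen"],
}


def missing_generators(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per group, the programs that are not found on PATH.
    Groups with nothing missing are omitted from the result."""
    missing: Dict[str, List[str]] = {}
    for group, programs in groups.items():
        absent = [p for p in programs if shutil.which(p) is None]
        if absent:
            missing[group] = absent
    return missing


if __name__ == "__main__":
    for group, absent in missing_generators(GENERATORS).items():
        print(f"{group}: missing {', '.join(absent)}")
```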

7) How could (Y) be implemented?
   Like in (E+), we would define:

     (C+) Like (C), plus:
          A user with all kinds of special tools can determine whether (C)
          was built with a published build recipe, without tampering.

   Again, this requires
     - formalizing the notion of a build environment,
     - adding this build environment into (C).

   For example, we would need a way to specify a build dependency on a
   particular version of groff or texinfo or doxygen (for the documentation),
   a particular version of m4, autoconf, automake (for the configure script
   and Makefile.ins).

   So far, some people have published their build environment in the form of
   ad-hoc plain text ("This release was bootstrapped with the following tools")
   inside release announcements [12]. Of course, that is the wrong place for
   it, because a user who receives (C) and wants to verify it does not want
   to search for the release announcement in order to get the build
   environment.
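   A record that travels inside (C) could be as simple as a sorted list of
   "NAME VERSION" lines. The following Python sketch assumes such a
   hypothetical plain-text format (it is not an existing standard; Debian's
   .buildinfo files play a comparable role for binary packages) and shows
   how a verifier could compare it with the tools installed locally. The
   version numbers are example values only.

```python
from typing import Dict, List


def format_build_env(tools: Dict[str, str]) -> str:
    """Serialize tool versions as 'NAME VERSION' lines, sorted by name so
    that the record itself is reproducible. (Hypothetical format.)"""
    return "".join(f"{name} {version}\n" for name, version in sorted(tools.items()))


def parse_build_env(text: str) -> Dict[str, str]:
    """Inverse of format_build_env; blank lines and '#' comments are ignored.
    A line without a version raises ValueError."""
    tools: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version = line.split(maxsplit=1)
        tools[name] = version
    return tools


def mismatches(recorded: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """List tools whose local version differs from, or is absent relative
    to, the version recorded inside the tarball."""
    problems: List[str] = []
    for name, want in sorted(recorded.items()):
        have = local.get(name)
        if have != want:
            problems.append(f"{name}: recorded {want}, found {have or 'none'}")
    return problems


if __name__ == "__main__":
    # Example values only; not taken from any real release.
    record = format_build_env({"autoconf": "2.72", "automake": "1.16.5", "m4": "1.4.19"})
    print(record, end="")
    print(mismatches(parse_build_env(record), {"autoconf": "2.72", "m4": "1.4.19"}))
```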

   Some people are suggesting that (Y) could be implemented on top of (X) [9].
   That is, the distro should start from (B), not (C). However, I don't think
   this changes the problem much. The user's question "can I trust (C),
   built by the package's release manager" is replaced with two questions:
     "can I trust (B), built by the package's release manager" and
     "can I trust (C), built by the distro's build service".
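   Concretely, the second question would be approached by rebuilding (C)
   from (B) and comparing it with the release manager's (C). Since the raw
   .tar.gz bytes differ anyway (gzip header and member timestamps), a
   comparison has to look at file contents. This is a minimal Python sketch,
   assuming both tarballs are gzip-compressed; the function names are
   hypothetical. Even an empty result only shows that both builders agree;
   it says nothing about whether (B) itself can be trusted.

```python
import hashlib
import io
import tarfile
from typing import Dict, List


def member_digests(tar_bytes: bytes) -> Dict[str, str]:
    """Map each regular file in a .tar.gz to the SHA-256 of its contents.
    Timestamps, owners and member order are deliberately ignored."""
    digests: Dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                f = tar.extractfile(member)
                digests[member.name] = hashlib.sha256(f.read()).hexdigest()
    return digests


def differing_files(a: bytes, b: bytes) -> List[str]:
    """Names of files missing from one tarball or differing in content."""
    da, db = member_digests(a), member_digests(b)
    return sorted(n for n in da.keys() | db.keys() if da.get(n) != db.get(n))


def make_tar_gz(files: Dict[str, bytes], mtime: int) -> bytes:
    """Build an in-memory .tar.gz; used only for the demonstration below."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    files = {"pkg/configure": b"#!/bin/sh\n", "pkg/README": b"hi\n"}
    release = make_tar_gz(files, mtime=1)  # the release manager's (C)
    rebuilt = make_tar_gz(files, mtime=2)  # the distro's (C), built later
    print(differing_files(release, rebuilt))  # []
```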

Please respond with appropriately set "Subject"!! There are many topics here.

Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00150.html
[2] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00163.html
[2a] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00164.html
[3] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00017.html
[4] https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
[5] https://reproducible-builds.org/
[6] https://wiki.debian.org/ReproducibleBuilds
[7] https://reproducible-builds.org/docs/recording/
[8] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763822
[9] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/YWMNOEJ34Q7QLBWQAB5TM6A2SVJFU4RV/
[10] https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00136.html
[11] https://doc.rust-lang.org/stable/cargo/guide/why-cargo-exists.html
[12] https://lists.gnu.org/archive/html/info-gnu/2024-01/msg00015.html