Don't put generated files to repo and trusting trust
Tue, 11 Dec 2018 06:56:33 +0300
First I want to say something about generated files in the git repo. Then I want
to say something about a related but deeper issue: the trusting trust problem.
The repo has a generated file, src/parse-gram.c, and I think this is bad. You
know why: it makes git merges harder, useless "regen" commits appear, etc. There
is a well-known rule: don't put generated files in SCM. You may say that
removing this file will create problems, because the user would have to install
Bison before building Bison from git. Well, yes, he would. So what? He already
has to install lots of other packages when building from git, see
README-hacking. What is wrong with adding one more dependency? There is no
bootstrapping problem for the user: he can simply install bison from his distro,
or download a release tarball from gnu.org, and then build bison from git. Also,
autoconf depends on itself when built from SCM, and its developers don't see any
problem with that: look at their repository, it has configure.ac but no
configure.
And finally: today there are two states: "I cloned the git repo and it contains
an up-to-date parse-gram.c" and "I cloned the git repo and it contains an
out-of-date parse-gram.c". If you simply remove parse-gram.c, these two states
no longer exist, and the build system can possibly be simplified. I suggest not
adding a build-system rule that rebuilds parse-gram.c with the just-built bison,
i.e. parse-gram.c should always be built with the bison installed on the system.
This will simplify everything. The rule for determining whether parse-gram.c
should be rebuilt becomes simply "its change time is older than change time of
You may say that the changes I suggest complicate the workflow when parse-gram.y
uses features that were introduced into bison recently. Well, yes. And that is
one of the reasons why you should not use recently introduced features in
parse-gram.y in the first place.
Now I want to talk about a related issue. Bison uses a Bison-generated parser,
and this is wrong. Even if you choose to keep using a Bison-generated parser,
you should use POSIX-compatible features only, and thus make sure that
alternative yacc implementations can process your parse-gram.y. Why? NOT because
this self-dependency will create problems when porting to a new architecture or
a new operating system. It will not. Anybody who wants to port Bison to a new
operating system will simply grab a release tarball, which already has all the
generated parsers. Okay, so why then?
Well, first, because this self-dependency complicates audits. If somebody wants
to verify that Bison doesn't contain any malicious code, he will have a problem
with parse-gram.c. How to verify it? At first sight we should just verify
parse-gram.y, process it to get parse-gram.c, and check that we get the same
parse-gram.c as in the release tarball. But this means running an untrusted
bison binary. So the only way to verify parse-gram.c is to verify the file
itself, i.e. a human auditor will be forced to actually read and verify the
generated, hard-to-read 103k parse-gram.c.
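
The "regenerate and compare" check described above could be sketched roughly as follows. This is only an illustration: the function names are made up for this example, the `-o` option is the usual way to name the output file for bison (another yacc implementation may need a different invocation), and the paths are placeholders. Note that the generated file normally embeds things like the generator's version and file names, so a byte-for-byte match can only be expected with the same generator version and the same paths. And, as said above, the result is only as trustworthy as the generator binary that was run.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def regenerate(grammar, output, generator="bison"):
    """Run a parser generator on `grammar`, writing the C parser to `output`.

    `generator` is whatever binary the auditor chooses to run; the check
    proves nothing if that binary itself cannot be trusted.
    """
    subprocess.run([generator, "-o", str(output), str(grammar)], check=True)


def matches_shipped(regenerated, shipped):
    """True if the regenerated parser is byte-identical to the shipped one."""
    return sha256_of(regenerated) == sha256_of(shipped)
```

An auditor would call something like `regenerate("src/parse-gram.y", "/tmp/parse-gram.c")` and then `matches_shipped("/tmp/parse-gram.c", "src/parse-gram.c")`. A mismatch does not by itself prove malice, and a match does not prove safety if the generator is compromised.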
In this sense Bison is not free software at all. Why? I would say that being
free software includes this: you can get it in source-only form ("source" is
something written by a human, so a human can read and modify it), and then build
it using these sources only and nothing else. And thus you can be sure you got
a trusted binary. And so, well, Bison is not free software. (Of course, I know
that my understanding of free software differs from the usual one.)
You may ask: how is it possible to insert malicious code into parse-gram.c
without inserting it into parse-gram.y? Well, it is possible. Read the beautiful
article by Ken Thompson, "Reflections on Trusting Trust". It is his Turing Award
lecture, 1984.
Also read a related work, Dr. David A. Wheeler's PhD thesis:
https://dwheeler.com/trusting-trust/ . Despite the words "Fully Countering" in
the title of the thesis, keep in mind that this "Fully Countering" requires that
some "trusted compiler" is already available, and this is a serious limitation
of the presented method. Still, the thesis gives a lot of information on this
Second, some distros have a policy of rebuilding all generated files. Debian is
one of them. https://wiki.debian.org/UpstreamGuide says: "we need to rebuild all
generated files to make sure that they can really be built from source". This
means that in Debian bison build-depends on itself. However, when I look at the
actual build-dependencies of bison ( https://packages.debian.org/source/sid/bison ),
I don't see bison there. This seems to be a bug. I will think about it and
probably report it.
So, theoretically bison is self-build-dependent in Debian, and this is bad,
because it will create a lot of problems for Debian: the package has to be
handled specially.
Okay, you will say that there are a lot of packages around that build-depend on
themselves, for example gcc. Well, yes. And this is bad. Moreover, every
operating system has its own cyclic build-dependency graph of its core packages,
and these graphs often tend to be huge and complicated. For example, the 44th
slide of https://www.gnu.org/software/guix/guix-els-20130603.pdf shows this
graph for GNU Guix SD. I want to note that Bison is present in this graph. Yes,
the scale is too small to really see anything in the picture, but when I open
the slides in my browser and press Ctrl-F, I am able to find the word "bison" in
the graph. Alternatively you can install guix and type "guix graph
--type=bag hello | dot -Tsvg > /graph.svg". (You can replace "hello" with "gcc"
or any other core package; the graph will still contain "bison", because you
need bison to build glibc.)
So such a big graph means problems with audits. I think we should all take
every measure to keep this graph as small as possible. And, well, your package
got into this graph. And even if the glibc devs remove the dependency on bison,
bison is still self-dependent.
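
To make "self-dependent" concrete, here is a small sketch that finds the packages in a build-dependency graph that can reach themselves. The graph is a toy invented for this example and is not the real Guix or Debian graph; the edges from bison to itself, from gcc to itself and from glibc to bison follow what is said above, the remaining edges are assumptions for illustration.

```python
def reachable(graph, start):
    """Return every package reachable from `start` via build-dependency edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(graph.get(pkg, ()))
    return seen


def self_dependent(graph):
    """Return the packages that (transitively) build-depend on themselves."""
    return {pkg for pkg in graph if pkg in reachable(graph, pkg)}


# Toy graph: package -> list of build dependencies (illustrative only).
toy = {
    "bison": ["bison"],
    "gcc": ["gcc"],
    "glibc": ["bison", "gcc"],
    "hello": ["gcc", "glibc"],
}
print(sorted(self_dependent(toy)))  # ['bison', 'gcc']
```

Removing the "bison" entry from glibc's list in the toy graph leaves bison in the result, which is the point made above: dropping the glibc dependency does not remove the self-dependency.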
Okay, what to do? Ideally, just write an ordinary hand-written recursive descent
parser. Or at least write a yacc-compatible grammar, so that bison can be
checked using an alternative yacc-compatible implementation.
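
For readers who have not seen one, a hand-written recursive descent parser has one function per grammar rule, and each function calls the functions for the rules it refers to. The sketch below is a toy for integer arithmetic, not Bison's grammar, and all names in it are invented for this example. The assumed grammar is: expr := term (('+' | '-') term)*, term := factor (('*' | '/') factor)*, factor := NUMBER | '(' expr ')'.

```python
import re

# A token is a run of digits or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*   ('/' is integer division)
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

The code is written entirely by a human and needs no generator to build, which is what makes this style attractive from the audit point of view.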
And one more useful link: bootstrappable.org . And one more:
https://www.gnu.org/software/guix/manual/en/guix.html#Bootstrapping , the
section on bootstrapping in the GNU Guix manual.
You may say "I will not do this, but it could be a valuable contribution". Well,
I don't want to do this either. But you can write to the
http://bootstrappable.org/ mailing list, and it is possible that someone will
become enthusiastic and rewrite parse-gram without yacc/bison. This can be seen
as a simple contribution for anyone interested in bootstrapping problems but not
smart enough to do something really hard (say, writing a Haskell compiler in
C++).
I saw this patch:
and it caused me to write this long letter. As far as I know, api.value.type
union is not POSIX, so this patch moves away from POSIX compatibility, and I
think we should at least roll it back.
Askar Safin