bug-bison

Don't put generated files to repo and trusting trust


From: Askar Safin
Subject: Don't put generated files to repo and trusting trust
Date: Tue, 11 Dec 2018 06:56:33 +0300

Hi.

First I want to say something about generated files in the git repo. Then I
want to say something about a related but deeper issue: the trusting trust
problem.

The repo contains the generated file src/parse-gram.c, and I think this is bad.
You know why: it makes git merges harder, it causes useless "regen" commits to
appear, etc. There is a well-known rule: don't put generated files into SCM.
You may say that removing this file will create problems, because the user
would then have to install Bison before building Bison from git. Well, yes, he
would. So what? He already has to install lots of other packages when building
from git, see README-hacking. What is wrong with adding one more dependency?
There is no bootstrapping problem for the user, because he can simply install
bison from his distro, or download a release tarball from gnu.org, and then
build bison from git. Also, autoconf depends on itself when building from SCM,
and they don't see any problem with it: look at their repo, they have
configure.ac but no configure.

And finally: today there are two states: "I cloned the git repo and it contains
an up-to-date parse-gram.c" and "I cloned the git repo and it contains an
out-of-date parse-gram.c". If you simply remove parse-gram.c, these two states
no longer exist, and the build system possibly becomes simpler. I suggest not
adding a build rule that regenerates parse-gram.c with the just-built bison,
i.e. parse-gram.c should always be built with the bison installed on the
system. This will simplify everything. The rule for deciding whether
parse-gram.c should be rebuilt becomes simply "its modification time is older
than the modification time of parse-gram.y".
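
As a minimal sketch of that rule (Python, only for illustration; the function
name needs_rebuild is my own, and make already performs this kind of timestamp
comparison for ordinary rules):

```python
import os


def needs_rebuild(source: str, target: str) -> bool:
    """Return True if target is missing or older than source."""
    # A missing generated file always has to be (re)built.
    if not os.path.exists(target):
        return True
    # Otherwise compare modification times, as the rule in the text says.
    return os.path.getmtime(target) < os.path.getmtime(source)
```

With this rule, needs_rebuild("src/parse-gram.y", "src/parse-gram.c") depends
only on the two files and never on which bison produced the old parse-gram.c.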

You may say that the changes I suggest complicate the workflow when
parse-gram.y uses features that were introduced into bison only recently. Well,
yes. And that is one of the reasons why you should not use recently introduced
features in parse-gram.y in the first place.


Now I want to talk about a related issue. Bison uses a Bison-generated parser,
and this is wrong. Even if you choose to keep using a Bison-generated parser,
you should use POSIX-compatible features only and thus make sure that
alternative yacc implementations can process your parse-gram.y . Why? Well, NOT
because this self-dependency will create problems when porting to a new
architecture or a new operating system. It will not. Anybody who wants to port
Bison to a new operating system will simply grab a release tarball, which
already contains all generated parsers. Okay, so why then?

Well, first, because this self-dependency complicates audit. If somebody wants
to verify that Bison doesn't contain any malicious code, he will have problems
with parse-gram.c . How to verify it? At first sight we should just verify
parse-gram.y, then process it to get parse-gram.c and check that we get the
same parse-gram.c as in the release tarball. But this means running an
untrusted bison binary. So the only way to verify parse-gram.c is to verify the
file itself, i.e. a human auditor will be forced to actually read and verify
the generated, hard-to-read 103k parse-gram.c .
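
The "regenerate and compare" step could look roughly like the following Python
sketch. The function names are hypothetical, and the sketch only compares two
files that already exist; it does nothing about the real problem, namely that
the regenerated file comes out of a bison binary we do not yet trust.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large generated files are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(shipped: str, regenerated: str) -> bool:
    """True if the shipped and the regenerated file are byte-identical."""
    return sha256_of(shipped) == sha256_of(regenerated)
```

A match only tells us that the binary we ran reproduces the tarball; it says
nothing about whether both are clean.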

In this sense Bison is not free software at all. Why? I would say that being
free software includes this: you can get it in source-only form ("source" is
something written by a human, so a human can read and modify it), and then
build it using these sources only and nothing else. Only then can you be sure
you got a trusted binary. And so, well, Bison is not free software. (Of course,
I know that my understanding of free software differs from the usual one.)

You may ask: how is it possible to insert malicious code into parse-gram.c
without inserting it into parse-gram.y ? Well, yes, this is possible. Read the
beautiful article by Ken Thompson, "Reflections on Trusting Trust". This is his
Turing Award lecture, 1984.
https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf .

Also read a related work, Dr. David A. Wheeler's PhD thesis:
https://dwheeler.com/trusting-trust/ . Despite the words "Fully Countering" in
the title of the thesis, keep in mind that this "Fully Countering" requires
that some "trusted compiler" is already available, which is a serious
limitation of the presented method. Still, this PhD thesis gives a lot of
information on the topic.

Second, some distros have a policy of rebuilding all generated files. Debian
has one. https://wiki.debian.org/UpstreamGuide says: "we need to rebuild all
generated files to make sure that they can really be built from source". This
means that in Debian bison should build-depend on itself. However, when I look
at the actual build-dependencies of bison
( https://packages.debian.org/source/sid/bison ), I don't see bison there. It
seems this is a bug. I will think about it and probably report it.

So, theoretically, bison is self-build-dependent in Debian, and this is bad,
because it will create a lot of problems for Debian: such a package has to be
handled specially.


Well, okay, you will say that there are a lot of packages around which
build-depend on themselves, for example gcc. Well, yes. And this is bad.
Moreover, every operating system has its own cyclic build-dependency graph of
its core packages, and these graphs often tend to be huge and complicated. For
example, the 44th slide of
https://www.gnu.org/software/guix/guix-els-20130603.pdf shows this graph for
GNU Guix SD. I want to note that Bison is present in this graph. Yes, the scale
is too small to really see anything in the picture, but when I open these
slides in my browser and press Ctrl-F, I am able to find the word "bison" in
the graph. Alternatively you can install guix and type "guix graph --type=bag
hello | dot -Tsvg > /graph.svg". (You can replace "hello" with "gcc" or any
other core package; the graph will still contain "bison", because you need
bison to build glibc.)

So, such a big graph means problems with audit. I think we all should take
every measure to keep this graph as small as possible. And, well, your package
ended up in this graph. Even if the glibc devs remove their dependency on
bison, bison is still self-dependent.
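
To make the point about self-dependency concrete, here is a small Python sketch
that checks whether a package can reach itself through its build-dependencies.
The graphs below are toy data I made up for illustration; they are not the real
Guix or Debian graphs.

```python
def depends_on_itself(graph, package):
    """True if package is reachable from its own build-dependencies."""
    stack = list(graph.get(package, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == package:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return False


# Toy build-dependency graphs (hypothetical, heavily simplified).
TOY_GRAPH = {
    "hello": ["gcc", "glibc"],
    "gcc": ["gcc", "glibc"],
    "glibc": ["gcc", "bison"],
    "bison": ["gcc", "glibc", "bison"],
}
# The same graph after glibc hypothetically drops its dependency on bison.
TOY_GRAPH_NO_GLIBC_EDGE = dict(TOY_GRAPH, glibc=["gcc"])
```

In both toy graphs depends_on_itself(..., "bison") is true, because the direct
bison -> bison edge remains even when the glibc edge is gone.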


Okay, what to do? Ideally, just write an ordinary hand-written recursive
descent parser. Or at least write a yacc-compatible grammar, so that bison can
be checked using an alternative yacc-compatible implementation.
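
To show what "hand-written recursive descent" means in practice, here is a
minimal Python sketch for a toy subset of yacc-like rule syntax
("name : sym sym | sym ;"). This is an assumption-laden illustration: the
grammar, the class and all names are mine, it is nowhere near the real
parse-gram.y, and a real replacement would presumably be written in C.

```python
import re

# One token: an identifier or one of the punctuation marks ':', '|', ';'.
TOKEN = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|([:|;]))")
PUNCTUATION = (":", "|", ";")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        token = self.peek()
        if token is None or token in PUNCTUATION:
            raise SyntaxError(f"expected identifier, got {token!r}")
        self.pos += 1
        return token

    def grammar(self):
        # grammar := rule*
        rules = {}
        while self.peek() is not None:
            name, alternatives = self.rule()
            rules.setdefault(name, []).extend(alternatives)
        return rules

    def rule(self):
        # rule := IDENT ':' alternative ('|' alternative)* ';'
        name = self.identifier()
        self.expect(":")
        alternatives = [self.alternative()]
        while self.peek() == "|":
            self.pos += 1
            alternatives.append(self.alternative())
        self.expect(";")
        return name, alternatives

    def alternative(self):
        # alternative := IDENT*   (an empty alternative is allowed)
        symbols = []
        while self.peek() is not None and self.peek() not in PUNCTUATION:
            symbols.append(self.identifier())
        return symbols


def parse(text):
    return Parser(tokenize(text)).grammar()
```

Each grammar rule becomes one method, so an auditor reads the parser directly
instead of reading generated tables.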


And one more useful link: bootstrappable.org . And one more:
https://www.gnu.org/software/guix/manual/en/guix.html#Bootstrapping , the
section on bootstrapping in the GNU Guix manual.


You may say "I will not do this, but it could be a valuable contribution".
Well, I don't want to do this either. But you can write to the
http://bootstrappable.org/ mailing list, and it is possible that someone will
become enthusiastic and rewrite this parse-gram without yacc/bison. This can be
seen as a simple contribution for anyone who is interested in bootstrapping
problems but not yet ready to do something really hard (say, writing a Haskell
compiler in C++).


I saw this patch:
http://git.savannah.gnu.org/cgit/bison.git/commit/?id=3ae81aa338fb08be451f7ed106adf94e35f52e15
and it caused me to write this long letter. As far as I know, api.value.type
union is not POSIX, so this patch moves away from POSIX compatibility, and I
think we should at least roll it back.

==
Askar Safin
http://vk.com/safinaskar
