[Groff] Reproducible dates in HTML/PDF/PS output files?

From: Colin Watson
Subject: [Groff] Reproducible dates in HTML/PDF/PS output files?
Date: Wed, 27 Aug 2014 07:08:15 +0100
User-agent: Mutt/1.5.21 (2010-09-15)


Various Debian developers are working on a long-term project to ensure
that our packages can be built in a byte-for-byte reproducible way.
This makes it easier to do interesting things like defending against
attacks on free software developers' systems; for example, the Tor
Browser has reproducible builds, allowing people to verify that the
build machine wasn't compromised.  Our project page is here, and I
gather that some other distributions are working on similar efforts.
There are all sorts of pieces to this, but one of them is either
eliminating timestamps from output files where they aren't all that
important, or making sure that they're consistent when a package is
built more than once.  As a general principle, the time when a package
was built is not information we believe should be recorded in that
package, although modification times of source files may be reasonable
things to use (for instance, we may well end up building packages under
faketime with the starting point set to the timestamp in the most recent
Debian changelog entry).  We would prefer to improve upstream build
systems to make them produce more reproducible results out of the box
where we can, since that benefits everyone rather than just Debian.

groff is a relatively small piece of this puzzle, since it's typically
not very security-sensitive, but it would still be good to make it
reproducible since it's in many distributions' base systems.  The
current sticking point I see is that grohtml, gropdf, and grops all
embed timestamps in their output files.  I have two suggestions on how
this could be improved, and would welcome feedback on which (if any) I
should submit patches for.

 1) Emit the timestamp of the source file rather than the current time
    in these devices.  The time when the source file was modified is
    typically rather more interesting than the time when the document
    processing toolchain was run.  (Possible downside: do we have to
    keep track of the maximum timestamp of any included file?)

 2) Add an option or environment variable or something to suppress the
    inclusion of timestamps.  For bonus points, set this when building
    groff's own documentation.
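To make the two suggestions concrete, here is a rough Python sketch of how a postprocessor might choose the date it embeds. It is not groff code. The environment variable name GROFF_NO_TIMESTAMP and the functions document_timestamp() and format_creation_date() are hypothetical. The sketch also assumes the caller already has the full list of input files, including any pulled in with .so, which is the bookkeeping question raised under suggestion 1.

```python
import os
import time

# Hypothetical variable name for suggestion 2; groff does not define it.
SUPPRESS_VAR = "GROFF_NO_TIMESTAMP"


def document_timestamp(paths, environ=os.environ):
    """Return the time to embed in the output, or None to omit it.

    Suggestion 2: if the (hypothetical) suppression variable is set to a
    non-empty value, return None so the caller writes no date at all.
    Suggestion 1: otherwise use the newest modification time among the
    main source file and every file it includes, so rebuilding from
    unchanged sources always yields the same date.
    """
    if environ.get(SUPPRESS_VAR):
        return None
    if not paths:
        # No named inputs (e.g. reading stdin): fall back to the current
        # behaviour of using the wall-clock time.
        return time.time()
    return max(os.stat(p).st_mtime for p in paths)


def format_creation_date(ts):
    """Format a timestamp in UTC so the result doesn't depend on the builder's TZ."""
    if ts is None:
        return None
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(ts))
```

A caller would do something like format_creation_date(document_timestamp(["paper.ms", "macros.tmac"])) and skip the date line entirely when it gets None back. Formatting in UTC matters as much as the choice of timestamp: two builds of the same sources in different time zones would otherwise still differ.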


Colin Watson                                       address@hidden

