Re: Using hash instead of timestamps to check for changes.

From: Edward Welbourne
Subject: Re: Using hash instead of timestamps to check for changes.
Date: Thu, 02 Apr 2015 17:48:35 +0000

> After reading over your mail a couple of times, I realized that I hadn't
> thought things through very well.  In fact, rather than saying "hash
> instead of time", I should have said "optional additional hash check
> when timestamp has changed".

Even so, I'm unclear about why "hash" is the thing you want here.  You
anticipate saving lots of time on builds, presumably when immaterial
changes get ignored, or when the only change is to a timestamp.  (The
latter could be fixed by touch -t if it's really important.)  The
situations I've seen where that felt like it might happen have been
where some intermediate files often don't change in response to changes
in the files from which they're generated, much as a change to a comment
doesn't change the result of compiling code.
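For the timestamp-only case, a minimal sketch of the touch fix: keep a copy carrying the old mtime, make the immaterial edit, then copy the mtime back with touch -r.  All file names and the timestamp here are illustrative.

```shell
#!/bin/sh
# Preserve a file's mtime across an edit we know is immaterial
# (e.g. a comment-only change), so make still treats its
# dependents as up to date.  Names and times are illustrative.
cd "$(mktemp -d)"
printf 'int x;\n' > header.h            # stand-in source file
touch -t 202001010000 header.h          # give it a known old mtime
cp -p header.h header.h.orig            # -p keeps the old mtime on the copy
printf '/* comment only */\n' >> header.h   # the immaterial edit
touch -r header.h.orig header.h         # copy the old mtime back
rm -f header.h.orig
```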

Some colleagues wrote tools with the superficially nice behaviour that,
when about to write a file, they would check to see whether it was
changed from what's already on disk; if it was unchanged, they would not
overwrite the file.  This saved regenerating files that depended on the
output file, but it had the drawback that the file's timestamp stayed
older than those of the files on which *it* depended, so make remade it
on every run (once an irrelevant change had happened upstream).
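That "write only if changed" behaviour can be sketched as: generate into a temporary file, then replace the target only if the content differs.  The generator and file names are stand-ins, not anything from a real tool.

```shell
#!/bin/sh
# Sketch of the "write only if changed" pattern: keep the old file
# (and its mtime) when the regenerated content is identical.
# The generator and names are illustrative.
cd "$(mktemp -d)"
generate_output() { printf 'generated content\n'; }   # stand-in generator
out=result.txt
generate_output > "$out.tmp"
if cmp -s "$out.tmp" "$out" 2>/dev/null; then
    rm -f "$out.tmp"        # unchanged: old file and its mtime survive
else
    mv "$out.tmp" "$out"    # changed (or first build): replace, updating mtime
fi
```

The surviving old mtime is exactly the drawback described above: the file remains older than its prerequisites, so make runs its rule again every time.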

The problem with any "is this change material?" check, used to avoid
doing downstream build steps, is that you have to repeat the check on
every make run, once there is a maybe-material change present that it's
saving you from responding to.  You can use a timestamp check as a
cheap pre-test (the file hasn't changed since last time, so can't
contain a material change), but once the check *has* saved you some
downstream work, you are left with checking that you must repeat each
time make runs: something, somewhere in your dependency tree, depends
on something newer, forcing make to re-run some of your rules, even
though those rules work out that they should do nothing.

My ideal solution to this would be to have an extra timestamp as part of
the file-system's meta-data: "up to date at" as distinct from "created"
and "modified".  (To make it generic, rather than make-specific, I'd
probably call it "validated" or some such.)  If we had this, make could
compare it, on each generated file, with "modified" on its
prerequisites; a file is out of date if a prerequisite has been modified
since it was up to date.  When regenerating a file, we could then see
whether it has changed; if it hasn't, we leave "modified" alone and
update "up to date" to the present; otherwise, we over-write the file
and change both.  I think this would do most of what I suspect you
really want.  However, file-systems don't have an extra time-stamp for
us to use in this way, so we can't do this.
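The proposed test can be sketched with shell's -nt ("newer than") operator, using a stamp file to stand in for the missing "validated" timestamp; all the file names here are illustrative.

```shell
#!/bin/sh
# Sketch: a target is out of date if any prerequisite was modified
# after the target was last "validated".  A stamp file's mtime
# stands in for the missing file-system timestamp; all names and
# times are illustrative.
cd "$(mktemp -d)"
touch -t 202001010000 parser.c parser.h    # sample prerequisites
touch -t 202001020000 parser.o.validated   # "validated" after both

out_of_date=no
for prereq in parser.c parser.h; do
    # -nt compares mtimes: true if $prereq is newer than the stamp
    [ "$prereq" -nt parser.o.validated ] && out_of_date=yes
done
echo "parser.o out of date: $out_of_date"   # prints "no" here
```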

Of course, we could abuse the existing time-stamps to achieve this; I
find "created" an almost useless datum - many tools create a new file to
replace the old one when "modifying", renaming the new one on success,
so the old version's creation time is forgotten and "create" is mostly
synonymous with "modify".  If we could assume that of all tools, we
could then use "creation" time as "modified" in the above and use
"modified" time as the "up to date at" time and all would swim nicely.
However, I suppose some tools really do modify files in place, so would
leave "created" unchanged while revising "modified"; so I doubt this
scheme would fly (and it *is* an abuse of the defined time-stamps).  I
dare say others can think of other problems with it.

I spent a few hours trying to work out how to fake this up with a
secondary file whose "modified" time-stamp serves as "up-to-date" for
the primary it represents.  It might contain a hash or other meta-data
as you describe.  For files fully under make's control (generated files)
this looks feasible - albeit there's a mess of details to sort out -
without needing to regenerate the secondary on every make run; it just
gets generated when make creates the primary (or when make finds it has
mysteriously vanished since last run).  However, source files get
randomly hacked about by users and version control systems, so would
still need their secondaries reevaluated at least whenever the source is
newer than its secondary - as discussed above.
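The generated-file half of this could be faked up roughly as follows; the stamp file's mtime plays the "up to date at" role, and the generator and names are illustrative stand-ins, not a worked-out design.

```shell
#!/bin/sh
# Rough sketch of the secondary-file scheme for a generated file:
# regenerate into a temporary, keep the old file (and its mtime)
# if nothing changed, and refresh only the stamp, whose mtime
# records "up to date at".  Names and generator are illustrative.
cd "$(mktemp -d)"
generate() { printf 'stable content\n'; }   # stand-in build step
target=output.txt

generate > "$target.new"
if [ -f "$target" ] && cmp -s "$target.new" "$target"; then
    rm -f "$target.new"          # unchanged: target's mtime untouched
else
    mv "$target.new" "$target"   # changed (or first build): replace
fi
touch "$target.stamp"            # "validated now", either way
# A rebuild decision would then compare prerequisites' mtimes
# against $target.stamp rather than against $target itself.
```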

As long as (primary) files can be modified without the meta-data you
want being updated in parallel (as a file-system time-stamp would be),
I think you are doomed to regenerating your meta-data more often than
you anticipate, which I suspect will eat up all the hoped-for benefit
of skipping redundant build steps.

