Re: [gnulib-tool-py] List of modules: save in local file

2012/5/8 Bruno Haible <address@hidden>

Hi Dmitriy,

> I'm now working on section where you get all the available modules from
> gnulib-tool. What do you think about this strategy?
> 1. If the GNULibImport executes the first time, get all the modules in
> usual way (like in func_all_modules() function).
> 2. Save the list of modules using json Python module. This will help user
> to save some time in the future. We can even just save it line-by-line.
> 3. In the future get the list of the modules with json module. If it was
> saved line-by-line, just split lines of file.
> 4. If user needs to refresh list of modules, he can delete json file or
> call update method from GNULibImport class.

You mean, you want to cache the output of 'gnulib-tool --list' in some
form on the file system?

It would be an interesting idea, if it were needed.

But first some general warning about caches: In step 4 you propose a manual
detection whether the cache is up-to-date. This would be a big mistake.

Caches are there to make computer operations faster *at no additional cost*
for the user. Caches that require human intervention every now and then
are a cure that is (most often) worse than the original disease. Most
often such human intervention is required because the cache implementation
is buggy: The implementor forgot about some situations in which the cache
needs to be invalidated. But outright *never* invalidating the cache is
a no-no.

In this case, the cache needs to be considered invalid if the maximum
of the timestamps of the modules/ directory and of each of its
subdirectories is at least as new as the cache file. Thus, a possible
implementation would be to store that maximum timestamp in the cache file,
and when you read the cache file you ignore it if the timestamps of the
modules/**/ directory hierarchy have changed (indicating that a file has
been added or removed or renamed).
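
A minimal sketch of that scheme in Python, for illustration only. The
names max_dir_mtime and load_module_list, the list_modules callback, and
the JSON layout of the cache file are all hypothetical; none of them exist
in gnulib-tool. The stamp is taken before the modules are listed, so a
change that happens during the listing invalidates the cache on the next
run.

```python
import json
import os


def max_dir_mtime(root):
    """Return the newest mtime (in ns) of root and every directory below it."""
    newest = os.stat(root).st_mtime_ns
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            mtime = os.stat(os.path.join(dirpath, name)).st_mtime_ns
            newest = max(newest, mtime)
    return newest


def load_module_list(modules_dir, cache_file, list_modules):
    """Return the module list, using cache_file only if its stamp still matches.

    list_modules is a caller-supplied function (hypothetical) that does the
    real, uncached work and returns a list of module names.
    """
    stamp = max_dir_mtime(modules_dir)
    try:
        with open(cache_file, encoding="utf-8") as f:
            cached = json.load(f)
        if cached.get("stamp") == stamp:
            return cached["modules"]
    except (OSError, ValueError, KeyError, AttributeError):
        # Missing, unreadable or malformed cache: fall through and rebuild.
        pass
    modules = list_modules(modules_dir)
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump({"stamp": stamp, "modules": modules}, f)
    return modules
```

Note that this relies on directory timestamps only, so it detects files
being added, removed or renamed, but not edits to the contents of an
existing module file. On file systems with coarse timestamps, two changes
in quick succession may also go unnoticed.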

But the question is: is it needed? I ran "time ./gnulib-tool --list" once:
0.3 sec. Once again: 0.2 sec. How often is this command run? Rarely.
I think not only is 0.3 sec acceptable, but even 3 sec would be acceptable.

Caches always have a drawback: They are not entirely invisible. When
people do "diff -r", the cache files may show up in the output. They may
need to be filtered away in some operations... Bottom line: If a cache is
not needed, don't implement it. Keep things simple if you can.

Bruno

PS: But when I add --local-dir variants that work across the network,
the tradeoff will be different. Network operations are rarely finished
in less than 0.3 seconds.

From:	Dmitriy Selyutin
Subject:	Re: [gnulib-tool-py] List of modules: save in local file
Date:	Tue, 8 May 2012 14:50:23 +0400