[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] Re: Inconsistency of Added-files and Removed-files

From: Tom Lord
Subject: Re: [Gnu-arch-users] Re: Inconsistency of Added-files and Removed-files
Date: Wed, 18 Aug 2004 08:44:48 -0700 (PDT)

    > From: Aaron Bentley <address@hidden>

    >> We really ought to fix the "i want to know the inventory of that
    >> revision" and "i want to know what (file ids) changed in that
    >> revision" problem.  Those queries should be fast and that's one thing
    >> I'd add to the archive format revision wishlist.

    > Definitely on the same page there.

One thing that we've talked about is the idea of a "delta compressed
revision library" --- i.e., a "revision library" in the sense that it
contains trees extracted from archives and provides a client-side
cache;  "delta compressed" in the sense that it's optimized for access
to individual file history.

It isn't obvious to me, though, that this or any other new kind of
cache needs to be separately and specially coded.

There's only a small number of basis functions for "things we want to
be able to compute from archives".  E.g., there's
`patch_log_of(revision)' and `changeset_of(revision)' and
`changed_file_ids(revision)'.... stuff like that.   And it's not that
long a list.   Add in "second order" computations, like
`which_patches_change_file(id)' and the list *still* isn't that long.

All of those basis functions have full externalizable (i.e., printable,
readable) parameters and return values.

All of those basis functions are constant functions: given equal
arguments, they always produce equal results.

Some of those functions are expensive;  the second-order functions
depend on the first-order functions, including some expensive ones.

One idea here is to add a very general caching mechanism built around
those functions rather than around a particular data structure like a
revlib.  Schematically, instead of calling:

        log = patch_log_of (revision);
        touched_names = touched_file_names (log);

one would write:

        touched_named = query ("touched_names $revision");

if that result is cached, fine, otherwise, the cache is automatically
populated with:

        cached_result["touched_names $revision"]
         = touched_file_names (query ("patch_log_of $revision"));

and so forth.

Designed with care, a caching implementation of "query" might be able
to behave with all the important performance characteristics of the
less-generic caches we might want to add -- and with far less code!

(The idea of a general purpose cache is one of a list of reasons to 
be skeptical about too many changes or enhancements to archive
formats.   One virtue of the current format is its minimalism: there's
very little information that's redundently represented and the
information that's there is quite compressable.   So one can, for
example, read and write core arch revisions at just about the maximum
theoretical speed.   As always, speedups on top of that for particular
commands should almost always be client-side hacks and almost always
be based on memoizing or caching the results of constant functions.)


reply via email to

[Prev in Thread] Current Thread [Next in Thread]