gnu-arch-users

Re: [Gnu-arch-users] from a file's point of view


From: Tom Lord
Subject: Re: [Gnu-arch-users] from a file's point of view
Date: Mon, 22 Sep 2003 10:39:23 -0700 (PDT)

(My incoming mail is temporarily wedged so, sorry for dropping the
threading on this reply, but:)

        Dustin:

        > Is there a way to see a list of patches that have affected a 
        > given file?

        Me:

        > [Search changesets.  Perhaps build an index to them.]


        Miles:

        > [I have a script that searches the patch log.]

        Bruce (in reply to me):

        > I think that's a bit negative.  A common case is where
        > you've got a working tree, and you're interested in what
        > patches touched a file.  And in that case, you've got the
        > patchlogs.  So a little scripting is all you need (searching
        > the patchlogs ought to be fast enough without creating a new
        > structure).


Strictly speaking, the patch logs don't (in the general case) have
enough information to trace the history of a file in a given project
tree.   In the project tree, there may be local (uncommitted) changes
that include renames that affect the file but which the patch log
doesn't know about yet.

For tracing the history of a file in a _revision_....  the patch log
does have enough information, but it's in an awfully inconvenient
format for doing a perfectly accurate search.  The format's
description of changed, added, removed, and modified files favors
human readers, not processes.

For example, to find the revisions where a file is modified, you must
search the list of modified files in each patch log entry for
_the_correct_path_.  To know what the correct path is at each
revision, you will have to trace changes to the path as you search
logs backwards.   If a containing directory is renamed, you have to
adjust the path of the file you're looking for (I'm not sure whether
or not Miles's script does -- haven't looked yet).
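
A minimal sketch of that backwards search, in Python.  Everything here
is an assumption made for illustration: `LogEntry` is a hypothetical,
already-parsed form of a patch log entry (not tla's on-disk format),
and modified files are assumed to be listed under the name they have
after the revision.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    """Hypothetical parsed patch log entry; not tla's real log format."""
    revision: str
    modified: set = field(default_factory=set)    # paths as of this revision
    renames: list = field(default_factory=list)   # (old_path, new_path) pairs


def path_before(path, renames):
    """Map a path as it exists after a revision to its path before it."""
    # A rename of the file itself takes priority.
    for old, new in renames:
        if path == new:
            return old
    # Otherwise a containing directory may have been renamed; prefer the
    # deepest (longest) renamed directory that contains the path.
    for old, new in sorted(renames, key=lambda r: len(r[1]), reverse=True):
        if path.startswith(new + "/"):
            return old + path[len(new):]
    return path


def revisions_modifying(path, log_newest_first):
    """Walk the log backwards, following renames, collecting modifiers."""
    hits = []
    for entry in log_newest_first:
        if path in entry.modified:
            hits.append(entry.revision)
        # Adjust the path before looking at the next (older) entry.
        path = path_before(path, entry.renames)
    return hits


log = [
    LogEntry("patch-3", modified={"lib/util.c"}),
    LogEntry("patch-2", renames=[("src", "lib")]),
    LogEntry("patch-1", modified={"src/util.c", "src/main.c"}),
    LogEntry("base-0"),
]
found = revisions_modifying("lib/util.c", log)
```

A search that only looked for "lib/util.c" would miss patch-1, which
is the inaccuracy described above.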

In contrast, finding out whether a given changeset modifies a file can
be done in two steps:  look for the file's inventory tag in the
mod-files-index, then look for patches to the file in the patches/
subdir of the changeset.
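
The two steps might be sketched like this in Python.  The layout
details are assumptions, not taken from this message: each
mod-files-index line is assumed to hold a path followed by an
inventory tag, and the patch for a file is assumed to live at
patches/<path>.patch.  The function names are hypothetical.

```python
import os


def read_index(index_path):
    """Parse assumed '<path> <inventory tag>' lines into {tag: path}."""
    by_tag = {}
    with open(index_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rsplit(None, 1)   # split off the last field
            if len(parts) == 2:
                by_tag[parts[1]] = parts[0]
    return by_tag


def changeset_modifies(changeset_dir, inventory_tag):
    """Return True if the changeset carries a patch for the tagged file."""
    # Step 1: find the file's path by its inventory tag.
    index = os.path.join(changeset_dir, "mod-files-index")
    path = read_index(index).get(inventory_tag)
    if path is None:
        return False
    # Step 2: look for a patch to that path under patches/
    # (the ".patch" suffix is an assumption of this sketch).
    patch = os.path.join(changeset_dir, "patches",
                         os.path.normpath(path) + ".patch")
    return os.path.isfile(patch)
```

Because the lookup is keyed on the inventory tag rather than the
path, no rename tracing is needed.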

Come to think of it, both of these kinds of search are suitable for
finding changes to the file along the path of _ancestry_ -- not
necessarily the path through the ordering imposed by the namespace.

If you want to trace back a file through the revisions in a version,
and other than base-0, some of those revisions are imports or
continuation revisions, then you'll likely wind up just comparing
various versions of the file directly.

I suggested just building a little index for these queries.   A
slightly fancy way to build such an index, having the side effect of
_also_ optimizing for very fast `annotate' output, was mentioned on
the list a few weeks ago.   The idea, as you may remember, is to
modify `patch(1)' so that it can read and write files with
interpolated diffs rather than overwriting files with patched
contents.    In effect, suppose that:

        annotated(A) := a version of file A stored as interpolated
                        diffs -- effectively the `annotate' history
                        of file A.

        B            := an immediate descendant of file A


then:

        diff A B > ,tmp
        patch --annotate annotated(A) ,tmp > annotated(B)



Note, then, that it's more or less a 10-line awk script to write
the program `extract' where:


        extract --vsn A  annotated(B) > A

reconstructs the A revision of the file from the annotated history
of B.
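
To make the idea concrete, here is a small Python sketch of one
possible annotated representation.  It is not the proposed `patch'
change itself: it stands in a simple record list, where each line
carries the revision that added it and the revision (if any) that
removed it, and it uses Python's difflib in place of diff(1).  The
names `add_revision', `extract' and `annotate' are hypothetical.

```python
import difflib


def add_revision(annotated, name, new_lines):
    """Fold a new revision into the annotated history (a stand-in for
    the proposed `patch --annotate' step)."""
    revs, records = annotated            # records: (text, added, removed)
    new_idx = len(revs)
    # Positions of the lines that are live in the latest revision.
    live = [k for k, rec in enumerate(records) if rec[2] is None]
    old_lines = [records[k][0] for k in live]
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines,
                                      autojunk=False)
    out, pos = [], 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        start = live[i1] if i1 < len(live) else len(records)
        end = live[i2 - 1] + 1 if i2 > i1 else start
        out.extend(records[pos:start])   # dead lines before this region
        for text, added, removed in records[start:end]:
            if tag != "equal" and removed is None:
                removed = new_idx        # deleted by the new revision
            out.append((text, added, removed))
        if tag in ("replace", "insert"):
            out.extend((line, new_idx, None) for line in new_lines[j1:j2])
        pos = end
    out.extend(records[pos:])
    return (revs + [name], out)


def extract(annotated, vsn):
    """Reconstruct revision `vsn' from the annotated history."""
    revs, records = annotated
    i = revs.index(vsn)
    return [text for text, added, removed in records
            if added <= i and (removed is None or removed > i)]


def annotate(annotated):
    """List (revision, line) pairs for the latest revision."""
    revs, records = annotated
    return [(revs[added], text) for text, added, removed in records
            if removed is None]


A = ["one", "two", "three"]
B = ["one", "2", "three", "four"]
C = ["zero", "one", "three", "four"]
history = ([], [])
for name, lines in (("A", A), ("B", B), ("C", C)):
    history = add_revision(history, name, lines)
```

With a representation along these lines, `extract' is a single
filtering pass over the records, and `annotate' output is read
straight off the stored data with no diffing at query time.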

Such changes to `patch' would be "89%" of the work needed to implement
an index which is similar to a revision library, but which stores many
different revisions in a single tree, storing the individual text
files as annotated copies of the latest revision.

That would save a decent amount of disk space compared to revision
libraries (by greatly reducing the number of inodes and directory
entries as well as reducing the amount of space lost to non-filled
blocks).  It would support a command line `tla annotate' that was
effectively instant.  Copying a revision for a `tla get' from such a
library would be a bit slower than copying it from a revision library
-- but not a huge amount slower.  Adding a new revision to such a
library would be slower -- but again, not a huge amount slower.  The
biggest loss would be that you couldn't do things like invoke `find'
to search a given revision within such a library.

Far more complicated variations are possible, for example by storing
more complicated alternatives to `annotated(B)'.  Instead of simple
interpolated diffs, one could store svn-like skip-deltas and whole
texts and so forth.  One could dynamically optimize such
representations with feedback from access patterns.  (My instinct is
that such complications would be more trouble than they're worth for
most applications in this context, but, there they are.)

-t




