From: Thomas Lord
Subject: [Gnu-arch-users] obtaining delta compression with 1.x (was Re: Arch, CVS, Subversion)
Date: Thu, 9 Dec 2004 10:26:20 -0800 (PST)


It occurs to me that all the fuss over how to implement binary delta
compression and merging in the face of binaries is mostly a fuss over
nothing.  It is pretty easy to do already, even using only the
features available in `tla-1.2'.  (Some additional hooks would make it
even easier.)

Here is one way to do it:


    > From: "Tom Browder" <address@hidden>

    > Um, a typical project has a binary file that grows to about 10
    > Mb.  Snapshots (check ins) are done throughout the day as
    > significant editing is done to the file (mainly for backup and
    > to give a rollback capability in case of errors).  A project
    > will typically take 90+ days to complete, each day with several
    > check ins.  We have over 120 projects archived (but none of the
    > binary files yet).

    > The way we work around the lack of binary diffs now is the
    > binary file has an ASCII equivalent so we convert to ASCII and
    > check it in.  To reconstruct, check out the correct copy and
    > reconvert to binary (a pain, and subject to error).  Ideally we
    > could just deal with the binary form.


I am glad you don't mind *too* much the idea of converting files
between formats before and after ``commit''s and ``get''s.

My suggestion is that instead of converting to ASCII, you grab
`xdelta', and convert to an already-delta-compressed format.

To illustrate, let's suppose that you have a binary file:


        diagram.jpg

I suggest that in the archived trees, you store that file as:


        diagram.jpg.base
        diagram.jpg.xdelta


After checkout, you can construct `diagram.jpg' with:

        % apply-xdelta diagram.jpg.xdelta diagram.jpg.base > diagram.jpg


Before committing, you can run:

        % make-xdelta diagram.jpg.base diagram.jpg > diagram.jpg.xdelta
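
The `apply-xdelta' and `make-xdelta' names above read as stand-ins for
small wrapper scripts rather than commands that ship with xdelta.  As
a rough sketch of what such wrappers might run, assuming the `xdelta3'
command-line tool is installed (the Python function names here are
hypothetical, and unlike the shell lines above the output file is
named explicitly instead of being redirected from stdout):

```python
import subprocess


def make_xdelta_argv(base, current, delta_out):
    """Command that encodes `current` as a delta against `base`."""
    # Assumed xdelta3 usage: -e encode, -f overwrite output, -s source file.
    return ["xdelta3", "-e", "-f", "-s", base, current, delta_out]


def apply_xdelta_argv(delta, base, output):
    """Command that rebuilds the working file from `base` plus `delta`."""
    # Assumed xdelta3 usage: -d decode, -f overwrite output, -s source file.
    return ["xdelta3", "-d", "-f", "-s", base, delta, output]


def run(argv):
    """Run one of the commands above; raises if xdelta3 fails or is missing."""
    subprocess.run(argv, check=True)
```

For example, `run(make_xdelta_argv("diagram.jpg.base", "diagram.jpg",
"diagram.jpg.xdelta"))' would play the role of the `make-xdelta' line.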

There is a catch.   If the resulting size of `diagram.jpg.xdelta' plus
the previous size of `diagram.jpg.xdelta' exceeds the size of
`diagram.jpg', then instead you should run:

        % rm diagram.jpg.base
        % cp diagram.jpg diagram.jpg.base
        % make-xdelta diagram.jpg.base diagram.jpg > diagram.jpg.xdelta

Your arch changesets will, as a result of taking those steps, 
contain delta-compressed binaries.
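
The re-basing rule described above can be written down as a small
check.  This is only a sketch: the function names are hypothetical,
the sizes are assumed to be in bytes, and "exceeds" is read as
strictly greater than.  Regenerating the delta afterwards is left to
the caller.

```python
import os
import shutil


def should_rebase(new_delta_size, prev_delta_size, full_size):
    """True when the two deltas together outweigh the full file."""
    return new_delta_size + prev_delta_size > full_size


def rebase_if_needed(path, new_delta_size, prev_delta_size):
    """Replace `path + ".base"` with a copy of `path` when the rule fires.

    Returns True if the base was replaced, in which case the caller
    should regenerate the delta against the new base.
    """
    full_size = os.path.getsize(path)
    if should_rebase(new_delta_size, prev_delta_size, full_size):
        # Same effect as the `rm' + `cp' pair shown above.
        shutil.copyfile(path, path + ".base")
        return True
    return False
```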

You might consider making `.jpg' files (or whatever your binary files
are) `precious' in arch inventories, so that the inflated copies stay
in the working tree without being archived themselves.

If you want to get fancy, you could implement a system of Emacs-style
numbered backups:

        diagram.jpg.base
        diagram.jpg.xdelta
        diagram.jpg.xdelta.45
        diagram.jpg.xdelta.46
        diagram.jpg.xdelta.47

In that tree, the four most recent versions of the binary file are
kept conveniently on hand.  After the next commit, the tree will
contain:


        diagram.jpg.base
        diagram.jpg.xdelta
        diagram.jpg.xdelta.46
        diagram.jpg.xdelta.47
        diagram.jpg.xdelta.48
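
The rotation between those two listings could be computed along these
lines.  It is a sketch under some assumptions: the current delta is
taken to become the next numbered backup, `keep' counts the unnumbered
delta plus the numbered ones, and the function (a hypothetical name)
only works out the file names; the actual renames and deletions are
left out.

```python
import re


def rotate_backups(names, stem, keep=4):
    """Return the file names expected after the next commit.

    `names` is the current listing and `stem` is the unnumbered delta,
    e.g. "diagram.jpg.xdelta".
    """
    pattern = re.compile(re.escape(stem) + r"\.(\d+)")
    numbers = []
    others = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
        else:
            others.append(name)
    numbers.sort()
    if stem in others:
        # The current delta becomes the next numbered backup.
        numbers.append(numbers[-1] + 1 if numbers else 1)
    else:
        # No current delta yet: the commit will create one.
        others.append(stem)
    # Keep the newest numbered backups; the oldest ones drop off.
    kept = numbers[-(keep - 1):] if keep > 1 else []
    return others + [f"{stem}.{n}" for n in kept]
```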

If you later decide that merging would be useful to you, then you can
extend the above practices by "forking" each binary file for each
branch.  You might wind up with:


        diagram.jpg.base
        diagram.jpg.xdelta.official
        diagram.jpg.xdelta.testing
        diagram.jpg.xdelta.alice
        diagram.jpg.xdelta.bob

You don't get automatic merging of `jpg' files that way (of course)
--- but if branches make changes to the binaries, when merging them
back, at least you wind up with both alternative versions of the file
to work with and pick from.

This approach adds a new twist to working with project trees: there is
a resulting "state" that reflects which `.jpg' files have been
inflated from which bases and deltas, and so forth.   So this solution
creates the new problem of how to manage that extra state (hence my
suggestion at the end of the first paragraph about new hooks).
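
One way to picture that extra state is a small record, kept next to
the tree, of which base and delta each inflated file came from.  The
sketch below is not something arch provides: the JSON state file, its
layout, and the function names are all assumptions made for
illustration.

```python
import hashlib
import json
import os


def file_digest(path):
    """SHA-256 of a file's contents, as a hex string."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def record_inflation(state_path, target, base, delta):
    """Note which base and delta `target` was inflated from."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as handle:
            state = json.load(handle)
    state[target] = {"base": file_digest(base), "delta": file_digest(delta)}
    with open(state_path, "w") as handle:
        json.dump(state, handle, indent=2, sort_keys=True)


def is_stale(state_path, target, base, delta):
    """True if `target` has no record, or its base or delta has changed."""
    if not os.path.exists(state_path):
        return True
    with open(state_path) as handle:
        entry = json.load(handle).get(target)
    current = {"base": file_digest(base), "delta": file_digest(delta)}
    return entry != current
```

A hook run after ``get'' could call `is_stale' to decide which files
need to be inflated again, and `record_inflation' once that is done.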

Tom Browder's experience, quoted above, suggests that the new problem
isn't an impractical one, even today.  (He isn't using xdelta but he
is "inflating" ephemeral source files from archived delta-compressible
files.   And he has to "deflate" the ephemeral files to get a
committable form --- the same procedures can implement binary delta
compression much more directly.)

I think it is a good trade-off: we solve the problem of binary delta
compression in arch and, in exchange, take on the problem of managing
inflation and deflation between binaries (and other kinds of ephemeral
source tree contents) and archived xdelta files (and other kinds of
archive-friendly file formats).


-t




