[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] arch lkml

From: Aaron Bentley
Subject: Re: [Gnu-arch-users] arch lkml
Date: 11 Dec 2003 08:37:26 -0500

[Note: this is a resend.  The last copy never made it to gnu-arch-users]
On Wed, 2003-12-10 at 12:41, Eric W. Biederman wrote:
> Miles Bader <address@hidden> writes:
> > On Mon, Dec 08, 2003 at 10:25:24AM -0700, Eric W. Biederman wrote:
> > 
> > > The one very obvious potential issue I see with arch as it currently 
> > > stands
> > > is that it does not use one of the more sophisticated storage formats
> > > for storing deltas.  Which means as your archive size increases the work 
> > > can
> > > increase.   

> As I understand the literature recent work on version control has
> used what is a variation on the gzip format for storing multiple
> versions.  The idea is you compress the first file like normal.
> But for the second and subsequent files you look back into your
> archive (which you are simply appending to) and use previous text
> for compression.  This makes both appends and random access fast
> and in addition this happens to work for random binary files.

A format like this will not have random-access properties.  For file
foo, if you store 12 revisions, you'll need to consult all 12 in order
to reconstruct revision 12.  As with Arch, as the number of revisions
increases, the work increases.  Nothing is free.

If you assume it's more likely you'll want revision 12 than revision 1,
it would be possible to work in the opposite direction:
Store r12 normally, and rearchive r11 as the deltas between r11 and
r12.  r10 would already have been stored as the deltas between r10 and
r11, so you wouldn't need to do anything else.

This would provide random access to the most recent version, and slower
access to older versions.

I believe this algorithm could also be applied to Arch, though I can't
judge whether it would be worthwile to do it that way.  

If random access to particular revisions was required, those revisions
could be stored normally.  This is starting to look like mpeg
compression, with standard storage as I-frames and deta storage as

Aaron Bentley
Director of Technology
PanoMetrics, Inc.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]