Re: [Gnu-arch-users] a brief note on "delta compression"

From: James Blackwell
Subject: Re: [Gnu-arch-users] a brief note on "delta compression"
Date: Mon, 3 May 2004 16:27:58 -0400

> Some of the IRC discussion confused me a little bit and made me wonder
> if there isn't a third or more definition of "delta compression"
> floating around that I'm not aware of. If so, or if questions
> remain about the topic, please let me know.

You're the boss.

After Talli asked about the delta compression (from the standpoint of
combining multiple deltas), I sat back and thought about it for a while.

First, a definition: X is the distance between ancestor revision M and
its successor N (inclusive). For example, the X between patch-45 and
patch-47 is 2.

Right now, we store a "compressed delta" (CD) with an X of 1. This is
obligatory: a working copy can always be only one revision away from
current, in which case it needs just an X of 1 to catch up.

If we store a CD with an X of 2 (in addition to the CD with an X of 1),
we buy an apparent almost (but not quite!) doubling of speed. We also
double the size of our archives.

If we look at the opposite extreme, say a CD that has an X of 50, it
would seem that we would get a fifty-fold increase in speed, but this is
*not true*! The reason is that this CD would only do any good if your
local working copy was at least 50 revisions away from "current".

The lower we set X, the less point there is in bothering to do CD in the
first place. The higher we set X, the more useful the extra stored CD,
but the less likely that we'll actually be able to use it.

So we have to pick an X that is an estimate of how many revisions people
are missing at any given time. Too high, and the extra stored CDs go
unused, because nobody is far enough away from the most current revision
to use them. Too low, and we're not gaining the full benefit of storing
extra CDs.
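
One way to turn that estimate into a number, sketched under assumptions
of my own: given a sample of how far behind working copies actually are,
pick the X that minimizes the total number of delta applications. The
sample below is made up for illustration, the cost model (greedy CDs plus
single deltas, each costing the same) is hypothetical, and storage cost
is ignored entirely.

```python
def deltas_to_apply(distance, x):
    """Greedy count: CDs spanning x revisions, then single deltas."""
    return distance // x + distance % x


def best_x(distances, candidates):
    """Return the candidate X with the fewest total delta applications."""
    return min(
        candidates,
        key=lambda x: sum(deltas_to_apply(d, x) for d in distances),
    )


# Made-up sample: how many revisions behind each of ten working copies is.
sample = [1, 2, 3, 4, 5, 5, 6, 8, 10, 12]
print(best_x(sample, range(1, 51)))
```

Real numbers would have to come from observing how far behind people's
working copies are in practice; the answer is only as good as that sample.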

My personal hunch is that the proper X is right around 5. I've got no
actual proof for that, mind you. 

Just for the record, I personally don't like the idea, for the following
reasons:

  1. In the case of corruption, who do you believe? Do you believe the CD
  with an X of 1, or do you believe the CD with an X of 5? 

  2. Empirically, I've seen *way* too many people dick (sorry for the
  language, but that's the appropriate word here) with their archives.

  3. This undercuts cached revisions and revision libraries.

  4. Applying revisions is already pretty darn fast.

  5. There's more important stuff out there to do.

James Blackwell          Please do not send me carbon copies of mailing
Smile more!              list posts. Such mail is unsolicited. Thank you!

GnuPG (ID 06357400) AAE4 8C76 58DA 5902 761D  247A 8A55 DA73 0635 7400
