gnu-arch-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] Re: [BUG] FEATURE PLANS: "perfect" summary deltas


From: Tom Lord
Subject: Re: [Gnu-arch-users] Re: [BUG] FEATURE PLANS: "perfect" summary deltas
Date: Fri, 4 Jun 2004 11:01:23 -0700 (PDT)


    > From: address@hidden

    > I've wondered what all that stuff ment, so I coded up a little python
    > script: [....]

You can understand summary delta branches in terms of binary trees.

Let's say that we have a regular version made with ordinary commits.
Each revision is changes relative to the previous:

        0 -- 1 -- 2 -- 3 -- 4 ......


We can imagine those revisions as occuping a corner of an infinite
binary tree:


                               ....
                            /
                           7
                      /

                    /

                 3                              .....
              /     \
           1          4
          / \        /  \
        0    2      5    6



Notice how there's a left "spine" to the tree of revisions
numbered 2^K - 1 for some non-negative K (0, 1, 3, 7, ...).

The first rule is that every revision on the spine, except for
revision 0, is stored as a changeset relative to 0.

The second rule is that every revision not on the spine is stored
as a changeset to its closest ancestor that is on the spine.
So, using arrows to indicate the changesets found in a summary delta
version:


            ---3
           '    ^
          ' 1   |`-4
         '  '^  |
        '  ' |  | ----
       V <-  |   `    `
       0     2    5    6

Everything points to the closest left-spine element except for
left-spine elements which all point to 0:


   summary rev #        is stored as a tag of
                        and changeset relative
                        to rev #


        0               --
        1               0
        2               1
        3               0
        4               3
        5               3
        6               3
        7               0
        8               7
        9               7
        10              7
        ...
        15              0
        ...


That graph has the property that no two revisions are more than four
changesets apart.   For example, to get from  5 to 14, instead of
applying 9 changesets (6, 7, 8, ...), you could apply
summary delta changeset and build:


        5 --> 3         (reverse apply the changeset of summary-delta
                         5 to get the spine revision that 5 is a tag
                         of)


        3 --> 0         (reverse apply summary-delta 3)

        0 --> 7         (forward apply summary-delta 7)

        7 --> 14        (forward apply summary-delta 14)


Basically, you can always use the left spine of the tree and the 0
revision (of this summary phase) as a pivot point.


    > Another point I'm not sure about: How much space will the +summary
    > version take up? It seems as if there was _lots_ of redundancy in your
    > proposed solution.


AFAICT, space consumption is a tunable parameter for which we need a
good default.   Remember the bit about phases.  An actual
summary branch looks like this:


      / \   / \   / \
     /   \ /   \ /   \
    A ... B ... C ... D

where each triangle is one summary-delta tree.  You can get from any
revision between, say, B and C to any other revision between B and C
in at most four summary changeset applications, but you can't
(directly, anyway) use summary deltas to get from between A and B to
between C and D.

You (or tla) can pick where to place the corners of the summary trees;
where to place A, B, C, and D.   The location of those, along with the
specific nature of the changes in the revisions being summarized,
determine the space consumption of the summary delta branch.

For example, if every had exactly 3 revisions then the summary delta
version would double the space requirements (because it would merely
duplicate the changesets in the summarized version).    

On the other hand, if "B" is placed at infinity, then the worst case
space consumption for K summary revisions is K*CCH_SIZE wher CCH_SIZE
is the average size of a cached revision (in other words, with B at
infinity, assuming the source is changing semi-randomly,
summary-deltas take as great a magnitude of space as if you had
cachereved every revision).

Interestingly, if B is 1, there's no point to using summary deltas
because they're too small.   If placed at infinity, there's no point
to using them because they are less efficient than cacherevs.

The goal is to find a sweet-spot in the middle for placing "B".  I was
thinking of placing "B" (cutting off a summary tree) whenever a new
changeset about to be added to the summary tree would exceed 1/4 the
size of a cachedrev.  That way, no path within a summary tree will be
larger than a cacherev.   

But I think that 1/4 has to be tunable.  You might want it larger for
a project with a small source tree (like tla).

Hmm... come to think of it, let's suppoe that you have a summary
branch:


      / \   / \   / \
     /   \ /   \ /   \
    A ... B ... C ... D


there's no good reason to not make tla robust against missing summary
revisions.  That is, you should be able to simply delete all of the
files in revision dirs of some old summary branch revisions, leaving:


                  / \
      (empty)    /   \
    A ... B ... C ... D


which would allow you to maintain an (essentially) space-bound summary
branch.


-t





reply via email to

[Prev in Thread] Current Thread [Next in Thread]