[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gnu-arch-users] Re: semantics of restricted mkpatch

From: Tom Lord
Subject: [Gnu-arch-users] Re: semantics of restricted mkpatch
Date: Tue, 7 Oct 2003 13:04:34 -0700 (PDT)

    > From: Denys Duchier <address@hidden>

Very short answer:

  Ok, I understand where you're going much better now.   It needs a few
  tweaks (has an integrity bug, breaks the current selected files
  commit functionality, and I still don't think LIMITS should (in a
  simple way) match paths in ORIG:

Short answer:

    > Thanks a lot for this thourough analysis.  I think we should proceed
    > by cataloguing the types of elements in a changeset and formalize
    > their dependencies.  

  All trees are characterized by a bijective mapping between relative
  paths and inventory ids.  (In practice, two mappings are used: one
  for dirs, one for everything else.)  Adds, deletes, and renames can
  be figured out by comparing two of those mappings (for ORIG and

  The -index files in changesets contain a subset of the complete
  indexes including: entries for all modified files, entries for all
  adds and deletes, and (most of :-) the transitive closure of
  containing directories for those.

  I think it will be helpful to think of computing the "maximal
  partial changeset" basically in terms of some algorithms on those

    > Hopefully, in this fashion (e.g. using a closure
    > algorithm) we will be able to compute provably minimal
    > integrity-preserving changesets.

  You're clearly in the right space.  Your current algorithm is
  slightly not integrity-preserving -- roughly, I think you need a
  closure-computing loop where currently you make two passes (see the
  long answer, below).  I want to be clear that:

        a) The minimal/maximal thing is clearly a long-term parameter
           (we might eventually want both functionalities, in one form
           or another).  There's nothing wrong with doing just one or
           the other first (and maximal is looking mostly good).

        b) However, even if it means having two paths through the code
           and a CLI option, the special case minimal changeset
           created by the current selected-files commit should remain
           available even if you're adding a non-minimal limits
           mechanism.  (In practice, this could mean as little as 
           not getting rid of `make-changeset-files.[ch]')

        c) I think that minimal is likely to be more commonly
           desirable _when_it_makes_a_difference_.   I also guess 
           that in the vast majority of actual uses, they'll both give
           the same answer.   The examples I can give where maximal
           is not helpful are not examples I expect to be common cases.

           So, that's my vote for "minimal" -- but it's not a really
           strong vote.   The current bug aside, one thing your code
           showed me is that the "maximal" changeset is likely far
           easier to compute (and explain) than the minimal one -- and
           that weighs more heavily than my vote for minimal given
           that it will hardly ever make a difference.

        d) I'm less comfortable with using the limit paths to match
           against both the ORIG and MOD tree.   This isn't an
           integrity-preservation issue -- it's a usability issue.
           (See the long answer.)   This complicates your algorithm 
           but I think you're right that if you view it as two
           predicates, one for each tree, that's the right frame of
           mind for sorting it out.

   Bottom line: I'm not against this idea at all -- the opposite, in
   fact.  It just needs some tweaks.  The idea of a maximal partial
   changeset is one I hadn't thought of as useful and it feels to me
   like a breakthrough idea to regard it as useful, especially in
   combination with the availability of a minimal selected-files

   I had previously rejected maximal partial changesets because of
   really weird edge cases that will hardly ever come up.   Yet it's
   easy to explain and understand and in almost every realistic
   situation -- it will give same answer as a minimal partial


Long answer:

    > My idea of LIMITS is that they select what you are interested to put
    > into a changeset.  It is then the job of mkpatch to pull-in whatever
    > else is necessary in order to produce an integrity-preserving
    > changeset.

I see that.  Or as you say later, you're not producing a minimal

    > >   In general that seems like a bad 
    > >   idea.  Consider this variation:

    > >         ORIG TREE:                      tag     modified
    > >
    > >                 ./dir-A                 T1      
    > >                   ./dir-A/file-A        T2
    > >                   ./dir-A/file-b        T3
    > >                 ./dir-B                 T4
    > >                   ./dir-B/file-c        T5

    > >         MOD TREE:

    > >                 ./dir-B                 T1
    > >                   ./dir-B/file-A        T2      yes
    > >                   ./dir-B/file-b        T3      yes
    > >                 ./dir-C                 T6
    > >                   ./dir-C/file-c        T5

    > >    If your algorithm produces a changeset that records
    > >    the rename of directory T1, then it must also record
    > >    the addition of T6 and rename of T5 -- this seems like
    > >    a very strange result for "LIMITS = { dir-A/file-A }".

    > No, that is incorrect.  Here is my interpretation:

    >    - the limit matches

    >   dir-A/file-A    (causes dir-A to be "needed")

    >    - therefore we include in the changeset

    >   rename dir-A -> dir-B (pulled in because dir-A is "needed")
    >         modif dir-A/file-A -> dir-B/file-A

    >    anything else is neither selected not "needed"

If your changeset records "T1: dir-A -> dir-B" but not "T4:dir-b
deleted" then it does not apply cleanly to the ORIG tree (because the
path ./dir-B is already in use).  If it _does_ include "T4 deleted",
then it should probably also include "T6:./dir-C created" and "T5:
file-c renamed" (so that file-c is not quietly deleted when the
changeset is applied to ORIG).

In other words, with your current answer:

        delta(ORIG, MOD, LIMITS)[ORIG] => conflicts

which means that if I were to `commit' using the changeset delta(ORIG,
MOD, LIMITS) then I could not `get' the resulting revision.

Either your changeset has to be larger (which leads to the interesting
situation that a limit of a single file can imply an arbitrarily large
change to the structure of the surrounding tree) or it should be
minimal (so that only changes within that file are recorded).

As I've said, although the odd situation of a much larger changeset is
weird:  I think it will hardly ever come up in practice and it will be
easy to understand when it does.   Normally, you'll get a minimal or
near-minimal answer with either algorithm.   If "selected files
commit" is separately available, a maximal changeset is fine.

Getting the _right_ larger changeset is, I think, "only" a case of
turning your closure computation into a loop.

    > > * tricky to express limits 

    > >    Consider trying to limit mkpatch to a file deletion:

    > >         ORIG TREE:                      tag     modified

    > >         ./dir-A                         T1      
    > >           ./dir-A/file-A                T2

    > >         MOD TREE:

    > >         ./dir-B                         T1      

    > >    with "LIMIT == { dir-B/file-A }"

    > >    Conceptually, clearly, this should be the same as:

    > >         LOGICAL-MOD TREE:

    > >         ./dir-A                         T1

    > >    but in practice that's quite tricky.   How is mkpatch 
    > >    supposed to figure out that the file named in the limits 
    > >    ("dir-B/file-A"), which doesn't exist in the MOD tree, 
    > >    is the same file that had tag T2 in the ORIG tree?

    > In my interpretation, LIMITS apply to, and thus select from, both ORIG
    > and MOD.  

    > You could use a limit dir-A/file-A if you want to include
    > file-A into the changeset eventhough it no longer exists in MOD.

I was thinking something more like:

        1) Notice that the limit, dir-B/file-A doesn't exist in MOD.
           Evidently, the user intended to commit a delete.

        2) Notice that dir-B in MOD is the same as dir-A in ORIG, and 
           dir-A/file-A _does_ exist in ORIG and _has_ been deleted
           from MOD.

        3) Conclude that the delete of T2 is what is intended by the

(If the limit were dir-B instead of dir-B/file-A, the delete should be
noticed then, as well.)

I think that your proposed semantics (matching limits in both ORIG and
MOD) are highly undesirable in some not-too-rare edge cases.  Consider
for example:

        ORIG TREE:
                ./=new-libfoo/                  T1
                ./libfoo/                       T2

        MOD TREE:
                ./=old-libfoo/                  T2
                ./libfoo/                       T1

        LIMITS = { ./libfoo }

I would hope to get a changeset that contains changes I made within
what used to be called "./=new-libfoo".   If somebody else who hasn't
yet swapped the two libfoo implementations applies my changes, those
changes should be applied to their "./=new-libfoo".

With your semantics, there is no way for me to do that.  My changeset
will always include the rename.  That's the minimal/maximal issue
which I can probably live with if that's all there were to it, but
it's even worse than that:

Suppose I have _also_ made changes within "T2:./=old-libfoo".   I
don't want to commit these with my changes to T1 but, since the limit
"./libfoo" selects T2 (from the ORIG tree) I have no choice.

    > > 1) It appears that you are pruning both the ORIG and the MOD trees
    > >    according to how the limits compare to the path names to files in
    > >    either tree.   I think that the limits specs should refer to paths 
    > >    in the mod tree only (with a partial exception for limits naming 
    > >    deleted files).

    > I know that was your preferred interpretation, but I disagree with
    > it.  

Hopefully the above examples make clearer why it is undesirable.

    > We could certainly make the
    > language more expressive, but this is a completely orthogonal issues
    > to that of proving that the algorithm is integrity-preserving.  

Agreed.  (The current algorithm you are describing seems, as I've
said, to not be integrity-preserving.)

    > > 2) I don't see any code that handles the case of "impossible limits"
    > >    described above.

    > as explained, it is not at all impossible, but quite straightforward,
    > unless I missed the point which is entirely too likely.

The "impossible limits" problems don't show up with maximal partial
changesets -- pretty much by definition.    Essentially, I was just
worrying that you weren't computing a minimal changeset.   So, forget
I said that :-)

    > > What I _fear_ about this code (simply because I haven't seen any math
    > > that would prove otherwise) is that it is capable of producing a
    > > changeset which would not apply cleanly to ORIG (which would be
    > > disasterous).

    > Yes, I understand this worry.  With your concern about minimality, the
    > problem seems to me harder than the one I tried to solve (but perhaps
    > this is only superficial).  If you forget about minimality, then there
    > is not much of a problem: you take a full changeset (therefore one
    > which is guaranteed to apply cleanly) and, (thanks in part to my
    > notion of a NEEDED directory) only throw out parts affecting entire
    > subtrees.  This needs to be formalized.  I'll work on it.

Ok.   The thing to watch out for when throwing away parts of a
changeset is that you might be throwing out something that needs to be
there to a avoid a conflict when applying the parts you don't throw


reply via email to

[Prev in Thread] Current Thread [Next in Thread]