[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Cherry-Picking, Renames, Etc.

From: Nathaniel Smith
Subject: Re: [Monotone-devel] Cherry-Picking, Renames, Etc.
Date: Sun, 28 Nov 2004 14:50:42 -0800
User-agent: Mutt/1.5.6i

On Sun, Nov 28, 2004 at 11:39:18PM +0200, Oren Ben-Kiki wrote:
> We spent a fun several days laying down the design for an "ideal" 
> system. We came up with something _very_ close to Monotone, with a few 

Perhaps closer than you realize :-)

> 1. Cherry-picking:
> What you could do, instead, is forget about common ancestors, forget 
> about "deltas", and try a simple 3-way merge. Pretend your base 
> revision was B. C is obviously derived from it, so that's the revision 
> you want to merge into A. Now, *pretend A is derived from B*. If you 
> think about it, there _is_ a path leading from B to A: going 
> "backwards" all the way to D and then "forward" all the way to A.

Yes, this is exactly how 'update' on mainline can "move" your changes
to be against a different revision, and the way of doing cherrypicking
that I've been advocating. :-)

This is quite a nice description of it, though, for those who don't
always follow the sometimes telegraphic communication on this list...

> That's it. A "good" 3-way merge should result in exactly what you need. 
> Representing the result in a graph is a bit of a challange, since the 
> edge you just added doesn't come from a revision, it comes from a 
> *path* between revisions. You'd probably want to add a dotted edge from 
> B to C and have an edge from the middle of that dotted edge into 
> revision E:
>     D --> B --> F --> G --> C
>       \    \.............../
>        \           \
>         \-> A ------> E

This part is different, though.  I've been really hesitant to get into
trying to record cherrypicks in the revision graph, because it feels
like one of those problems whose "solution" suddenly makes your system
an order of magnitude more complex, and you never do hunt down all the
edge cases.  But, maybe not... can you elaborate on this point more?
How do you write down this "dotted edge", and do later operations
automatically do clever things with it?  What sort of clever things?

> Now, was it Knuth that said, "I formally proved this program works, but 
> I haven't actually tested it, so beware"? I feel the same way here. It 
> _should_ work, but I haven't tried it. I'd love to, though. All it 
> would take is adding:
>  monotone patch --from <revision> --to <revision>
> Which does a 3-way merge using "from revision" as a base, merging "to 
> revision" and this one, just like a normal merge does. In fact, the 
> implementation should be very similar to that of "monotone merge"...

'update' on mainline does this... if you have

A --> B --> C
  D --> E
          working copy

then 'monotone update $C' will perform a merge of your working copy
and C, using E as the common ancestor, writing the result into your
working copy.

> 2. File renames.
> File renames are by far the most annoying part of having to deal with a 
> version control system. If I rename a "foo" variable to "bar", I don't 
> have to tell the version control about it. But if I rename a file, I do 
> need to. Ugh.
> What we came up with was
[Arch-style file/directory ids, with some heuristics for detecting

> Any file which does contain the UID pattern is 
> automatically confirmed as a source file, and so on.

That doesn't work, actually, editor backup files (foo.c~ and the like)
will always have the UID pattern.  Arch has machinery for dealing with
all this...

> There's another, bonus advantage hidden here. Remember the 
> cherry-picking problem? To solve it, we had to do a 3-way merge. And to 
> do that, we had to know, for each file in my version, what is the name 
> of this file in the two (from and to) patch versions. Currently in 
> Monotone, this requires back-tracking from my version all the way back 
> to the common ancestor, then all the way forward to the patched 
> versions, because the SHA of the file could change in any step along 
> the way. But if each file had a UID, this work would have been saved; 
> the system would "immediately" know where each file is without having 
> to bother with the common ancestor at all. For large projects, the 
> saving could be significant.

This is pretty much the crux of the debate about "changesets" that's
been on the list recently -- whether it should be possible to apply a
changeset without backtracking through history.  It seems to have a
large component of taste -- what's your opinion of these random tags
cluttering up your source files and source directories, would you
rather muck around in your editor or with calls to the VC, etc.  I'm
not sure that performance is really relevant; Monotone always works
locally, and has a very quick-to-parse changeset format, so it really
should be possible to chew through many, many graph edges per second.
And the work involved is proportional to the amount of history to
traverse, and the number of changes on each edge, _not_ the size of
the project.

> Admittedly this isn't as "pure" as using SHA for a UID. But it is 
> _important_ - it increases usability by an order of magnitude. 
> Suddenly, the system is "telepathic". It "just knows" what you did, and 
> if it isn't 100% sure it makes very good guesses, so all you have to 
> say to it is "yes, you are right". This _will_ make a difference in 
> people's willingness to use the system.

Will it?  I'm less sure.  The Arch solution partly just moves the
issues around.  When I copy or split a file/directory in Arch, I have
to go muck around with the tags by hand, to ensure that they're all
unique, while in Monotone I don't have to do anything except add the
new files.  Arch has a concept of a "well-formed tree" -- one where
tags are unique and assigned -- which Monotone doesn't have to bother
with, much.  (We do still have the concept of "well-formed tree" as,
all the files that are supposed to exist, do, but that's kind of
minor.)  Besides which, the work involved in running "monotone mv foo
bar" is hardly an order of magnitude more than the work involved in
running "mv foo bar" :-)

Guess I'd just like to see more evidence that this distinction is as
critical as the proponents of tags make it out to be.

> 3. Merge tracking (a rather minor issue).
> Like Monotone, we viewed the history as a DAG. Work always begins with 
> some revision, so you get one edge for free. You get another edge 
> leading to your new revision every time you merge it with some other 
> revision. So far is common wisdom; I _think_ this is the way Monotone 
> does it, although the on-line documentation is silent on the issue.

Well, it's almost correct, but maybe a little confused.  Currently in
Monotone, you have:
  -- working directories, which are based on a single revision; they
     always have exactly one edge, which points to their base
  -- the 'merge' and 'propogate' commands, which take some revisions
     in the database, merge them, and then save the resulting revision
     to the database with multiple edges coming into it.
Note that ATM, merging never involves a working copy.  This will
probably change at some point, to make it possible to fiddle with a
merge in a working copy before committing it, but that's still
intended for doing merges, not for creating new work.

> There's a subtlety involved, however, which allows keeping the DAG 
> cleaner. The simplest scenario is this: Given revision A, I start 
> working on revision B. I take my sweet time completing it; in the mean 
> while, someone also start with A and commits revision C. If I check in 
> B now, the graph will look like this:
>     A --> B
>       \-> C
> However, suppose I merge with revision C _before_ I commit B. The graph 
> will look like this:
>     A --> C -\
>       \-------> B
> What we thought was that this extra edge from A to B in the above graph 
> is redundant; it should not be shown. The general rule is: when an edge 
> arrives from revision M, supress any edge arriving from an "ancestor" 
> of revision M. In the above case, A is an ancestor of C, so when B 
> merges with C, the graph becomes:
>     A --> C --> B
> Which is a cleaner and arguably more accurate description of the 
> "geneology" of revision B. Once you merge C, B really become _its_ 
> child, and not a child of the long forgotten A revision, regardless of 
> the fact that work originally started with A.

This is exactly what Monotone does now, at least in this case.  (I
can't think off-hand of any other cases that would give you that A ->
C -> B, A -> B graph, but maybe there are some.)  The "merge with
revision C _before_ I commit B" part is exactly what 'update' does.
Working copies always have exactly one base revision, and the job of
'update' is to _change_ which base revision that is.

> It still makes some sense to keep track of the actual merge operations 
> in the database, for the sake of tracking who did what when; but other 
> than that, they serve no functional purpose. We suspected that keeping 
> track of which edges were "real" would also help in some algorithms, by 
> ensuring that any "fork" in the graph is a true rather than a "fake" 
> fork.

I'm not sure I see the point of tracking updates at all.  They only
affect which revision you're making changes relative too -- tracking
them would be like tracking how many times you edited a file between
checkout and commit.

> How often this scenario creeps up depends on the work habits. If one 
> always commits before a merge (creating revision D), this never 
> happens, as the graph would always look like:
>     A --> D -\
>       \-> C --> B
> Which is a "true" fork. But if one only commits after merging all new 
> commits first (not unreasonable for a group of programmers working in 
> the same office), the graph simplification could be significant.

No substantive comment here, but I'd like to object that it might
_well_ be unreasonable, even for a group of programmers working in the
same office -- the problem with CVS's merge-before-commit isn't so
much that it doesn't work when distributed (though that is a problem),
but that you spend a few weeks working on some code, you test it, it
works great... and then you throw out that code forever and commit
something else that may or may not work as well.  It's basically
impossible to have a test-before-commit policy with CVS; Monotone is
designed to fix this.

> Well, that's it... I hope you find the above useful, or at least 
> coherent :-) Again, I think Monotone is on the right track, and had I 
> the time I would gladly try hacking directly at the code instead of 
> writing rambling posts. But I have to get back to my own project (YAML, 
> if anyone is interested)...

You've obviously thought about these issues a lot, be glad to hear any
further comments...

-- Nathaniel

So let us espouse a less contested notion of truth and falsehood, even
if it is philosophically debatable (if we listen to philosophers, we
must debate everything, and there would be no end to the discussion).
  -- Serendipities, Umberto Eco

reply via email to

[Prev in Thread] Current Thread [Next in Thread]