monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

What are rosters (was Re: [Monotone-devel] Monotone repository for Linux


From: Nathaniel Smith
Subject: What are rosters (was Re: [Monotone-devel] Monotone repository for Linux)
Date: Mon, 21 Nov 2005 23:11:15 -0800
User-agent: Mutt/1.5.9i

On Mon, Nov 21, 2005 at 03:30:23PM -0700, Shaun Jackman wrote:
> 2005/11/21, Nathaniel Smith <address@hidden>:
> > It isn't the world's highest priority, since we know we need to at
> > least get rosters in before this will be usable... hack hack.
> 
> I saw the RostersTodo [1] wiki page, but is there a brief explanation
> of what a roster is?

Right, sorry, people have asked this a few times, I really should
write an answer :-).  The real True answer is not very easy to
explain, because it's basically "err, in the places where we used to
use that data structure, we're now going to use this data structure",
and it won't really mean anything to people who haven't spent some
time tangling with the old change_set.cc.

Basically, we used to do everything interesting directly on the
change set representation, something that pretty directly matches what
you see if you look at a revision.  For some stuff, we'd convert to
other ad hoc representations (there was a "path_analysis" thing used
to convert a changeset to a more "logical files" sort of form, and a
"directory_map" thing used in places where we had to traverse
directory trees, stuff like that).  This was a big mess, to do
anything we had to go tromping all around history to figure out which
files were actually the same, the representation didn't match our
operations very well so we had to check for dozens of different
special cases, and it eventually turned out that there were some
basically unfixable bugs in merging.

But, with all the experience gained from that, plus general advances
in the free software communities understanding of version control, we
figured out how to do things differently.  Instead of having change
sets as the basic object being manipulated, we created a kind of
souped up manifest called a "roster".  (For a while, when we were just
working out how this would work, they were called "ubermanifests".)  A
roster represents a tree state, like a manifest does; but there's
extra stuff added so that it unambiguously records _all_ the
information that is needed to reconstruct a revision.  (You can't do
this with normal manifests, because if you look at two plain
manifests you can't figure out whether any renames happened, for
instance.)  Rosters's in-memory form make it efficient to do
operations like tree traversal, and trivial to detect badness (things
like trying to patch a directory, or delete a non-existent file, etc.
etc.).

So we're changing everything to work in terms of rosters, instead of
in terms of change sets.  So far this has been really nice; all the
code gets simpler and more straightforward.  We also get:
  -- first class attribute support, no more .mt-attrs
  -- first class directory support, so you can have empty directories
     and all that
  -- most operations should become faster, because we don't have to
     run around all over the revision graph collecting data; things
     like 'diff' are now independent of the distance between
     revisions.  We can still do just as much sanity checking
     (actually, I have _more_ confidence in the sanity checking now),
     but (at least in theory) make things still go fast.
  -- many many bugs fixed (we never did get directory deletion working
     right, etc.)
  -- we now have much more confidence in the correctness of our data
     model, and can rigorously reason about things like root renames
     (see other threads recently), file/directory suturing, etc.
  -- we have a much fancier merge algorithm for tree merging
     known as "*-merge" (pronounced "mark merge").  It has a rigorous
     theoretical foundation (see http://revctrl.org/MarkMerge), and
     avoids the problems that plague 3-way merge in complicated
     graphs.  We also use it as a first-pass merger for file content,
     so people should never again see the "I'm getting conflicts in a
     file that I've never touched!" thing, even in cases where 3-way
     can get confused.

The only downsides that come to mind are:
  -- we don't support file/directory suturing; at the moment it's not
     clear that there is such a thing as a content merger that works
     correctly for sutures.  So we'd have to be a bit crazy to support
     it.  This causes some problems migrating, for people with root
     dir sutures, and more problems for people who actually actively
     _use_ sutures (e.g., Peter Simons clever tricks for merging the
     multiple autoconf archives out there).  Unfortunately, it doesn't
     seem like there's much to be done; if someone works out how it is
     even possible to implement such functionality robustly, we can
     add it back in.
  -- we'll have to have another 'history rebuild', like back in 0.16
     and 0.15.  This will lose historical rename information (degrade
     it to delete+add).  Since some of these renames are incoherent,
     it's not clear we have much choice here.  We do _hope_ this will
     be the last time, though.

-- Nathaniel

-- 
"But suppose I am not willing to claim that.  For in fact pianos
are heavy, and very few persons can carry a piano all by themselves."




reply via email to

[Prev in Thread] Current Thread [Next in Thread]