monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon


From: Justin Patrin
Subject: Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
Date: Fri, 25 May 2007 14:02:45 -0700

On 5/25/07, Markus Schiltknecht <address@hidden> wrote:
...continuing my thoughts on partial pull.

I came to think about that single horizon. Why do we limit to only one
horizon? That poses the problem, that on subsequent partial-pulls which
move the horizon, monotone would have to throw away revision data it has
already fetched. That seems wasteful. Can't we have multiple horizons,
i.e. gaps of missing revisions?

Let's see: if we want gaps, we have to keep track of the beginning and
ending of the gap. A sentinel, as in partial pull, already covers the
beginning. Let's just add the ending information to it. To do the same
as partial pull currently does, we'd just add the magic rev id [] as the
ending.

If a client does multiple partial-pulls and finally decides to fetch the
whole repository, that would save him from transferring some revisions.
But more importantly, I can see use for gaps to solve other problems.
Namely legal problems, where you want to stop serving certain revisions,
but still want to have an intact repository. Another possibility gaps
would open are 'hot' servers, which serve only the last few months worth
of revisions, but serve those very quickly as they aren't burdened with
years of history (i.e. resulting in much smaller merkle tries).

Only for the server. If the client has a full database then its merkle
tree is much larger and the refinement is always going to be more
costly than if the server contained the entire history, at least if I
understand the idea of merkle refinement correctly. Finding a single
missing new node vs. a mess of missing history nodes that the server
doesn't want seems to me to be a largish difference.

Basically I'm trying to say that the refinement will likely take a hit
of some kind here (most bandwidth?) and that having a smaller merkle
tree on the server is not likely to cause any real performance gains.


I've had a look at implementing those gaps. I think we could simply
build rosters for complete gaps and process normally from there on. I
would even add the necessary (parent, child) tuples in revision_ancestry
for the gaps. So you could still build an ancestry tree. Just one which
has revisions in it, which stand for a whole gap, i.e. multiple
(missing) revisions. For mtn log or mtn annotate, you wouldn't have to
care about bailing out as soon as you hit the horizon, but you'd just
get a special revision_t, which stands for a whole set of missing
revisions. If you ask me, the issue gets a tad simpler, as soon as you
start to think of gaps instead of a (single) horizon.

Of course, gaps also don't come for free. The hardest problem probably
is, that a gap can perfectly well have more than two ending revisions,
i.e. ancestors. Our current roster and merge code doesn't support that.
(Hm, I remember having raised that issue before...) We could either add
support for more than two ancestors or work around the issue by letting
monotone insert multiple cascaded sentinels, sort of a binary tree. I
still prefer the first option, but the second is probably easier to do
and doesn't pose problems with the mark merge algorithm.

While talking about merging: in the presence of gaps, merging is not
always possible. Of course, you'd have to have the two revisions you
want to merge - we need their rosters. Additionally, we need the common
ancestor. As long as that does not lie within a gap, we can do the
merge. Even if there are gaps in between, since we have a valid roster
for the gap. (In case of a partial pull, where the gap ends at the magic
revision id [], that could probably be the common ancestor. Of course we
should prevent a merge in that case, too.)

This seems like a bit of faulty logic to me. You *could* use
mark-merge to merge across a gap, but do we want to allow it? For
example, take a case where the revisions to merge are both below the
lowest horizon but the common ancestor is *within* a gap above the
horizon above which we have more revisions and then another horizon.
Hmmm...now that I'm thinking about this I suppose if this was "done
right" then the gap-revision would come up as the common ancestor and
the merge algorithm stop and forces a pull before merging.

Here's another case, however. What if the common ancestor is above the
gap? How will the common ancestor algorithm be able to tell this if
there is a gap in between? To be able to figure this out the gap would
have to be collapsed in a manner that allowed the marks to be found. I
don't think that's going to be possible...


We could easily implement partial pull using gaps, but to gain the other
features, we would also have to adjust netsync to be able to serve
partial repositories. AFAIK, partial pull currently doesn't plan on
supporting that.

What gaps clearly wouldn't solve is, wanting to remove only parts of a
revision - which admittedly is much more common than wanting to remove a
whole revision.


What do you think? Does that make sense? Is it worth implementing gaps?
Do you foresee other problems?


To be continued...


--
Justin Patrin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]