monotone-devel

Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon


From: Markus Schiltknecht
Subject: Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
Date: Thu, 31 May 2007 09:29:52 +0200
User-agent: Icedove 1.5.0.10 (X11/20070329)

Hi,

Nathaniel Smith wrote:
...I don't think we're communicating, because I have no idea what
you're talking about :-).  Obviously I am not being clear either, so
let me lay out my understanding again...

Good, I'll try to be clearer, too.

In my world, the reason we need partial pull is that the total history
size of a project grows without bound.  Therefore, for very large and
old projects (Linux kernel, *BSD, Mozilla, gcc, glibc, maybe a few
others), the full history database may be many times larger than a
checkout.  It is unreasonable to expect new developers to, before
writing their first patch, download several gigabytes of data.
However, even for such projects, the actual rate of new history being
added is not *too* high, the problem comes from the long history.
And, if I am following such a project, it is not unreasonable for me
to each week download whatever happened that week.  So incremental
updates are bounded and not a problem, just the initial pull size
grows without bound.

Agreed, that's the main reason for partial pull.

So my imagined use case is that a new developer says
  mtn clone netsync://mtn.project.org --restrict-last 1000
which fetches only revisions up to depth 1000 from all heads, and sets
the horizon to be whatever revisions have depth exactly 1000.

Yup.
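
To make sure we mean the same thing by "depth", here is a minimal sketch in Python (not monotone code). It assumes depth is the shortest parent-edge distance from any head; the function name restrict_last and the parents mapping are made up for illustration, and the --restrict-last option itself is only the proposal quoted above.

```python
from collections import deque


def restrict_last(parents, heads, depth):
    """Depth-limited walk from the heads (hypothetical helper).

    parents: dict mapping a revision id to a list of its parent ids.
    Returns (fetched, horizon): every revision within `depth` parent
    edges of some head, and the subset at distance exactly `depth`.
    """
    dist = {h: 0 for h in heads}
    queue = deque(dist)
    while queue:
        rev = queue.popleft()
        if dist[rev] == depth:
            continue  # horizon revision: do not walk any further back
        for p in parents.get(rev, []):
            if p not in dist:  # first visit is the shortest distance (BFS)
                dist[p] = dist[rev] + 1
                queue.append(p)
    fetched = set(dist)
    horizon = {r for r, d in dist.items() if d == depth}
    return fetched, horizon
```

For a linear history a <- b <- c <- d <- e with head e and depth 2, this yields the revisions e, d and c, with c as the only horizon revision.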

Later, they want to pull as normal, so they just do
  mtn pull
and this fetches all new revisions since their initial pull.  The
horizon does not move.

That's probably the most frequent use case, yes.

They may wish to at some point do
  mtn pull --restrict-last 2000
to fetch more history.  This asks the server what the new horizon
should be, moves the horizon there, and fetches intermediate stuff.

I'm sure you are well aware that this doesn't necessarily mean fetching 'more history'; it could also mean fetching the newer revs and dropping the old ones. I.e. it's unclear whether the horizon moves up or down.

(It also effectively forces a full regenerate_rosters.)

I'd really like to get rid of that step, because it makes moving the
horizon (or filling gaps) terribly expensive. Can't we simply store
real, local node ids for the sentinels, instead of adding another set
of incompatible ids? When filling the gap, we would then have to take
care to keep the node ids in sync with the ones used for the sentinels,
but AFAICS this should be possible. Am I missing something?

I'm thinking of the result of a partial pull as a repository having a gap between the root and the horizon. In that sense, such a repository is not a single contiguous history window, because it also has a root. You would have to _replace_ the root with the sentinel revision ids to make the repository a contiguous history window.

By root do you mean the first commit (which in a partial pull we don't
have), or the magic root revision [] (which doesn't actually exist)?
How is this discontiguous?

I meant the magic root revision []. It is discontiguous in the sense
that you probably can't get rid of this magic thing (which doesn't
really exist, so by definition you can't get rid of it). So... okay,
if you replace the magic root revision with the horizon, you no longer
need the magic root revision, and the history is contiguous. (But can
you replace something that does not exist? This is heading towards a
philosophical debate... ;-) )

(By contiguous I mean in particular the property that if we have
revision A and revision B, and A is an ancestor of B, then we have all
revisions that are _between_ A and B.  Put another way, the contents
of a database should always be a convex set.  Convex sets turn out to
be totally an awesome concept -- see the new uncommon ancestors code
for another example...)

Gaps do their best to maintain that property for the revision ancestry. But I take your point that a gap does not quite count as 'yes, we have that revision'. Only very few commands really care about that, though.
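
For reference, the convexity property quoted above can be checked mechanically. This is only a sketch in Python under my own assumptions: the helper is_convex and the parents mapping are hypothetical, and it checks the strict property, where a gap does not count as having the revision.

```python
def is_convex(parents, have):
    """Return True if `have` is a convex set in the ancestry graph.

    parents: dict mapping a revision id to a list of its parent ids
             (the full graph, including revisions we do not have).
    have:    set of revision ids present in the database.
    The set is convex if no ancestry path leaves it and re-enters it.
    """
    for rev in have:
        # Start from parents we are missing and walk further back.
        stack = [p for p in parents.get(rev, []) if p not in have]
        seen = set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            for p in parents.get(cur, []):
                if p in have:
                    return False  # re-entered the set through a hole
                stack.append(p)
    return True
```

On a linear history a <- b <- c <- d <- e, the window {c, d, e} passes this check, while {a, d, e} (a root plus a recent window, i.e. a gap) does not.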

In my version, sentinels basically become roots.

Understood.

Only if they _are_ ordinary revisions.  Unfortunately, history
representations are not a place where picking a representation at
random and hoping turns out to work very often :-(.

Sure, but every command needs to check for sentinels anyway. And we need
to decide how to treat them, no matter whether we implement a restricted
horizon or gaps.

Arbitrary-number-of-parent revisions are one thing; they at least make
sense.  How are you planning to create synthetic revisions for
arbitrary numbers of revisions on the "bottom" end of the gap?  Note
that to express the right lifecycle and mark semantics between these
revisions, you may need to postulate arbitrarily many extra
intermediate synthetic revisions in the middle... I think someone gave
an example downthread?

You just need a sentinel for every revision on the bottom end of the gap
- as with a horizon. Of course those sentinels need to have a manifest
with node ids (for added files only) to maintain the right lifecycle and mark semantics.

Please note that the only thing I'm proposing to add to the existing
partial pull concept with its sentinels is that sentinels should have
ancestors, whereas you are assuming that a sentinel's ancestor is always
the magic root revision [], seen from the gap's point of view.

Yeah -- I agree things like log, annotate, etc. should have some
special handling for when they fall off the end of known history.  I
was just hoping that special handling could be added incrementally,
and would mostly involve printing an extra message or something, not
needing to alter the actual algorithms.

I'm trying to convince you that gaps *simplify* that, as sentinels are
much more like actual revisions if they have ancestors. The actual
algorithms then *don't* have to change; only the printing logic
gets an 'if (rev.is_sentinel)' added.
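
Roughly what I have in mind, as a Python sketch rather than actual monotone code. The Rev class, its fields and log_lines are all hypothetical names; the point is only that the ancestry walk treats a sentinel like any other revision, and just the output branch differs.

```python
from dataclasses import dataclass, field


@dataclass
class Rev:
    rev_id: str
    parents: list = field(default_factory=list)
    is_sentinel: bool = False  # stands in for history we do not have
    message: str = ""


def log_lines(revs, head):
    """Walk ancestry from `head`; sentinels need no special traversal."""
    out, seen, stack = [], set(), [head]
    while stack:
        rid = stack.pop()
        if rid in seen:
            continue
        seen.add(rid)
        rev = revs[rid]
        if rev.is_sentinel:
            # Only the printing logic knows about sentinels.
            out.append(f"{rid}: [history not available locally]")
        else:
            out.append(f"{rid}: {rev.message}")
        stack.extend(rev.parents)  # a sentinel has ancestors too
    return out
```

With a sentinel s sitting between d and b, a log from head e walks e, d, s, b, a in the usual way and prints a placeholder line for s only.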

(And as I've pointed out, even merges are possible across gaps as long
as the common ancestor is not within a gap.)

Generality is only good if it is for a purpose.  I'd still like to see
some use case for why I would want to have history from 1990-1992,
2000-2001, and 2005-present together in a database.

Hm... that's why I've been asking. I got the 'gaps' idea from an implementer's point of view, discovering that it might actually be easier to implement, at least in those parts of the code I've touched so far.

Regards

Markus




