monotone-devel

Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon


From: Nathaniel Smith
Subject: Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
Date: Thu, 31 May 2007 01:50:07 -0700
User-agent: Mutt/1.5.13 (2006-08-11)

On Thu, May 31, 2007 at 09:29:52AM +0200, Markus Schiltknecht wrote:
> Nathaniel Smith wrote:
> >They may wish to at some point do
> >  mtn pull --restrict-last 2000
> >to fetch more history.  This asks the server what the new horizon
> >should be, moves the horizon there, and fetches intermediate stuff.
> 
> I'm sure you are well aware that this doesn't necessarily mean fetching
> 'more history', but could also mean fetching the newer revs and dropping
> the old ones. I.e. it's unclear whether the horizon moves up or down.

Eh, I just figured that the point of pull is to add stuff to the db;
if we already have everything within the last 2000 revisions, then we
can just do nothing :-).  (And who cares if we happen to also have
some other stuff, e.g. if I do 'mtn pull net.venge.monotone*' I don't
expect any non-monotone branches to also be discarded...)
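
To make the "pull only adds stuff" reading concrete, here is a minimal
Python sketch. Everything in it is hypothetical (the function name, and
modelling the db as a plain set of revision ids); it is not how netsync
works, only an illustration of the semantics I have in mind.

```python
from typing import Set, Tuple


def additive_pull(local: Set[str], offered: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (new local db contents, revisions actually fetched).

    `offered` stands for whatever the server says lies inside the requested
    window (e.g. the last 2000 revisions). Nothing already held locally is
    ever dropped, so the result is always a superset of `local`.
    """
    fetched = offered - local
    return local | fetched, fetched
```

If the local db already holds everything in the window, `fetched` comes
back empty and the db is unchanged, extra older revisions included.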

Again, I just don't know many use cases where throwing stuff away is
important.  Downloading a bunch of stuff in a lump is a major hurdle
to adoption; needing to free up some-but-not-all disk space is a much,
much rarer operation, especially in these decadent multi-gigabyte
days.

> >(It also effectively forces a full regenerate_rosters.)
> 
> I'd really like to get rid of that step, because it makes moving the
> horizon (or filling gaps) terribly expensive. Can't we simply store
> real, local node ids for the sentinels, instead of adding another set of 
> incompatible ids? That way, we would have to pay attention to keep the 
> node ids in sync with the ones used for the sentinels, when filling the 
> gap, but AFAICS this should be possible. Am I missing something?

Yes: marks.  We can either send marks over the wire, or we can have
partial pull use fake marks (using the clever tricks worked out at the
summit) and regenerate marks (stored in rosters) when new history
comes in and invalidates the fake marks.

> >>I'm thinking of the result of a partial pull as a repository having a 
> >>gap between the root and the horizon. In that sense, such a repository 
> >>is not a single contiguous history window, because it also has a root. 
> >>You would have to _replace_ the root with the sentinel revision ids to 
> >>get a repository to be a contiguous history window.
> >
> >By root do you mean the first commit (which in a partial pull we don't
> >have), or the magic root revision [] (which doesn't actually exist)?
> >How is this discontiguous?
> 
> I meant the magic root revision []. It is discontiguous in the sense
> that you probably can't get rid of this magic thing (which doesn't
> really exist, so you can't get rid of it by definition).  So... okay,
> if you replace the magic root revision with the horizon, you no longer
> need the magic root revision - and are contiguous. (But can you replace
> something that does not exist? This is heading towards a philosophical
> debate... ;-) )

Eh, probably better to think of [] as not an actual revision (there's
nothing stored for it in the database, there's no defined textual
form, various operations will I() if you try to treat it as a real
revision id), but rather just a special marker string that means "no
revision here".
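
A rough Python sketch of that view, with made-up names throughout (this is
not monotone's actual code): the null id is only a marker value, nothing
is ever stored under it, and asking about it as if it were a real revision
fails loudly, which is roughly the job I() does.

```python
from typing import Dict, List

# Hypothetical stand-in for the magic id []. The empty string is used here
# only because it cannot collide with a real (hex) revision id.
NULL_ID = ""


class RevisionGraph:
    """Toy store mapping each real revision id to its parent ids."""

    def __init__(self) -> None:
        self._parents: Dict[str, List[str]] = {}

    def add(self, rev_id: str, parents: List[str]) -> None:
        if rev_id == NULL_ID:
            raise ValueError("the null id is a marker; nothing is stored for it")
        self._parents[rev_id] = list(parents)

    def parents(self, rev_id: str) -> List[str]:
        if rev_id == NULL_ID:
            # Plays the role of an I() invariant failure in this sketch.
            raise ValueError("the null id is not a real revision")
        # A NULL_ID parent just means "no revision here", so it is dropped.
        return [p for p in self._parents[rev_id] if p != NULL_ID]
```

A first commit is then recorded as `add("r1", [NULL_ID])`, and
`parents("r1")` returns an empty list rather than a fake revision.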

> >Arbitrary-number-of-parent revisions are one thing; they at least make
> >sense.  How are you planning to create synthetic revisions for
> >arbitrary numbers of revisions on the "bottom" end of the gap?  Note
> >that to express the right lifecycle and mark semantics between these
> >revisions, you may need to postulate arbitrarily many extra
> >intermediate synthetic revisions in the middle... I think someone gave
> >an example downthread?
> 
> You just need a sentinel for every revision on the bottom end of the gap
> - as with a horizon. Of course those sentinels need to have a manifest
> with node ids (for added files only) to maintain the right lifecycle and 
> mark semantics.
> 
> Please note that the only thing I'm proposing to add to the existing
> partial pull concept with its sentinels is that sentinels should have
> ancestors, whereas you are assuming that a sentinel's ancestor is always
> the magic root revision [], seen from the gap's point of view.

I am not sure I fully understand what you propose -- for
instance, I get the impression that you want sentinels to have
associated revision_t structures, whereas in the existing partial pull
concept, sentinels are just magic revision_id's, similar to the magic
revision_id [].  (The difference being that [] is inherently magic,
while sentinels' magic-ness would be stored in the database -- but it
would still generally be a constant "fact about the world" for most
monotone commands.)

> >Yeah -- I agree things like log, annotate, etc. should have some
> >special handling for when they fall off the end of known history.  I
> >was just hoping that special handling could be added incrementally,
> >and would mostly involve printing an extra message or something, not
> >needing to alter the actual algorithms.
> 
> I'm trying to convince you that gaps *simplify* that, as sentinels are
> much more like actual revisions if they have ancestors. So that the
> actual algorithms *don't* have to change, but only the printing logic
> gets an 'if (rev.is_sentinel)' added.

If sentinels are *exactly* like actual revisions, then of course that
simplifies things, because our existing code Just Works.  They aren't,
though; they have no associated roster, for instance, so how do
algorithms that walk the ancestry graph (which I guess has some fake
entries for sentinels added?) and request rosters for each entry work?
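
Here is a small Python sketch of the kind of walker I mean. The names and
data structures are assumptions made up for illustration (plain dicts for
the ancestry graph and the roster table, a set of sentinel ids), not
anything taken from the monotone source.

```python
from typing import Dict, Iterable, List, Set, Tuple


def walk_ancestry(
    heads: Iterable[str],
    parents: Dict[str, List[str]],
    rosters: Dict[str, str],
    sentinels: Set[str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Walk ancestors of `heads`, loading a roster for each real revision.

    Returns (visited, horizon_hits): the (rev, roster) pairs seen, and the
    sentinels at which the walk fell off the end of known history.
    """
    visited: List[Tuple[str, str]] = []
    horizon_hits: List[str] = []
    seen: Set[str] = set()
    frontier = list(heads)
    while frontier:
        rev = frontier.pop()
        if rev in seen:
            continue
        seen.add(rev)
        if rev in sentinels:
            # The special case: no roster to load, nothing known below.
            horizon_hits.append(rev)
            continue
        visited.append((rev, rosters[rev]))
        frontier.extend(parents.get(rev, []))
    return visited, horizon_hits
```

Without the `rev in sentinels` check, the `rosters[rev]` lookup raises
KeyError as soon as the walk reaches a sentinel, so every walker of this
shape needs that branch (or something equivalent) somewhere.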

If you have two different things, forcing them to be as similar as
possible generally makes code more complicated, not less -- because
even if you can use the exact same code in 99% of cases, in every
single case you have to stare at it first to make sure that *this*
isn't that 1% case.  And when you forget, you introduce a bug, and
reviewers will tend not to notice it for that same reason, and
debugging is harder for that same reason (esp. since when debugging
you have to stare extra hard at all the correct code too, because
you've made it harder to verify that it's correct and rule it out as
the source of the bug).

> >Generality is only good if it is for a purpose.  I'd still like to see
> >some use case for why I would want to have history from 1990-1992,
> >2000-2001, and 2005-present together in a database.
> 
> Hm... that's why I've been asking. I got the 'gaps' idea from an
> implementer's point of view, discovering that it might actually be easier
> to implement - at least in those parts of the code I've touched so far.

...But I'm speaking in generalities, above.  It describes my
experience with monotone code in the past, but I haven't taken the
time to really look at your code, or try implementing this myself, and
actual code always wins over guesses (no matter how informed).  So
please take this reply just as food for thought...

-- Nathaniel

-- 
So let us espouse a less contested notion of truth and falsehood, even
if it is philosophically debatable (if we listen to philosophers, we
must debate everything, and there would be no end to the discussion).
  -- Serendipities, Umberto Eco



