Re: [Monotone-devel] [PATCH] cvs_import connecting branches

monotone-devel

From:	Markus Schiltknecht
Subject:	Re: [Monotone-devel] [PATCH] cvs_import connecting branches
Date:	Mon, 20 Feb 2006 09:27:42 +0100

Hello Nathaniel,

thank you for your response.

Over the weekend I have tried to import a bigger repository with
disappointing results: exactly _no_ branch has been connected. So I need
to rethink and/or improve the algorithm... the log file is 144 MB in
size, so finding out what went wrong is not trivial.

On Sat, 2006-02-18 at 16:38 -0800, Nathaniel Smith wrote:
> On Fri, Feb 17, 2006 at 04:30:46PM +0100, Markus Schiltknecht wrote:
> > after three days of hacking on rcs_import.cc I came up with a very
> > simple solution to the cvs_import branch connecting problem.
> 
> Great!  Now if only I was confident in my own understanding of CVS
> importing... but that's what's been scaring everyone off from trying
> this, so I guess I'd better try myself :-).

Hum? What's that supposed to mean?

> > My patch makes the cluster_consumer check every revision against the
> > branch starting points. Only if a revision has all the same files with
> > exactly the same versions as the branch starting point then this
> > revision is considered to be the branchpoint for that branch.
> 
> When you say "exactly the same versions", do you mean exactly the same
> content (like, SHA1), or do you mean exactly the same RCS revision
> number?

At this stage, that does not really matter, though I guess it's more
like a SHA1 comparison.
(line ~ 1418: if (live_files == child_branch->live_at_beginning) )
Both are map<unsigned int, unsigned int>, where the integer values point
into the interners "cvs_path" and "cvs_version". AFAIK cvs_version
stores the SHA1 value.
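
A minimal sketch of that comparison, assuming both maps go from an
interned cvs_path id to an interned cvs_version id. The names tree_state
and is_branchpoint are hypothetical and do not come from rcs_import.cc:

```cpp
#include <map>

// Hypothetical alias: interned cvs_path id -> interned cvs_version id.
using tree_state = std::map<unsigned int, unsigned int>;

// A revision qualifies as the branchpoint only if it holds exactly the
// same files, each at exactly the same interned version, as the state
// recorded at the beginning of the branch.
bool is_branchpoint(const tree_state& live_files,
                    const tree_state& live_at_beginning)
{
    // std::map's operator== compares the sizes first and then every
    // key/value pair in order, so one extra or missing file, or one
    // differing version id, makes the comparison fail.
    return live_files == live_at_beginning;
}
```

What "same version" means then depends entirely on what the cvs_version
interner stores: if it holds content hashes, two revisions with identical
content get the same id.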

> The latter seems the right thing to do, because it is possible (even
> likely) that the same tree contents will appear at multiple places in
> history.  (E.g., whenever a patch is backed out.)

Hm... right. It would be better to compare by RCS revision number in
such a case. Too bad!
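
To illustrate the difference, here is a small sketch assuming each file
version carries both an RCS revision number and a content hash. All
names (file_version, snapshot, match_by, candidates) are made up for
this example and are not part of monotone:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-file state: RCS revision number plus content hash.
struct file_version {
    std::string rcs_revision;  // e.g. "1.3"
    std::string content_hash;  // e.g. a SHA1 hex string
};

using snapshot = std::map<std::string, file_version>;  // path -> version

enum class match_by { content, rcs_revision };

// True if both snapshots contain the same paths and every file matches
// under the chosen criterion.
bool same_tree(const snapshot& a, const snapshot& b, match_by mode)
{
    if (a.size() != b.size()) return false;
    auto ia = a.begin();
    auto ib = b.begin();
    for (; ia != a.end(); ++ia, ++ib) {
        if (ia->first != ib->first) return false;
        const bool equal = (mode == match_by::content)
            ? ia->second.content_hash == ib->second.content_hash
            : ia->second.rcs_revision == ib->second.rcs_revision;
        if (!equal) return false;
    }
    return true;
}

// Indices of all snapshots in the history that match the branch start.
std::vector<std::size_t> candidates(const std::vector<snapshot>& history,
                                    const snapshot& branch_start,
                                    match_by mode)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < history.size(); ++i)
        if (same_tree(history[i], branch_start, mode))
            hits.push_back(i);
    return hits;
}
```

With a history where revision 1.3 backs out 1.2, matching by content
can return both the 1.1 and the 1.3 snapshot as candidates, while
matching by RCS revision number returns only one of them.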

> > If a branch can not be connected to any revision we fall back to
> > importing the branch unconnected as before.
> 
> I suspect there are places where we can do somewhat better even than
> simply leaving things unconnected; there are horrid things like

What would that be? Of course we can improve the
'branchpoint-find-algorithm', but if we finally fail to find a
branchpoint we simply have to fall back to unconnected import, don't we?

> > Empty branches are not
> > imported anymore (since I consider empty branches useless, to tag a
> > release use tags ;-).
> 
> I might be missing something, but this doesn't seem like quite the
> right behavior.  I don't consider empty branches particularly
> interesting either, myself, but maybe we have users who do... they
> went to the trouble of actually creating them in their CVS history,
> after all :-).  Usually the best/safest thing for cvs_import to do is
> to try and translate over the CVS history, as exactly as possible.

Of course you are right. My reasoning was that it might be easier to
branch at the right revision again in monotone - if you absolutely want
to have empty branches...

> > To be able to correctly branch from revisions, the mainline needs to be
> > imported first. In consume_cluster a revision is checked against all
> > branches and the branch is possibly marked to start from that revision.
> > Then other branches, which already got marked can be imported. Finally
> > branches which did not get a mark get imported without any connection to
> > previous releases (fall-back).
> 
> How does this deal with branches-off-branches-off-branches?  Don't you
> have to sort all of them somehow?

Branches which have a revision assigned get imported before the others,
giving them the chance to get a revision assigned from a branch. Only in
case none of the remaining branches has a revision assigned does
cvs_import start to import unconnected branches.

(Which at that point could again assign a revision to other branches...
maybe that's not so wise?)
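
The ordering described above could be sketched roughly like this. It is
only an illustration under assumptions: the branch struct, import_order
and the import_one callback are hypothetical names, the mainline is
assumed to sit at index 0, and the callback must not resize the vector:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for one CVS branch.
struct branch {
    std::string name;
    std::string parent_revision;  // empty: no branchpoint found (yet)
    bool imported;
    explicit branch(std::string n) : name(std::move(n)), imported(false) {}
};

// import_one(i, all) stands in for importing branch i; while doing so it
// may assign parent_revision on other, not yet imported branches.
std::vector<std::string> import_order(
    std::vector<branch>& branches,
    const std::function<void(std::size_t, std::vector<branch>&)>& import_one)
{
    std::vector<std::string> order;
    if (branches.empty()) return order;

    const std::size_t none = branches.size();
    std::size_t next = 0;  // assumption: branches[0] is the mainline
    for (;;) {
        branches[next].imported = true;
        order.push_back(branches[next].name);
        import_one(next, branches);

        // Look for the first pending branch with a branchpoint assigned,
        // and remember the first pending branch without one.
        std::size_t connected = none;
        std::size_t unconnected = none;
        for (std::size_t i = 0; i < branches.size(); ++i) {
            if (branches[i].imported) continue;
            if (!branches[i].parent_revision.empty()) {
                if (connected == none) connected = i;
            } else if (unconnected == none) {
                unconnected = i;
            }
        }
        if (connected != none) next = connected;
        else if (unconnected != none) next = unconnected;  // fall-back
        else break;  // everything imported
    }
    return order;
}
```

Note that in this sketch a branch imported through the fall-back path
can still assign revisions to other pending branches, which is exactly
the behaviour questioned above.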

> Speed is always nice, but as long as it runs in some vaguely
> reasonable time, CPU is not that important for a conversion tool; if
> it takes less than a week to import a large repo, then the tool is
> usable :-).

Sure. Anyway, by also considering branching time we would not only
reduce computing time but also improve the probability of hitting the
right revision ;-)
(Think of reverting patches...)

> > - if you have multiple branches with the very same revision and which
> > are branchpoints for further branches, cvs_import might choose the wrong
> > branch. I suppose will never happen in reality, but...
> 
> When dealing with CVS, _every_ bad thing has happened in reality
> somewhere :-(.

Okay, then let's compare RCS numbers. I'm currently thinking about
including each file's 'branch-event' in the cluster consumer and
handling it much like a commit, since branching should be as atomic as
a commit anyway.
(If you have overlapping branching and commits, we don't have much of a
chance.)

> On the other hand, though, we never guarantee anything better than
> "best effort" for the bizarro self-contradictory stuff that CVS can
> spit out, and certainly doing _some_ sort of branch reconstruction is
> a large improvement over what we're doing now.

Yeah, that's also why I tried to be conservative: not connecting a
branch is better than connecting a branch to a wrong revision, IMHO.

> Do you know the cvs2svn branch reconstruction algorithm?  I don't
> really know how they choose branch points myself, but you should
> probably check it out if you're going to try and make this work.
> Importing from CVS is a world of tricky corner cases, so learning from
> people who have already tripped over a lot of them is very useful.
> cvs2svn is definitely the tool with the best algorithms to look at.

I know cvs2svn and have studied it a bit. But yes, it cannot hurt to
look at their 'branchpoint-finding-algorithm'. I don't know exactly what
they do.

> Want to send me a pubkey, so you can commit this stuff as you work on
> it? :-)  (Possibly to a branch, for now.)

That would be great, thanks. How should I name the branch?

Regards

Markus

P.S.: my monotone pubkey:

[pubkey address@hidden
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCl2JfTQQSnq4T/ABP39Vb8+N2j5tmU/e5E
Lt9ipl4HxVtzeEZmc2ICosDcF5fAqpPSTy9zb8f3vXjbr9GPjKADKfqVFEqcYlCus+trgT1p
cxregk65qzd61S2ae4yUQP+PwcCWKFFp3785ByOGmEftsYjYxQQig4l+INMj3IXc8QIDAQAB
[end]

Current Thread

[Monotone-devel] [PATCH] cvs_import connecting branches, Markus Schiltknecht, 2006/02/17
- Re: [Monotone-devel] [PATCH] cvs_import connecting branches, Nathaniel Smith, 2006/02/18
  - Re: [Monotone-devel] [PATCH] cvs_import connecting branches, Markus Schiltknecht <=
