Re: [Monotone-devel] Re: monotone CVS import failed.


From: Jon Smirl
Subject: Re: [Monotone-devel] Re: monotone CVS import failed.
Date: Sat, 28 Oct 2006 12:11:15 -0400

On 10/28/06, Michael Haggerty <address@hidden> wrote:
Markus Schiltknecht wrote:
> Jon Smirl wrote:
>> I have been trying with cvs2svn for three months now and progress
>> isn't happening. You at least seem interested in making things work
>> right for Mozilla.
>
> I don't think so. The graph algorithm is just very new and we still have
> to experiment with it. I don't know about the details why cvs2svn (graph
> based) fails with importing Mozilla, though.

For the record, cvs2svn works *fine* with the mozilla repository.  I've
converted it multiple times without problems.

'Fine' depends on how you define it. I agree that cvs2svn is capable of
converting Mozilla CVS to SVN. But I don't consider a conversion where
60% of the symbols and branches are based on multiple change sets to be
'fine'. Many of the symbols I have looked at are based on change sets
more than six months apart. Some of the branch heads are based on
fifteen change sets.

I have decoded several of these bases by hand and I was able to
produce alternate change set sequences that resulted in a linear base.


What doesn't work is using cvs2svn *with Jon Smirl's changes* to support

That is crap. The symbol/branch basing problem occurs with an
unmodified cvs2svn as has been demonstrated with multiple examples.

My changes only altered the output format and do not change the import
process or change set selection code. I have abandoned that work
anyway since it was rendered useless by multiple refactorings of the
base code.

conversion from CVS *to git*.  This was never a design goal of cvs2svn.
 We (meaning I, since I am for all intents and purposes the only cvs2svn
developer right now) would like to support him in making this work, but
I haven't had much time recently to work on cvs2svn.

IIUC, the specific problem is that git is missing some features that CVS
has, like the ability to create tags based on multiple branches.

Git can base on multiple change sets by turning all of the symbols
into branches. But after loading data built this way and using the
visualization tools, it was obvious that this was not a sensible
import of the original repository.

The difference is between a spider web of links between versions that
a computer can figure out and a more linear history that makes sense
to a human.

Note that git is snapshot based, not change set based. Git does not
need the base to be a single change set, it wants the base to be a
single snapshot. To do that there needs to be a point in the snapshot
sequence where a symbol can be inserted. If a rev created after the
symbol gets combined into a snapshot in front of the symbol, it then
becomes impossible to insert the symbol and a branch has to be made.
The copy commands cvs2svn is using to make the symbols are effectively
little hidden branches.
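
To make the insertion-point idea concrete, here is a minimal Python
sketch. It is not cvs2svn or git code. The function name, the use of
plain integers as revision numbers, and the dict layout are all
assumptions made for illustration. It replays an ordered list of change
sets and looks for a snapshot whose tree agrees with the symbol on
every tagged file.

```python
def find_insertion_point(changesets, symbol):
    """Return the index of the first snapshot matching `symbol`, or None.

    changesets: ordered list of {path: revision} dicts, one per commit
                (hypothetical layout; revisions are plain integers here).
    symbol:     {path: revision} dict naming the revision the tag points
                at for each tagged file.
    """
    tree = {}  # current snapshot: path -> latest revision seen so far
    for index, changes in enumerate(changesets):
        tree.update(changes)
        # The symbol fits here only if every tagged file is at exactly
        # the tagged revision in this snapshot.
        if all(tree.get(path) == rev for path, rev in symbol.items()):
            return index
    return None


symbol = {"a": 2, "b": 1}

# Revisions a:2 and b:2 land in separate change sets, so there is a
# snapshot (after index 1) where the symbol can be inserted.
linear = [{"a": 1, "b": 1}, {"a": 2}, {"b": 2}]

# Here b:2, created after the symbol, was combined with a:2. No snapshot
# matches the symbol, so it would have to become a branch.
combined = [{"a": 1, "b": 1}, {"a": 2, "b": 2}]
```

With these inputs, `find_insertion_point(linear, symbol)` finds a slot
and `find_insertion_point(combined, symbol)` does not, even though both
sequences contain the same file revisions.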

I also acknowledge that CVS symbols can be created piecemeal and that
some of the symbols in Mozilla were created this way. I just think the
cvs2svn conversion process is creating far too many symbols piecemeal.

It may be that all of the symbols in the Mozilla repository were
created piecemeal. But until someone builds an importer that treats
symbol creation like a change set with dependencies, we don't know if a
better conversion is possible. In the few cases I decoded by hand I
was able to locate a valid, alternate change set sequence that
eliminated the need for a branch.
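
One way to picture "symbol creation as a change set with dependencies"
is the sketch below. It is an illustration under stated assumptions,
not the algorithm of any existing importer: the node name "SYMBOL", the
function name, and the integer revision numbers are hypothetical. The
symbol becomes a node in the change set graph. It must come after every
change set that supplies a tagged-or-earlier revision and before every
change set that supplies a later one. If a topological order exists,
the symbol has a linear base; a cycle means some change set would have
to be split or the symbol treated as a branch.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def order_with_symbol(changesets, deps, symbol):
    """Return a change set order containing "SYMBOL", or None on a cycle.

    changesets: {name: {path: revision}} (hypothetical layout)
    deps:       {name: set of change set names that must come first}
    symbol:     {path: revision} tagged by the symbol
    """
    graph = {name: set(deps.get(name, ())) for name in changesets}
    graph["SYMBOL"] = set()
    for name, changes in changesets.items():
        for path, rev in changes.items():
            if path not in symbol:
                continue  # untagged files do not constrain the symbol
            if rev <= symbol[path]:
                graph["SYMBOL"].add(name)   # symbol comes after this one
            else:
                graph[name].add("SYMBOL")   # this one comes after symbol
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


symbol = {"a": 2, "b": 1}

separate = {"c1": {"a": 1, "b": 1}, "c2": {"a": 2}, "c3": {"b": 2}}
separate_deps = {"c2": {"c1"}, "c3": {"c1"}}

# c2 holds both a tagged revision (a:2) and a post-tag revision (b:2),
# so it must come both before and after the symbol: a cycle.
merged = {"c1": {"a": 1, "b": 1}, "c2": {"a": 2, "b": 2}}
merged_deps = {"c2": {"c1"}}
```

For `separate`, the only valid order places the symbol between c2 and
c3. For `merged`, no order exists and the function returns None.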

Whether this is a worthwhile feature or not is debatable, but CVS and
Subversion can both do it.  So it is obviously more work to convert a
CVS repository to git than to Subversion.

(Will you also have this problem when converting to Monotone?)

>>> Did you watch memory consumption?
>>
>> Around 1.2GB when it died.
>
> That's good to know. IMHO it's the main difference between my monotone
> cvs_import rewrite and cvs2svn's graph based approach: the 'in-memory'
> vs. 'on-disk' issue.
>
> It convinces me that spilling to disk is not necessary, because it looks
> like the whole mozilla repository with all its blobs and its
> dependencies fits into 1.2 GB of memory (this is of course excluding the
> files and their deltas themselves).

Is the monotone conversion done in C/C++ or a scripting language?
Because I think the Python object overhead would make an in-core
conversion too expensive for the largest archives, at least without
packing in-core information into strings in binary format or something.
 The on-disk databases and multiple passes also give us resumability of
a partial conversion.  But I readily admit that the on-disk + pass
structure of our conversions is a lot of work to support and extremely
expensive in terms of conversion time.
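
As a rough illustration of "packing in-core information into strings in
binary format", the sketch below keeps fixed-size records in a single
bytearray instead of one Python object per revision. The record fields
(file id, timestamp, change set id) and the class name are assumptions
chosen for the example; they are not taken from cvs2svn.

```python
import struct

# Hypothetical record: file id (uint32), timestamp (uint64),
# change set id (uint32). "<" selects little-endian with no padding,
# so each record is exactly 16 bytes.
RECORD = struct.Struct("<IQI")


class PackedRevisions:
    """Append-only store of fixed-size revision records in one buffer."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, file_id, timestamp, changeset_id):
        self._buf += RECORD.pack(file_id, timestamp, changeset_id)

    def __len__(self):
        return len(self._buf) // RECORD.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Decode one record on demand; nothing is kept as Python objects.
        return RECORD.unpack_from(self._buf, index * RECORD.size)


store = PackedRevisions()
store.append(7, 1162051875, 3)
store.append(8, 1162051900, 3)
```

Each stored record costs `RECORD.size` bytes in the buffer, and tuples
are only materialised when a record is read back with `store[i]`.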

Michael



--
Jon Smirl
address@hidden



