Re: Bazaar migration status?

From: Stephen J. Turnbull
Subject: Re: Bazaar migration status?
Date: Fri, 24 Jul 2009 11:43:55 +0900

Ken Raeburn writes:

 > I didn't mean to cast aspersions on anyone working on this.

Sorry, I didn't mean to imply you did, just that this is a one-time
thing and I think we can assume that the workers will give adequate
warning and documentation.

 > I guess I'm just wishing the tools provided some extra metadata
 > were carried along somewhere that would make it Just Work without
 > me having to track down and sort out where the two versions of
 > history match up.

Unfortunately, defining and computing "two versions of history match
up" is extremely hard.  Humans look at a program and say "obviously
this change is irrelevant to anything I care about."  But to the VCS,
a change is a change.  For example, ChangeLogs are a real PITN.
Suppose you change Emacs (patch A), and I change Emacs (patch B).  Now
we mutually cherrypick and merge.  In a DAG-based VCS (which can
represent concurrent development of this kind), we should be in the
same state, right?  Wrong.  The DAG can represent non-determinism, but
*the ChangeLog cannot*.  There's no reason to expect our trees are
identical, and every reason to expect them to diverge (you probably
cherrypick mine on top of yours, and I do the reverse, so our change
logs have different orders).
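
A toy sketch of the ChangeLog problem, assuming entries are simply
prepended newest-first.  The entry strings and the apply_patch helper
are made up for illustration; this is not how any VCS actually merges
the file.

```python
def apply_patch(changelog, entry):
    """Return a new ChangeLog with `entry` prepended (newest first)."""
    return [entry] + changelog

base = ["2009-07-01  base entry"]
patch_a = "2009-07-23  you: patch A"   # hypothetical entry text
patch_b = "2009-07-23  me: patch B"    # hypothetical entry text

# You already have A and cherrypick B on top of it.
yours = apply_patch(apply_patch(base, patch_a), patch_b)
# I already have B and cherrypick A on top of it.
mine = apply_patch(apply_patch(base, patch_b), patch_a)

same_changes = set(yours) == set(mine)  # same set of entries
same_file = yours == mine               # but not the same file content
```

Both trees contain the same two changes, yet the ChangeLog files
differ, so a VCS that hashes content has to treat them as different
trees.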

Now, if we coordinate our workflow, we can probably get it right: I
pull from you and you pull from me, and whoever pulls first will
determine the order.  But of course we're assuming here uncoordinated

 > (And likewise for everyone downstream.)  After all, it will be in a  
 > sense the same repository

To quote one of my less favorite Presidents, "There you go again!" :-)
See?  You're very willing to go with "I know they're the same, so, yo
git, what's your problem?"  I have no problems with humans saying
that, but it's the *last* kind of attitude we want our VCS to take!

 > after the conversion as it was before; tracking it across the
 > switch ideally ought to be trivial and automatic.

It is, except that the conversion from CVS is fraught with
nondeterminism; that's why the work that Andreas is doing is so
important.  After that, all of the dVCSes are happy to work with git's
fastimport format so in that sense they can share history.  (AFAIK, to
compute revision IDs they all use a subset of git's revision-ID-
relevant metadata: name, email, and timestamp for committer and author
respectively, tree content -- perhaps represented as a digest, and log
message.  There are variations, but so far they're all homomorphic --
I think git actually keeps the most detail.  I wish git had adopted a
Dublin Core derivative as a framework for metadata extensions, though
... the lack of a comprehensive standard here makes me nervous.)
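
For concreteness, here is a minimal sketch of how git derives a commit
ID from that metadata: SHA-1 over a "commit <size>" header plus the
commit body.  The author identity, timestamps, and log message below
are invented for illustration, and the sketch ignores optional headers
such as encoding.

```python
import hashlib

def git_commit_id(tree, parents, author, committer, message):
    """Compute a git-style commit ID from its ID-relevant metadata."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # git hashes "<type> <size>\0" followed by the object body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# git's well-known ID for the empty tree, used here as a stand-in.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "A. Hacker <hacker@example.org>"  # hypothetical identity

first = git_commit_id(EMPTY_TREE, [], f"{who} 1248403435 +0900",
                      f"{who} 1248403435 +0900", "Initial import\n")
# Same content, committer timestamp one second later.
second = git_commit_id(EMPTY_TREE, [], f"{who} 1248403435 +0900",
                       f"{who} 1248403436 +0900", "Initial import\n")
```

A one-second difference in a single timestamp yields an unrelated ID,
which is why a conversion has to reproduce every one of these fields
exactly for two repositories to agree on revision IDs.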

 > I'm assuming we want the history as shown by git or bzr to show  
 > exactly the same sources, changed in exactly the same way at exactly  
 > the same times and by the same people and with the same log messages,  
 > as we see now with CVS or git.

It will, up to the limitations of CVS.  Commit timestamps may differ
slightly from CVS (the official cvs2svn tool will amend CVS commit
timestamps to ensure that the revision numbers and timestamps follow
the same total order, compatible with the ancestry partial order -- I
don't know if that's what Andreas is using, though).  Of course, CVS
timestamps are in general ranges (each file commit gets a separate

 > What I've been hearing is that the result of mirroring the bzr  
 > repository into git isn't likely to give the same commit revision ids  
 > as we have in the CVS->git mirror.

That, I believe, is true.  But AIUI it's not because of inherent
nondeterminism in the conversion process being used; it's because CVS
suffers from nonatomic commits.  The history is continuously being
cleaned up (the conversion process tuned), so that in fact the
official bzr repository will have a different history from the
existing CVS->git mirror.

However, the content is going to be "very close" to the current git
mirror.  I have some ideas how to exploit that for a conversion of
your git repo; feel free to get in touch with me off-list.  (Once you
have a git repo, the git->bzr conversion is straightforward.)

Note that in git you can guarantee that your branch gets imported
lock, stock and barrel with *zero* content changes by using
git-filter-branch.  The idea is to graft the root of your branch onto
a commit in the new tree by changing its parent.  This will result in
new revision IDs for all commits in the branch, of course, but the
content and all metadata (except for ancestry) are identical.
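
A toy model of why re-parenting changes every ID on the branch but
nothing else.  The commit_id function is a simplified stand-in for
git's real hashing, and the tree names, messages, and parent labels
are placeholders; this does not invoke git-filter-branch.

```python
import hashlib

def commit_id(parent, tree, meta):
    """Simplified ID: a hash over parent ID, tree, and other metadata."""
    payload = "\n".join([f"parent {parent}", f"tree {tree}", meta])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

def build_branch(root_parent, commits):
    """Chain (tree, meta) pairs onto root_parent; return (id, tree, meta)."""
    result = []
    parent = root_parent
    for tree, meta in commits:
        cid = commit_id(parent, tree, meta)
        result.append((cid, tree, meta))
        parent = cid  # each commit's ID feeds into its child's ID
    return result

work = [("tree-1", "first change"),
        ("tree-2", "second change"),
        ("tree-3", "third change")]

old = build_branch("commit-in-old-mirror", work)
new = build_branch("commit-in-new-tree", work)

ids_all_differ = all(o[0] != n[0] for o, n in zip(old, new))
payload_identical = [c[1:] for c in old] == [c[1:] for c in new]
```

Changing only the root's parent ripples through the whole chain,
because each ID depends on its parent's ID, while trees and log
messages are carried over untouched.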

You can even just import your whole repo into your clone of the
"official" git mirror of the Emacs repo; git itself doesn't care,
although this is likely to confuse your downstream (and maybe even
you), as well as tools like gitk, because the repo will be
multi-rooted.  But you can then use all the git tools (rebase,
filter-branch, etc) to work with the resulting (disconnected) DAG.
You can even reconnect it, although it's not likely that the result
will be much more useful than the disconnected DAG.  (Look for "paste
the other history" in git-filter-branch(1) for details.)

 > > In that case, somebody has deliberately changed the metadata of
 > > history (eg, git-filter-branch).  But you can analyze this with the
 > > tools git provides: [...].

 > Interesting... that could be the additional data I was asking for  
 > above.  Thanks!
 > But, the expectation of different revision ids still concerns me -- is  

Well, you can also use git-log --stat to get information that can be
used to find "closest approach".  (This is the idea I mentioned above.)

 > that because of minor differences in content, like maybe  
 > translating .cvsignore into something else for bzr or loss of some  
 > trivial bit of data like file modes,

The different VCSes calculate the revision ID differently, but there
should be a 1-1 conversion (up to the possibility of hash collisions,
of course).  However, the conversion process is automatic and
metadata-oriented; it will not convert content.  To some extent
content conversion can be done automatically, but it should be done
as a commit on top of the converted repo.

BTW, don't get concerned about all the potential complexity I'm
describing.  You'll only need it once, and there are a lot of gitfans
around who will be more than happy to help with it.  We'll even look
the other way if you decide to convert everything to a bzr repo in the
end. :-)  I.e., git is the tool of choice for doing the surgery on
history, but after that, it's personal preference (and the conversion
processes among the leading dVCSes are straightforward).
