Re: Moving to git

From: Thomas Schwinge
Subject: Re: Moving to git
Date: Wed, 24 Jun 2009 15:23:29 +0200
User-agent: Mutt/1.5.11


On Tue, Jun 23, 2009 at 01:25:34PM +0200, olafBuddenhagen@gmx.net wrote:
> On Thu, Jun 18, 2009 at 03:02:41PM +0200, Thomas Schwinge wrote:
> > Olaf asked whether we could fix the author and committer information
> > for the changesets.  This can't be done reliably in an automated way
> > and surely no one wants to inspect 10,000+ changesets manually.  As I
> > consider a correct-believed but nevertheless incorrect automatic
> > conversion worse than the current one where you exactly know that the
> > information is not accurate, I decided to leave this alone as is.
> I don't agree. What's the use of this damn pedantic GNU-style changelog
> format, if we can't even reliably extract author information from it?!

You can extract that information, but it's a manual process.  It involves
looking into the ChangeLog *file* and extracting the information from there.

Consider this case: P commits change C1 with log message L1.  L1 does
*not* contain any authorship information, but contains solely the
textual description of the actual change.  P commits further changes Cx
with log messages Lx, as above.  Eventually P will commit change
C(ChangeLog) with log message L(``.'').  Only then will the ChangeLog
reflect the correct attribution of the changes.  And note that I didn't
make this up; this is in fact how it has (partly) been done in the
past.  This means that the revision control system alone will never be
able to correctly describe the authorship information of individual
changes.  Only the ``serialized form'' (say, a tarball release's
ChangeLog file) will have the correct information.
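
To illustrate what reading the attribution out of the ChangeLog *file*
looks like, here is a minimal sketch.  It is not something the conversion
used.  It assumes entry headers of the form ``YYYY-MM-DD  Name  <email>''
and item lines starting with ``* file''; older header styles are not
handled, and the function name changelog_authors is made up for this
example.

```python
import re

# Assumed header form of a GNU-style ChangeLog entry:
#   "YYYY-MM-DD  Name  <email>"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^>]+)>\s*$")
# Assumed item form: leading whitespace, "*", then the first file name.
ITEM = re.compile(r"^\s+\*\s+([^\s:(,]+)")


def changelog_authors(text):
    """Return (date, name, email, files) tuples, one per ChangeLog entry.

    Only the first file named on each "* ..." item line is recorded.
    """
    entries = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            entries.append((header.group(1), header.group(2),
                            header.group(3), []))
            continue
        item = ITEM.match(line)
        if item and entries:
            # Attribute the file to the most recent entry header.
            entries[-1][3].append(item.group(1))
    return entries
```

Even with such a helper, matching an entry to the commits it describes
remains guesswork in the case above, because C(ChangeLog) is committed
separately from, and later than, C1 and the Cx.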

> I also do not agree that having everything wrong is better than having a
> few errors, perhaps, or maybe not.

I object: having most of it correct, but not all, gives the false
impression that everything is correct.  And as we're talking about
legally relevant matters here, we had better be on the safe side.

> (And it's not even more consistent, as any new commits will have it
> right.)

Indeed there is a cut-off point, and before that one the ChangeLog is
correct, and after it the Git information is correct.  On the cut-off
point (i.e., now), the ChangeLog files will be removed from the trees (or
be renamed to ChangeLog.old or whatever).

> Also note that not only the original Author information is missing, but
> also the Committer is "tschwinge" for all commits -- I guess you did
> some careless rebasing or something like that... So the result is that
> the Committer is bogus, the Author contains the actual comitter, and the
> real author is only mentioned in the Changelog. That's extremely ugly
> and confusing IMHO.

We can't really do anything about it.  The situation after the cut-off
point: standard Git usage.  The situation before the cut-off point:

    Git committer name == CVS committer name, or tschwinge
    Git committer date == CVS committer date, or a recent date
    Git author name == CVS committer name
    Git author date == CVS committer date

Indeed -- you guessed correctly; set aside the ``careless'' allegation --
the Git committer {name,date} != CVS committer {name,date} in the cases
where I manually re-spooled many series of commits in order to
re-craft proper merges between branches.  So, in the Git sense, it is
correct that the Git committer {name,date} is updated to the person
having re-committed each original commit.
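
For anyone who wants to see which commits were affected, here is a rough
sketch.  It assumes you feed it the output of
``git log --format='%H%x09%an%x09%cn''' (hash, author name, committer
name, tab-separated); the function name respooled_commits is made up for
this example.  It is only a heuristic: a re-spooled commit whose CVS
committer was tschwinge as well will not show up.

```python
def respooled_commits(log_text):
    """Yield (sha, author, committer) for commits whose two names differ.

    Expects one commit per line, tab-separated, as produced by
    git log --format='%H%x09%an%x09%cn' (assumed input format).
    """
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sha, author, committer = line.split("\t")
        if author != committer:
            yield sha, author, committer
```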

So, to sum up: after the cut-off point, everything is as expected, and
before the cut-off point, the Git committer information is useless, and
the Git author information is the CVS committer information, and the
changes' author information is hidden in the relevant ChangeLog file.

> > Also, there was the idea of aggregating all the individual one-file,
> > [...], then ChangeLog commits into aggregates, but this also can't be
> > done reliably in an automated fashion without a lot of manual
> > corrections (as could be seen in the glibc CVS to Git conversion), so
> > I also left that alone.
> I feared that much... It's a pity, but I guess there is nothing we can
> do about it :-(

Unfortunately, yes.

See, these are (a part of) the reasons why the CVS to Git migration took
that long.  Like you, I wanted to get it all right, so that the Git VCS
information is ``correct''.  But it is just not possible (without manual
intervention, of course).

