
Re: [Gnu-arch-users] Google Summer of Code


From: Thomas Lord
Subject: Re: [Gnu-arch-users] Google Summer of Code
Date: Sat, 22 Apr 2006 09:18:20 -0700
User-agent: Thunderbird 1.5 (X11/20060313)

Stephen J. Turnbull wrote:
> [....]

I think you are close to right on a few things but not quite right overall.

Git is very fast at many operations partly because its storage management
is "snapshot oriented": there is no need to compute a tree-delta at commit
time.

Git is also very fast because its storage is (roughly speaking) a revision
library sans hard-links: there is no need to apply tree-deltas to reconstruct
a historic revision.
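
To make the contrast concrete, here is a rough sketch in Python (the hashing
and object layout are only illustrative, not git's actual object format):
commits record whole trees keyed by content hash, so committing never
computes a tree-delta and checking out a historic revision never applies one.

    import hashlib

    def blob_id(data):
        # content-addressed: identical file contents share one stored object
        return hashlib.sha1(data).hexdigest()

    class SnapshotStore:
        def __init__(self):
            self.objects = {}      # hash -> file contents (bytes)
            self.snapshots = []    # one {path: hash} tree per commit

        def commit(self, tree):
            # "commit" just records the current tree; no delta against the
            # previous snapshot is ever computed
            snap = {}
            for path, data in tree.items():
                h = blob_id(data)
                self.objects.setdefault(h, data)  # unchanged files add nothing
                snap[path] = h
            self.snapshots.append(snap)
            return len(self.snapshots) - 1

        def checkout(self, rev):
            # a historic revision is a direct lookup; no chain of
            # tree-deltas has to be replayed
            return {p: self.objects[h] for p, h in self.snapshots[rev].items()}

    store = SnapshotStore()
    r0 = store.commit({"README": b"hello\n"})
    r1 = store.commit({"README": b"hello\n", "main.c": b"int main(){}\n"})
    assert store.checkout(r0) == {"README": b"hello\n"}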

If you scour the history of discussions you will see that, long before git came
along, we contemplated exactly that for GNU Arch, in the form of "commit directly
to revision library" and "compressed revision libraries sans hard links".
We underestimated the value of those features and didn't make them a priority.

Git is also very fast because it uses, in essence, a "names-based inventory".
Being in love with fancy merging, we never took names-based inventory
very seriously and didn't put much work into it.  If we had, tree-lint during
commit would have been zero-cost for names-based trees and we could have
easily supported git's style of partial commits (for names-based trees).
We got stuck on the question "Why would anyone want to do *that*?!?"
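
To see why, a toy contrast in Python (only an illustration of the idea, not
tla's actual inventory code): under the names method the inventory is just
the tree's paths, while a tagline-style method has to read every file and
lint the ones that carry no tag.

    # tree: {path: bytes}

    def names_inventory(tree):
        # names method: a file's identity *is* its path, so building the
        # inventory never touches file contents and lint has nothing to check
        return {path: path for path in tree}

    def tagline_inventory(tree):
        # tagline-style method: identity comes from an "arch-tag:" line inside
        # the file, so every file must be read, and untagged files are lint
        # errors that a commit has to stop and complain about
        ids, lint_errors = {}, []
        for path, data in tree.items():
            tag = None
            for line in data.decode(errors="replace").splitlines():
                if "arch-tag:" in line:
                    tag = line.split("arch-tag:", 1)[1].strip()
                    break
            if tag is None:
                lint_errors.append(path)
            else:
                ids[path] = tag
        return ids, lint_errors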

You contrast git with Arch by pointing out that git reconstructs (intrinsic)
history from trees.  Um.  So does Arch.  Arch *additionally* adds a structured
namespace to record intentional history, but so what: that's just an extra
feature.

You point out that git emphasizes "seat of the pants" merging.  Arch has long
had tree-delta and user-specified three-way merges.  In general, at least for
names-method trees, there is no git merge technique that does not make
perfectly good sense for Arch, too.  If we had taken the names method more
seriously, we might have arrived at hybrid techniques:
approximate-inventory-computed-from-content-summaries or
approximate-inventory-using-implicit-method-on-trees-that-might-not-lint-cleanly.
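
For concreteness, here is a bare-bones three-way merge at the whole-file
level in Python (a sketch of the general technique, not Arch's star-merge or
git's actual merge machinery): take whichever side changed each file
relative to the common ancestor, and flag a conflict only when both sides
changed the same file in different ways.

    def three_way_merge(base, ours, theirs):
        # base/ours/theirs: {path: bytes}; returns (merged tree, conflict list)
        merged, conflicts = {}, []
        for path in set(base) | set(ours) | set(theirs):
            b, o, t = base.get(path), ours.get(path), theirs.get(path)
            if o == t:             # both sides agree (including both deleted)
                if o is not None:
                    merged[path] = o
            elif o == b:           # only their side changed this file
                if t is not None:
                    merged[path] = t
            elif t == b:           # only our side changed this file
                if o is not None:
                    merged[path] = o
            else:                  # both changed it, differently: conflict
                conflicts.append(path)
        return merged, conflicts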

Contentment with the storage model and disinterest in the names method are
the main reasons Arch wasn't ready to swoop in when the BitKeeper license was
withdrawn.  In some sense, that's my fault for not paying more attention to
Linus' stated need for speed and stated interest in names-based inventory.

All of this adds up to why I think Arch still has potential relevance.  You
mention yourself that you really appreciate Arch's merge logic.  It's
possible to have a smoother mix of the best of both worlds.  (Arch is also
still relevant compared to git because of integrity issues.)

-t

Sorry to say, but after trying git pretty seriously, I don't think you
had a shot.  I'm not sure I can put my finger on it, but I'll give it
a try.

Arch, like Darcs, is patch-oriented.  It's also history-oriented.
This is inherently a bottleneck in any merge-oriented process, because
when you hit a conflict, it pertains to a given patch in some order.
A patch-oriented SCM needs to stop there to avoid horking things even
worse.  Darcs tries to get around this with "patch theory," but there
are some things (e.g., files with "hot spots," like ChangeLogs) that
Darcs doesn't handle any better than anything else.  And its algebraic
manipulation of the patch chain also has bad algorithmic properties;
sometimes I think it's exp-exp. ;-)

git, on the other hand, is snapshot-oriented, with an efficient
representation of the snapshots.  A patch is defined as the diff of
two snapshots.  It is no better than Arch or Darcs at avoiding
conflicts, of course; probably substantially worse, in fact.  In my
limited experience with all three, though, it does two things
substantially better than either.  (1) Since you "teleport" directly
from here to there, *all* of the conflicts show up in one shot, giving
the manager a better shot at deciding whether the merge is feasible
or whether he needs to go to "Plan B."  (2) It's much easier (again in
my limited experience) to figure out what "Plan B" is.
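
To illustrate what "the diff of two snapshots" means in practice, a little
Python using difflib (the helper and its arguments are mine, not anything
git ships): the patch is computed on demand from the two stored trees; it is
not a stored object in its own right.

    import difflib

    def diff_snapshots(old, new):
        # old/new: {path: text}; derive a unified diff from the two snapshots
        patch = []
        for path in sorted(set(old) | set(new)):
            before = old.get(path, "").splitlines(keepends=True)
            after = new.get(path, "").splitlines(keepends=True)
            patch.extend(difflib.unified_diff(
                before, after, fromfile="a/" + path, tofile="b/" + path))
        return "".join(patch)

    print(diff_snapshots({"notes.txt": "one\ntwo\n"},
                         {"notes.txt": "one\n2\n"}))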

Regarding performance, I'm not using git across a network, except to
occasionally update git itself.  However, local operations all seem to
be basically O(diffsize).  Commits are usually instantaneous, diffs
seem to go about as fast as the output device can handle, etc.  I
tried some experiments with micro-branching in Arch; they ran into a
performance bottleneck pretty quickly.  Branches in git are plenty
lightweight for micro-branching (which git people call "topic
branches").  They're easy to make, they're easy to commit to, and
they're fast enough to switch back and forth in a single workspace for
many of my purposes.  I *never* would have tried that in Arch, but
even on the abysmally performing Mac file system, it's very doable in
git.

The other thing that I like about git is that it constructs history
from a tree of objects; it doesn't store it centrally.  This means
that you can back-build a repository fairly easily.  This is important
to me because I'm transitioning from a badly broken CVS repository and
want to untwist the history.
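
A toy model of that, in Python (my class names; git's real objects are
content-addressed commits, trees, and blobs): history is nothing but commit
objects pointing at a snapshot and at their parents, so you can reconstruct
old revisions first and then graft newer commits on top of them.

    class Commit:
        def __init__(self, snapshot, parents=(), message=""):
            self.snapshot = snapshot      # {path: bytes} for this revision
            self.parents = list(parents)  # zero or more parent commits
            self.message = message

    def log(head):
        # walk first-parent history purely by following pointers;
        # there is no central history file to consult or rewrite
        commit = head
        while commit is not None:
            yield commit.message
            commit = commit.parents[0] if commit.parents else None

    # back-building: reconstruct old revisions (say, from a broken CVS
    # history), commit them oldest-first, then point new work at them
    r1 = Commit({"main.c": b"v1\n"}, message="recovered revision 1")
    r2 = Commit({"main.c": b"v2\n"}, parents=[r1], message="recovered revision 2")
    head = Commit({"main.c": b"v3\n"}, parents=[r2], message="current work")
    print(list(log(head)))  # newest first, back to the recovered import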

Do you care?  I don't know.  Good performance on simple operations is
always nice, but might not be the sine qua non if you can get superior
merge capability.  The success of various cvs2* scripts shows that most
CVS repositories aren't as broken as XEmacs's.






