From: Tom Lord
Subject: Re: [Gnu-arch-users] Re: Working out a branching scheme [was: tag --seal --fix]
Date: Sun, 4 Apr 2004 07:28:32 -0700 (PDT)

    > From: Stefan Monnier <address@hidden>

    >>>> A total history containing 11K revisions is no sweat for arch.  A
    >>>> total history containing 100K revisions is no sweat.  Putting all
    >>>> those revisions in a single arch version?  You're moving into the area
    >>>> of using arch poorly.

    >>> Is there a fundamental reason why this is poor use?

    >> There's fundamental reasons why it's poor use if all those changes
    >> take place over a short-enough time period (say, a year or 16 months)
    >> and if the tree is "one big thing" rather than something that could
    >> reasonably be broken-up into sub-categories.  Basically you have some

    > I don't assume a short period of time (I'm thinking of the Emacs
    > repository which is almost 10 years old (if you count its RCS
    > life)).  With CVS, the >10K revisions all in "the same big
    > thing" work just fine.

I disagree that CVS works "just fine".  Just as with arch, there are
some operations that work "just fine" in this case and others that
don't.  

One trivial example of an operation that doesn't work "just fine" at
this scale with CVS is the initial mirroring of a CVS repository (as
in doing a first fetch of the GCC repository).   I wind up having to
xfer _far_ more data than I could reasonably need.   And, while xdelta
can help a _bit_ with incremental updates to a CVS mirror, it's really
stuck fighting against the storage format.

I'm sure that there are query operations on CVS that will be
problematic.   

At that scale, CVS is notorious for being vulnerable to data
corruption where that corruption can easily impact very wide swaths of
history.

Really, arch and CVS aren't _that_ different in this regard.  The main
difference, in my view, is that arch _has_a_clean_solution_ --
namely (and primarily) archive cycling.  With CVS, you're just outta
luck.

A lot of your thinking here seems to be premised on the idea that
archive cycling is bad .... to be avoided .... that the fact you might
want to cycle an arch archive makes arch somehow worse than CVS.

I think such concerns are misplaced.


    > Nobody has had to figure out some good split into sub-trees or to try and
    > cut the development into several versions (and the associated need to tell
    > the world where the new head of the trunk is to be found and to switch
    > their checked out tree to that new branch).

In the past year or so of GCC development, alternative CVS mirrors
have appeared because of other kinds of scaling difficulties CVS has.

The commands needed to access Savannah repositories have changed.

Every few months in GCC new tags are created and many developers have
to adjust their trees for that.

Archive cycling is no more intrusive than any of those operations.


    >> tree that's changing at that rate and, at that rate, nobody working on
    >> or with the tree can really keep up with the changes.   It's the "dog
    >> pile on the mainline" technique.   The fundamental problems here are
    >> from the user perspective -- you're not using the tool to do something
    >> useful.

    > Ahem, I do consider Emacs to be useful and I do consider the use of CVS to
    > have been very useful for Emacs's development.

There is plenty of brilliance in the design of Emacs and plenty of
cruft in the implementation and the practices used to maintain it.  No
doubt that network-accessible CVS has been an improvement over "RMS
alone changes the mainline" but that's not saying much.


    >> There's a user-fundamental reason why it's a poor idea if the project
    >> can reasonably be split-up into sub-categories or if all these
    >> revisions take place over many years.   Do you really want "tla
    >> revisions -s" (for example) to list 11k commits?

    > I never use `tla revisions', so I don't see why that would be a
    > problem.  What would a user use it for?

I use it all the time to see what has changed recently and to help
with merging.

I guess I also left out the issue of mirroring there.  If you're
working on the head revision, a mirror is very handy but a history of
11k past commits is usually not.


    >> Finally, there's a tool-design fundamental reason why you don't want
    >> things that big.  Sure, 100k same-version-commits can be done, but
    >> only at the cost of a big leap in implementation complexity.   If
    >> there's no compelling use-case reason to want this -- why pay the
    >> costs associated with writing, maintaining, and administrating a more
    >> complex tool, especially when such a simple alternative is achievable?

    > OK, so you mean the problem is just an implementation
    > limitation.

Not "just" as in "only".   Or else not "the problem" but "this part of
the problem".   But other than that, yes, there is a fundamental
implementation difficulty there.

    > I can definitely live with that answer much better than with explanations
    > of how it would be wrong to even think of using a tool that way.

    > As you probably know I just consider archive-cycling to be fundamentally
    > wrong from the user's point of view.  

Yes, but, I think your concerns are misplaced.

    > Sometimes it is a very natural step because it's related to the
    > use of branches, but imposing it because of implementation
    > limitations seems just unfortunate.

One can impose it just to keep the "unit of manipulation" of the
history data in "human scale chunks".

    > I understand the need for archive-cycling from an implementation point of
    > view, but I think it should be made as transparent as possible from the
    > user's point of view.  Maybe the answer is just to make `config's more
    > user friendly since their additional level of indirection can be used to
    > hide things like archive-cycling (the archive-cycling could then even
    > happen automatically every 1k revisions).

Now you're speaking my language.   That is, indeed, a good goal for
itla/overarch.
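
To make the idea of hiding archive cycling behind a level of
indirection concrete, here is a minimal sketch in Python.  It does not
call tla and says nothing about how itla/overarch would really do it.
The class name `CyclingTrunk`, the archive-naming scheme, and the
`limit` parameter are all hypothetical, chosen only for illustration.
The user commits to one logical name; the indirection decides which
archive receives the commit and starts a fresh one every `limit`
revisions.

```python
class CyclingTrunk:
    """Hypothetical logical name hiding which archive holds the head."""

    def __init__(self, base_name, limit=1000):
        self.base_name = base_name  # illustrative archive name prefix
        self.limit = limit          # revisions allowed per archive
        self.cycle = 0              # index of the current archive
        self.count = 0              # revisions already in that archive

    def current_archive(self):
        # Assumed naming scheme: the prefix plus a cycle counter.
        return f"{self.base_name}-{self.cycle}"

    def commit(self):
        """Return (archive, revision) for the next commit, cycling if full."""
        if self.count >= self.limit:
            # The current archive is full: move on to a fresh one.
            self.cycle += 1
            self.count = 0
        revision = "base-0" if self.count == 0 else f"patch-{self.count}"
        self.count += 1
        return self.current_archive(), revision


# Usage: with a tiny limit, the fourth commit lands in a new archive.
trunk = CyclingTrunk("emacs-archive", limit=3)
history = [trunk.commit() for _ in range(4)]
```

The caller never names an archive directly, so cycling changes nothing
in the user's workflow; only the indirection layer knows where the head
currently lives.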


    > I have several times used Emacs's revision history to get `annotate' and
    > `diff' info dating back to very long ago.  So if we have to add some
    > archive/branch/version boundary along the way every year or so, it's very
    > important that tools like `annotate' are able to ignore those boundaries.

There are fundamental trade-offs there.  One can optimize for fast
access to _long_ `annotate' histories -- or one can optimize for many
more common operations.  One simply cannot do both short of making an
implementation so hairy and so expensive to administer or so flaky as
to be worse-than-useless in many situations.

A fun project to undertake might be this:  write a server that can
mirror archives (like James' supermirror) but add to it a component
that builds an ancillary database that can answer `annotate' queries
cheaply.  

Such a thing is _certainly_ needed and many people will, I'm sure,
appreciate having it done (not least, me).

A tactic that you might consider: use CVS or RCS or (least likely but
possible) SVN to hold the annotate database.   The problem is reduced
to plumbing and namespace mapping.
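
The ancillary annotate database described above can be sketched
roughly as follows.  This is an illustration under stated assumptions,
not a design for the real thing: the class `AnnotateIndex` and its
methods are hypothetical, it keeps everything in memory rather than in
CVS or RCS, and it assumes the mirroring side can hand it the full new
contents of each changed file, one revision at a time, in order.  It
uses Python's standard `difflib` to carry line attributions forward.

```python
from difflib import SequenceMatcher


class AnnotateIndex:
    """Hypothetical per-file line attribution, updated once per revision."""

    def __init__(self):
        # path -> list of (line_text, revision_name)
        self._files = {}

    def apply_revision(self, revision, path, new_lines):
        """Record that `revision` left `path` containing `new_lines`."""
        old = self._files.get(path, [])
        old_lines = [text for text, _ in old]
        updated = []
        matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision that introduced them.
                updated.extend(old[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines are attributed to this revision.
                updated.extend((text, revision) for text in new_lines[j1:j2])
            # "delete": the old lines simply drop out of the index.
        self._files[path] = updated

    def annotate(self, path):
        """Answer an annotate query without replaying any history."""
        return list(self._files.get(path, []))


# Usage: two revisions of one file (revision names are examples only).
index = AnnotateIndex()
index.apply_revision("patch-1", "README", ["a", "b", "c"])
index.apply_revision("patch-2", "README", ["a", "B", "c", "d"])
result = index.annotate("README")
```

Because the index is keyed only on revision names, it could store
fully-qualified names and so carry attributions across
archive/branch/version boundaries; the cost of a long `annotate' is
paid once, at mirroring time, instead of on every query.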


-t




