
[Gnu-arch-users] towards standards specifications


From: Tom Lord
Subject: [Gnu-arch-users] towards standards specifications
Date: Wed, 27 Aug 2003 09:57:48 -0700 (PDT)


An idea is being kicked around the list to try to make the conventions
for file tagging a standard across multiple revision control systems.
A few words about that are enclosed.   I've tried to give a high-level
overview of the design space questions and sketch in some history
about previous attempts to standardize.


* Things to Standardize

There are several things to standardize, not just tags.  The way in
which the "set of things to standardize" breaks down into atomic units
of "proposed standard" is not arbitrary.

I think the things to standardize are:

[A] `inventory'

[B] `mkpatch/dopatch' and changeset format(s)

[C] archives and project trees:
    - a global namespace and taxonomy for revisions
    - a format and semantics for log files
    - the format and semantics of the in-tree patch log
    - a transport-independent spec of basic archive transactions

[D] mappings of basic archive transactions onto transports

[E] an extension mechanism for adding additional archive transactions

If you imagine a world of sweetness and light in which those five
things are standards, then what is tla?  It's (currently) a particular
implementation of [A..D].  The set of standards described by [D] could
grow -- in which case tla (if unchanged) would be an implementation of
[A..C] with some of [D].

I'll go into greater detail about each of those areas below.  I'll
state at the outset though, that these standards don't necessarily
completely specify a revision control system.  For example, if
Subversion supported [A..C] and some of [D], it could, nevertheless,
do more in addition to that.

In other words, it wouldn't be the aim of these standards to turn all
systems into arch.  Rather, it would be the aim to turn arch into a
collection of methods for interoperability between revision control
systems: methods that happen to be implementable in stand-alone form
in a very easy way.  It would turn out that revision control systems
that employ these methods obtain, as a side effect, the features of
distributed operation, smart merging, and easier integration with 
ancillary tools.

* Design Space Notes and Past History of Standardization Efforts

[A] `inventory'

    including the `=tagging-method' functionality and the
    representation of individual file tags.   Those two 
    go together.

    Interesting dependencies: the choice of regexp language; 
    filename syntaxes and semantics on various platforms.

    Problems encountered starting standard discussions with
    Subversion and Darcs:

       (A1) representatives of both projects maintained that the
            "internal" history records of the revision control system
            eliminate the need to assign files a "logical identity".
            (I am not convinced that their ideas in this regard are
            both useful and coherent, but I suppose time will tell.)

       (A2) neither project makes significant use of naming
            conventions, as far as I know.

       (A3) the most immediate reasons for such a standard,
            to enable interoperability and to enable the exchange
            of changesets, did not appear to have appeal to either
            project.


    Discussion on these topics was not extensive.  There appeared to
    be greater interest in standardizing a changeset format.  I
    reasoned that in any effort to standardize a changeset format,
    the need for an inventory mechanism would become clear and that
    the question could then be raised in the context of that
    motivation.  Discussion never got that far (see below).

    Unexplored:  I haven't reached out much to the meta-cvs project.
    I _think_ I recall that it _does_ have a concept of logical 
    file identifier, so that might be interesting to look into.

    Possible partial solution to problems: as with putting 'arch-tag:'
    lines in CVS-managed Emacs sources, the cooperation of other
    projects is not necessary in this area (though it would benefit
    both those other projects and users).
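
    To make "logical identity" concrete, here is a minimal sketch in
    Python.  It is not arch's implementation: the regexp, the function
    names, and the in-memory {path: text} tree are all assumptions made
    for illustration.  It reads an id from an `arch-tag:' line in each
    file and uses the ids, not the paths, to notice that a file moved.

```python
import re

# Assumed tag syntax for this sketch: "arch-tag:" followed by the id,
# running to the end of that line.
TAG_RE = re.compile(r"arch-tag:[ \t]*(\S[^\n]*?)[ \t]*$", re.MULTILINE)

def file_id(text):
    """Return the logical id embedded in a file's text, or None."""
    match = TAG_RE.search(text)
    return match.group(1) if match else None

def inventory(tree):
    """Map logical id -> path for every tagged file in a {path: text} tree."""
    inv = {}
    for path, text in sorted(tree.items()):
        fid = file_id(text)
        if fid is not None:
            inv[fid] = path
    return inv

def renames(old_tree, new_tree):
    """List (old_path, new_path) for files whose id stayed but path changed."""
    old, new = inventory(old_tree), inventory(new_tree)
    shared = old.keys() & new.keys()
    return sorted((old[i], new[i]) for i in shared if old[i] != new[i])
```

    Because the id travels inside the file, this works on a tree checked
    out of any system, which is the point of the paragraph above.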

    It _might_ be interesting to factor out `inventory', `mkpatch',
    and `dopatch' into a separate distribution.



[B] `mkpatch/dopatch' and changeset format(s)

    Of particular interest, imo, is the behavior of `dopatch' 
    for inexact patching and the implications that has for 
    what makes a reasonable changeset format.

    It seems wise, in light of the different uses for changesets, to:
    describe the format in terms of an abstract syntax; describe the
    behavior of mkpatch/dopatch in terms of the abstract syntax;
    provide multiple surface syntaxes for the abstract syntax.
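
    As a rough illustration of that split, the Python sketch below
    defines a tiny abstract syntax (two operation types keyed by a
    logical file id) and two interchangeable surface syntaxes for it.
    Everything here is hypothetical: the operation set, the field
    names, and both encodings are assumptions chosen for brevity, not
    arch's actual changeset format.

```python
import json
import shlex
from dataclasses import dataclass, astuple, fields

@dataclass(frozen=True)
class Rename:
    file_id: str
    old_path: str
    new_path: str

@dataclass(frozen=True)
class Patch:
    file_id: str
    diff: str

KINDS = {"rename": Rename, "patch": Patch}
NAMES = {cls: name for name, cls in KINDS.items()}

# Surface syntax 1: a JSON array of [kind, field, field, ...] rows.
def to_json(changeset):
    return json.dumps([[NAMES[type(op)], *astuple(op)] for op in changeset])

def from_json(text):
    return [KINDS[name](*args) for name, *args in json.loads(text)]

# Surface syntax 2: shell-style quoted words, one operation per line.
def to_text(changeset):
    lines = []
    for op in changeset:
        words = [NAMES[type(op)], *astuple(op)]
        lines.append(" ".join(shlex.quote(word) for word in words))
    return "\n".join(lines) + "\n"

def from_text(text):
    tokens = shlex.split(text)  # quoted newlines stay inside one token
    ops, i = [], 0
    while i < len(tokens):
        cls = KINDS[tokens[i]]
        arity = len(fields(cls))
        ops.append(cls(*tokens[i + 1 : i + 1 + arity]))
        i += 1 + arity
    return ops
```

    Both encodings decode back to the same list of operations, so the
    behavior of mkpatch/dopatch can be specified once, against the
    dataclasses, without reference to either encoding.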

    Interesting dependencies:  [A], above;  diff and patch tools or
    standards;  interfaces to external merge tools.

    Problems encountered starting standard discussions with
    Subversion and Darcs and non-RCS-implementors:

       (B1) representatives of both RCS projects reject the importance
            of inexact patching.   Subversion developers argue that
            the internal history records of the revision control 
            system should be used, in combination with a family of
            techniques called "variance adjustment", to turn 
            inexact patches into exact patches.  The Darcs project
            has a similar view.

            Representatives of arch pointed out that (a) variance
            adjustment has significantly different and arguably often
            less useful semantics;  (b) in foreseeable situations where
            history is not available, variance adjustment doesn't
            apply usefully at all.   

            Discussion degenerated after that into arguments about
            whether or not history would always be available.

       (B2) A very popular intuition is that a whole-tree changeset
            is basically a shell script (using `mv', `rm', etc.)
            plus a set of ordinary diffs.   Discussion frequently
            got bogged down on questions of the best surface syntax
            for such changesets.

            Representatives of arch tried to point out that,
            especially when inexact patching is considered, that
            intuition is wrong.  Discussion should start, we argued, 
            with consideration of the _semantics_ of mkpatch/dopatch
            and development of an abstract syntax to describe what
            is exchanged between those two tools.   

            Discussion frequently got bogged down into questions such
            as "Well, suppose we use MIME for this format" or "suppose
            it is an actual shell script".

       (B3) Another very popular intuition related to (a) and
            (apparently) to the behavior of Bitkeeper is that the
            changeset format must make special provisions to carry
            revision control history.   For example, it was frequently
            suggested to add "fields" to carry a revision name and a
            log message.   Part of the motivation here seemed to be
            "I expect to see that information when somebody sends me
            a changeset in email."

            Representatives of arch pointed out that such components
            are _not_ necessary in changesets and, in fact, make no 
            sense when changesets are used outside the context of a 
            revision control system.   Furthermore, such content could
            clearly be _layered_ on top of changesets in a second
            standard.  Finally, one could not reasonably and usefully
            address such fields without dragging in new questions 
            about the syntax and semantics of revision names and log
            messages -- so layering would help to postpone and
            separate consideration of those issues.

            Discussion along these lines typically spiraled into 
            discussion of (B1), above.
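
    A small Python sketch of the objection raised in (B2), under
    stated assumptions: the tree is modeled as a set of paths, the
    inventory as an {id: path} mapping, and both function names are
    made up for this example.  It shows a path-based "mv" failing on a
    tree where the file has already been moved locally, while an
    id-based rename still finds it.

```python
def apply_by_path(tree_paths, old_path, new_path):
    """Path-based 'mv old new': fails if the file is not where expected."""
    if old_path not in tree_paths:
        raise FileNotFoundError(old_path)
    return (tree_paths - {old_path}) | {new_path}

def apply_by_id(inventory, file_id, new_path):
    """Id-based rename: finds the file wherever it currently lives."""
    if file_id not in inventory:
        raise KeyError(file_id)
    updated = dict(inventory)
    updated[file_id] = new_path
    return updated

# The changeset was made against a tree with src/foo.c, but the tree
# being patched has already moved that file to core/foo.c.
local_inventory = {"id-foo": "core/foo.c"}
local_paths = set(local_inventory.values())

try:
    apply_by_path(local_paths, "src/foo.c", "lib/foo.c")
    path_based_ok = True
except FileNotFoundError:
    path_based_ok = False

id_based_result = apply_by_id(local_inventory, "id-foo", "lib/foo.c")
```

    This covers only renames; inexact application of the diffs
    themselves raises further questions that the sketch does not
    attempt to model.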

    Possible solution: It _might_ be interesting to factor out
    `inventory', `mkpatch', and `dopatch' into a separate
    distribution, especially in combination with a mail-friendly
    syntax for changesets.   _IF_ these tools were more widely seen
    as independently useful, and _IF_ they were more widely adopted,
    then perhaps developers of other RCS systems would start getting
    questions like "why doesn't svn work correctly after I apply a 
    changeset to my tree?"



[C] archives and project trees:
    - a global namespace and taxonomy for revisions
    - a format and semantics for log files
    - the format and semantics of the in-tree patch log
    - a transport-independent spec of basic archive transactions

   No detailed discussions got this far.

   Most of those items are hopefully pretty clear.  By
   "a transport-independent spec of basic archive transactions" 
   I mean:

        (a) a taxonomy of revision types (import, commit, tag)

        (b) transport-independent specifications of the basic
            transactions (e.g., a `commit' requires a .tgz of the
            changeset, a copy of the log file, the name of the 
            revision to create -- it returns a status code which
            may be any of revision-locked, no-such-category,
            no-such-branch, etc.)
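
   One way to picture (b) is as a plain data type plus a status code,
   with no mention of files, HTTP, or any other transport.  The Python
   sketch below is only that: a picture.  The class names, the `OK'
   status, the split of the revision name into category/branch/revision
   fields, and the rule that an existing revision counts as locked are
   all assumptions, not a description of tla.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    OK = "ok"  # assumed success code; the text names only failures
    REVISION_LOCKED = "revision-locked"
    NO_SUCH_CATEGORY = "no-such-category"
    NO_SUCH_BRANCH = "no-such-branch"

@dataclass(frozen=True)
class CommitRequest:
    category: str
    branch: str
    revision: str         # name of the revision to create
    changeset_tgz: bytes  # the .tgz of the changeset
    log: str              # a copy of the log file

class MemoryArchive:
    """Toy in-memory archive; a real mapping onto a transport is area [D]."""

    def __init__(self):
        self.branches = {}   # category -> branch -> revision -> request
        self.locked = set()  # (category, branch, revision) held by others

    def add_branch(self, category, branch):
        self.branches.setdefault(category, {}).setdefault(branch, {})

    def commit(self, req):
        if req.category not in self.branches:
            return Status.NO_SUCH_CATEGORY
        if req.branch not in self.branches[req.category]:
            return Status.NO_SUCH_BRANCH
        revisions = self.branches[req.category][req.branch]
        key = (req.category, req.branch, req.revision)
        # Assumption: an already-stored revision is treated as locked.
        if key in self.locked or req.revision in revisions:
            return Status.REVISION_LOCKED
        revisions[req.revision] = req
        return Status.OK
```

   Any transport that can carry a CommitRequest in and a Status out
   could then implement the same transaction.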

   Past experience suggests that the namespace question has _no_
   answer that will satisfy all intuitions of "good", and _many_
   answers (arch's being one of them) that will satisfy essentially
   all actual needs.  In other words, it's just about a perfect topic
   to discuss endlessly and the only chance for a standard is to have
   a standards body whose motivation going in is to pick something with
   as much wisdom as they can muster and withstand years of subsequent
   flaming.


[D] mappings of basic archive transactions onto transports

   No detailed discussions got this far.

   Evidence from [gnu-]arch-users is that people too often assume
   that the on-disk arch archive format is essential to arch.
   In my view, it is _not_; however, it is evidence of a good
   choice of basic archive transactions and it _may_ prove to be
   the best format in the long run.


[E] an extension mechanism for adding additional archive transactions

    No detailed discussions got this far.   I have a couple of ideas
    but I won't go into them just yet.





