From: Tom Lord
Subject: Why PQMs count (was Re: [Gnu-arch-users] GCC v. Arch address@hidden:) Regressions on mainline]
Date: Tue, 22 Jun 2004 01:10:17 -0700 (PDT)

    > From: "Stephen J. Turnbull" <address@hidden>

    >     Tom> Some would say, of _any_ high commit rate --- any rate too
    >     Tom> high for developers to keep up-to-date with --- "Hey, use
    >     Tom> branches more.  Slow down there, partner."  They would be
    >     Tom> right, 90% of the time.

    >     Tom> In this forward, though, is evidence that GCC is in the 10%:

    > ------------------------------------------------------------------------
    >     >>>>> From: Mark Mitchell <address@hidden>

    >     Mark> Since we have a policy of not checking things without running
    >     Mark> tests, and yet I'm seeing these failures on multiple
    >     Mark> platforms, I'm perplexed as to what has transpired.

    >     Mark> Would someone please explain why these tests are failing and
    >     Mark> what is being done to fix them?
    > ------------------------------------------------------------------------

    >     Tom> Nobody cares that you can't really keep up with GCC mainline.
    >     Tom> You can't "keep up" -- but you can do what Mark is doing
    >     Tom> here.  And that's the whole point of an integration branch.

    > Could you unpack that a bit?  

Glad to.

    > I see Mark make a strong but polite
    > request that somebody rectify an apparent policy violation; I don't
    > see what that has to do with commit rate vs. branch rate.

Read literally, the policy _is_:

        "Don't commit without testing."

Reading Mark's message in isolation, the policy apparently (to the
casual reader) _means_:

        "Don't make a breakage-inducing commit."

CVS, with its lack of atomic commits, makes what the policy is
supposed to _mean_ literally impossible.  I can test my tree.  I can
make sure it's fine in lots of ways.  What I _can_not_do_ with CVS is
actually commit that tree in such a way that I am very confident that
there is a "point" on the linear history of GCC mainline which
corresponds to my tree.  Whatever testing I do before a commit,
strictly (but significantly) speaking, it does not apply to GCC
mainline.

And, in this context, that's a virtue of CVS because it enables a very
high commit rate --- a commit rate so high that an arch-user using
naive practices would have trouble keeping up (because arch doesn't
like you committing from an out-of-date tree, yet GCC-in-CVS gives
you essentially no other choice).
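
To make the non-atomicity concrete, here's a toy model --- invented
file names and revisions, nothing measured from GCC --- of why no
point on a per-file-commit mainline need correspond to anyone's
tested tree, even when the committers touch disjoint files:

    # Toy model of non-atomic, per-file commits (CVS-style).  Alice and
    # Bob each test a whole tree, then commit file by file; the server
    # interleaves the per-file commits.
    base         = {"a.c": "A0", "b.c": "B0"}
    alice_tested = {"a.c": "A1", "b.c": "B0"}   # Alice's tree: her change only
    bob_tested   = {"a.c": "A0", "b.c": "B1"}   # Bob's tree: his change only

    mainline = dict(base)
    history = [dict(mainline)]
    mainline["a.c"] = "A1"; history.append(dict(mainline))  # Alice commits a.c
    mainline["b.c"] = "B1"; history.append(dict(mainline))  # Bob commits b.c

    for state in history:
        tested = state in (base, alice_tested, bob_tested)
        print(state, "(tested)" if tested else "(never tested as a tree)")
    # The final state {'a.c': 'A1', 'b.c': 'B1'} is on the mainline, but
    # nobody's pre-commit testing ever applied to it.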

Imagine yourself wanting to enact the seeming policy.   You make a GCC
tree.  You add in your changes.  You want to commit.   Before you
commit, you run the tests.  The tests all pass, fine -- but they took
so long to run that now your tree is out-of-date.

So, in one sense, it's all a big sham.  The policy is a joke.  For
all the realism of its literal interpretation, it might as well be a
policy that all developers may only commit while levitating.   "Don't
induce breakage, you say?  Fine --- then give me the exclusive right
to commit for the next 6 hours and I'll conform."

In another more important sense, as a _probabilistic_ approach to
keeping the mainline happy, it's a good policy.  By testing before
commit I _probably_ won't break the mainline.   The overall modularity
of GCC makes this probabilistic strategy a viable one --- if
interference between patches were randomly determined this approach
wouldn't work.
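
To put rough numbers on that --- all three figures below are
assumptions for illustration, not measurements of GCC --- suppose
commits land on mainline as a Poisson process and any pair of
concurrent patches interferes with some small, independent
probability.  Your tree is then effectively certain to be stale by
the time a long test run finishes, yet the odds that your commit
leaves the mainline unbroken stay decent:

    import math

    lam        = 5.0    # mainline commits per hour (assumed)
    test_hours = 6.0    # one full build-and-test cycle (assumed)
    p          = 0.01   # chance a concurrent patch interferes (assumed)

    # Chance at least one commit lands while your tests run: staleness
    # is a practical certainty, which is the "levitating" problem above.
    stale = 1 - math.exp(-lam * test_hours)
    print("P(tree stale at commit time) = %.6f" % stale)   # 1.000000

    # Expected concurrent commits, and the chance none of them
    # interferes with yours.  Modularity is what keeps p small; if
    # interference were randomly common, this product would collapse.
    concurrent = lam * test_hours
    clean = (1 - p) ** concurrent
    print("P(commit breaks nothing)     = %.2f" % clean)   # 0.74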

In this specific case, it appears that the breakage was _not_ induced
by CVS non-atomic-lossageness but rather by a deliberate action, a mix
of "I'll get to that last bit soon" and "Gee, it worked for me."

So, yes, you're right -- part of what's going on with that message is
just Mark clearing his throat and holding up the yellow card in his
role as referee.   Somebody didn't test as well as they should have or
disregarded the results as insignificant and, as the release manager,
it falls to Mark to say "Bad Hacker, No Jolt For You" while retaining
his community-granted immunity from resentment.

But, keep looking.  The punch-line comes a few messages later when
Mark says:

    It is not acceptable to completely ignore this kind of breakage 
    indefinitely.  Two months is far too long.

"Two months".   Implying that there is a shorter period of time that
is _not_ "far too long".

Now do you see where the mainline is for these folks?  It's a thing
you don't want to be two months behind but, on the other hand, you can
break it for up to two months before you're in serious trouble.  I.e.,
it's not a thing people rely on to have monotonically increasing
quality.  I.e., it's just "dog pile on the mainline" and fix what's
noticed as broken later rather than "Form an orderly line and make
your proven-acceptable mainline changes one at a time.  No shoving and
no smoking, please."

The GCC mainline (other than during freeze periods and similar
exceptional phases of development) is a thing which is always, in
effect, a few days _ahead_ of most developers.

More specifically, the mainline is not _controlled_ (during non-freeze
periods) by any one committer.  It's there.  It goes fast.  It's an
automated merge of separate development efforts.  People merge the
off-mainline developments into it.  Sometimes it breaks but that's
caught quickly and that's the point.

The GCC mainline commit rate is too fast to keep up with.  Check it
out and, by the time you have built it and run the test suite, odds
are you are out of date.  Yet it has value anyway.  In fact that _is_
its value.  It's a continuous integration branch: they all hack on it
using the modularity of GCC to make it probable but not certain they
won't break this blind-integration branch --- and the value of keeping
the branch itself is that when they are wrong, and do break something,
this is picked up on very quickly.
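
A sketch of what that safety net amounts to (the shelled-out command
names below are placeholders, not GCC's actual infrastructure): a loop
that blindly builds and tests whatever the head happens to be, so a
bad merge gets noticed within hours rather than months:

    import subprocess, time

    def ok(cmd):
        # Placeholder commands; returns True when the command succeeds.
        return subprocess.run(cmd, shell=True).returncode == 0

    last = None
    while True:
        head = subprocess.check_output("print-head-revision", shell=True)
        if head != last:
            if not (ok("checkout-and-build") and ok("run-testsuite")):
                ok("mail-the-list 'mainline broken as of %s'"
                   % head.decode().strip())
            last = head
        time.sleep(6 * 3600)   # a few runs a day; the point is bounded lag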

It wouldn't have been right for Mark's post to say "Oh, somebody broke
mainline."  (And that's not what he did.)  It _was_ right for Mark to
say "Somebody broke mainline and timed out on fixing it.  Geeze, it's
been two months!"

It's not a human branch.   It's a deterministic branch (the auto-merge
of many "logical" branches) which so strongly resembles what will turn
into a release during a freeze-phase that it's worth testing it
nightly and putting out fires before they spread.   That's what Mark
is doing, AFAICT.

GCC mainline (non-freeze periods) is what happens if you replace Linus
with a shell script, so to speak.  Maybe, more precisely, what happens
if you give up the fiction (and it is one) that Linus maintains the
kernel mainline and regard him more properly as being more like Mark,
but a bit stronger in his authority over details: as someone whose
advice and consent, _if_ invoked, is the touchstone for what
committers dare to commit and what the rest of us can expect to survive
post-commit.  And that's a good fit for some systems.

It's not just a random good fit for some systems that happen, by
coincidence, to work that way.   It's a good fit for any program that
is:

        1) large

        2) modular

        3) hacked on concurrently by many people.  Not "done", yet.

        4) so significantly socially valuable (in whatever context)
           that nominal ownership is just that -- nominal.  In truth
           the project is really quasi-democratic (for some, possibly
           non-democratic in a larger context, definition of
           "citizen").



Any such program has good changes being produced for it, around the
"world" (literally the world, in GCC's case) around the clock.   

The rate of change, viewed as an irreducible natural fact, is greater
than any human maintainer can keep up with.

In place of a human maintainer, we posit an automated integration
branch.  "Everyone" (committers, coordinating outside of revision
control) bounces their changes off of that rather than sending them to
a maintainer.  Mostly it works and when it fails, the beauty is that
that failure is easy to recover from and we're glad it occurred as
early as it did.

In some sense, ahem, you could say that Arch (sans a PQM) is
outrageously _backwards_.   Absent a PQM, Arch, in its move towards
"whole tree orientation", takes the "C" out of "CVS".   You can't
commit unless you are up-to-date.   Oh, sure, we took from CVS the
idea that you don't have to "lock" a file before you modify it --- but
we won't let you commit that file (absent a PQM) unless not only that
file but your _entire_tree_ is up-to-date.   Geeze, even SVN doesn't
impose _that_ limitation.

But in a larger sense, _with_ a PQM, arch gives back the "C"
and throws in a lot of bonuses (like overall changeset orientation and
all that that implies).
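
For concreteness, here's the shape of what a PQM does, as a toy
in-process model --- this is not the real pqm program's interface,
just the serialization idea.  Committers submit merge requests from
possibly-stale trees; the PQM merges each against the current head,
tests, and commits only on success:

    from queue import Queue

    mainline = ["base"]                   # linear history of tested states

    def merge(head, name):                # stand-in for the real merge step
        return head + [name]

    def tests_pass(tree):                 # stand-in for the real testsuite
        return not tree[-1].endswith("-broken")   # toy failure criterion

    requests = Queue()
    for name in ["alice-fix", "bob-feature-broken", "carol-cleanup"]:
        requests.put(name)                # submitted from possibly-stale trees

    while not requests.empty():
        name = requests.get()
        candidate = merge(mainline, name) # always against the current head
        if tests_pass(candidate):
            mainline = candidate          # commit; history stays green
            print("committed:", name)
        else:
            print("bounced:  ", name)     # submitter reworks and resubmits

    print("mainline:", mainline)

The PQM, in effect, is the one committer that is always up-to-date, so
nobody else has to be.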
 

-t

p.s.: In truth, GCC is right on the cusp, afaict: its commit rate,
last I measured, was right on the edge of what you could reasonably do
with just straight-up core arch, no PQM.   But Florian's posts got me
reoriented to think about --- "Ok, let's assume the commit rate of GCC
is just too high.  Now what?"   Hence --- integration and narrative
branches with a pqm to manage them.






