From: Tom Lord
Subject: Re: [Gnu-arch-users] GCC v. Arch address@hidden: Regressions on mainline]
Date: Wed, 7 Jul 2004 17:26:54 -0700 (PDT)



    > From: Erik de Castro Lopo <address@hidden>

    > On Tue, 22 Jun 2004 19:15:47 -0700 (PDT)
    > Tom Lord <address@hidden> wrote:

    > > I thought about a PQM-driven Aegis-like protected mainline but I don't
    > > think it works out unless you do it in a _fairly_ hairy way.

    > Until very recently I worked with Peter Miller the main author of Aegis.
    > I forwarded him a copy of this mail and he replied with the following
    > which he OKed for forwarding to the list.

Geeze, I'm sorry to take so long replying.  Despite appearances, it's
been a very busy time at arch central.


    > Peter Miller <address@hidden> wrote:

Taking one point out of order:

    > > My tolerance for my projects having a broken baseline
    > > (even in the time between releases and code freezes) is far less than
    > > Mark's, and Aegis (like Arch) is fussier than CVS about many things, but
    > > you *can* make a commit that will break things with Aegis.  It's just
    > > harder to do (just less *likely*) with Aegis, is all, and none of us
    > > want to intentionally break things anyway.

I think that's the fundamental place where we basically agree; we're
just two people talking as if we disagreed when we don't, really.  In
what follows, I just (for the benefit of the gnu-arch list, mostly)
elaborate on some of the full-test-vs-partial-test and
strict-test-vs-tolerant-test issues.



    >>> I thought about a PQM-driven Aegis-like protected mainline but I don't
    >>> think it works out unless you do it in a _fairly_ hairy way.

    >> Please expand what you mean by "hairy".

I should preface this by saying that I'm using "Aegis-like" in a
possibly too general way.  If there's some trick I'm missing here that
Aegis has, I'd love to hear about it.  It doesn't sound like there is,
though, from your description.

It seems to me (and you) that two goals are hard to achieve at the
same time.  These are:

        ~ sustaining very high commit rates
        ~ imposing a test-barrier on each commit

A test-barrier implies that commits from an up-to-date tree have to be
spaced far enough apart in time to allow tests to run between commits.

Tests that take more than a few minutes to run will slow the commit
rate down below that of some successful projects (e.g. GCC).
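
To put rough numbers on that (my own back-of-the-envelope, using the
roughly 66 change sets a day that Peter's code-review figure below
implies): 66 commits in 24 hours is one every ~22 minutes, so a
serialized test run much over 20 minutes already caps the rate, before
you count re-runs after a rejected commit.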


    >> I like to think that Aegis has a clean but versatile process, grounded
    >> in the realities of real-world software engineering.  I could be wrong. 

I'm "pushing" (very gently, I hope) contributors to arch to help build
something very close to Aegis (as I perhaps imperfectly understand it)
but taking into account the high commit rates that some projects want.

In particular, I want automated testing to be asynchronous and to lag
behind the head revision.   Instead of rejecting a bogus commit before
it hits the repository, I want to allow bogus commits to exist for
some time (e.g., up to 48 hours) before the automated testing raises a
red flag and says "this needs to be fixed".
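
For the benefit of the gnu-arch list, here is a minimal sketch in
Python of what I mean; it is purely illustrative, nothing in arch
implements it, and latest_revision and run_tests are made-up
placeholders for whatever the archive and test harness actually
provide:

    import time

    DEADLINE = 48 * 3600        # seconds a broken mainline may go unrepaired

    def latest_revision():
        # placeholder: return the newest revision id in the archive
        raise NotImplementedError

    def run_tests(rev):
        # placeholder: build and test `rev`, return True if everything passes
        raise NotImplementedError

    def watch(poll_interval=600):
        broken_since = None
        while True:
            rev = latest_revision()         # testing may lag well behind the head
            if run_tests(rev):
                broken_since = None         # mainline repaired, reset the clock
            elif broken_since is None:
                broken_since = time.time()  # first observed breakage
            elif time.time() - broken_since > DEADLINE:
                print("RED FLAG: mainline broken for over 48 hours at", rev)
            time.sleep(poll_interval)

The only property that matters here is that the red flag is a deadline
behind the head revision, not a gate in front of the commit.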

By "hairy", above, I meant that with some very fancy footwork: testing
what you _think_ will be committed and what _probably_ will be
committed but which might not be if some earlier-initiated but
not-complete test happens to fail.   In other words, naively we have:

        * --------------------*-------------*      ...
        |            |        |             |
        start       test   complete        start
        commit             commit A        commit 
        A                                  B

one might have:

        * --------------------*
        |       |    |        |
        start   |   test   complete
        commit  |          commit A
        A       |
                |
                start   --------------------------*
                commit                |           |
                B assuming            |           complete
                that A passes       test A+B      commit B
                tests

with logic to back off the B commit if, after all, the A commit
fails.  Implementing that would be painful enough but, on top of that,
I can't imagine a nice interface to that kind of "subjunctive commit"
("if A were to pass, then my commit is B").


    > > 1: How you build:

    > > Aegis is designed to watch the results of the build and will not advance
    > > the change set unless it builds.  This solves the problem where a
    > > developer commits a change set, and then the project no longer builds
    > > for anybody.  There is a checked-out copy called the "baseline" which is
    > > built as part of the commit... and since it isn't done in the
    > > developer's private work area, this also finds many "but it builds OK
    > > for me" class problems.

    > > Assuming you trust your build (if you don't, see my RMCH paper) the
    > > builds can be incremental.  And if you don't trust incremental builds,
    > > what makes you think a full build will do any better, USING EXACTLY THE
    > > SAME DATA?

    > > Assuming incremental builds most of the time (you can ask for a full
    > > build) and you are doing a maintainer build, not a full GCC bootstrap (the
    > > last time I built GCC it was over an hour for the full bootstrap), this
    > > build-during-commit is not the bottle-neck it may first appear
    > > to be.

Amazingly enough, anything more than a few minutes is likely to collide
with the commit rates we actually see on some projects.

I think that the trade-off between a possibly-briefly-broken mainline
and a slower-maximum-commit-rate tends, for many projects, to come
down in favor of possibly-briefly-broken.    

The testing you propose doing, and more, is definitely worthwhile.  I've
just come to think that it's ok for that testing to lag a little bit
behind the mainline and to set a deadline for repairs rather than
absolutely prohibiting errors.

It's hard to measure how important the fast commit rates are.
Personally, if I ran one of these fast projects, I'd want to consider
alternative practices that would slow the commit rate down; on the
other hand, it's hard to argue with their overall success, so I'm
inclined to think they're not being foolish as is.


    > > 2: How much you test:

    > > Quoting Tom Lord:
    > > > GCC commits happen too fast (last I checked) to serialise them while
    > > > inserting tests between each one.

    > > Aegis is also designed to (optionally) require that each change set be
    > > accompanied by a new test.  This new test is required to PASS, it isn't
    > > just decoration.  Things like GCC are a pleasure to write tests for,
    > > because the inputs and outputs are so easily controlled.

It's the exceptions that really seem to irritate the maintainers,
though, ranging from "I just want to check in this comment fix
without obstacles" to "Yes, Joe, Jane, and Rubin want to break the
mainline briefly and then fix it... that's the easiest way."

    > > If a change set is a bug fix, the test is also required to FAIL against
    > > the baseline.  This ensures that the test actually reproduces the bug...
    > > test fails on old code, passes on proposed code.  You don't have to
    > > write anything extra for this.

In the case of GCC, that's a big deal.   Passes and fails can be very
context specific (e.g., platform specific) and so you are implying
that the repository has to have access to all the relevant platforms,
some of which may be relatively slow.

GCC tests are farmed out fairly far and wide to get good platform
coverage and test-case coverage.  They're just intrinsically slow; yet
on a scale of roughly 48 hours, that's what the project builds its
process around.


    > > There is also a third type of testing.  Aegis accumulates all of the
    > > tests which accompany change sets into a regression test
    > > suite. 

I think it's safe to say that GCCers would _love_ new and improved
tools to help them maintain the test suite but, perhaps, they should
not be overly intertwingled with the commit process.

    > > Developers can run this test suite at any time.  

Not in GCC-world.   As far as I can tell, no one developer can run all
of the tests that all developers have to be responsive to.

    > > You don't have to write anything extra for this.  Project
    > > administrators can require that change sets pass the entire
    > > regression test suite before being committed.  However, this
    > > can take time and is not the default for all change sets.  When
    > > I started writing Aegis in 1990, I had recently worked on a
    > > project with 3 CPU weeks (yes, weeks) of regression test
    > > suite.  For some projects, running the entire test suite for
    > > every commit is not practical.

For large projects in the free software world, that seems to be
tending towards the rule, not the exception.   GCC isn't alone.  (The
linux kernel is another easy example, but any major large project is
likely to demonstrate the same thing.)

1990, in spite of being not _terribly_ long ago, sharply predates the
big spike in widely distributed development and testing.  It's a very
_interesting_ time for people like you and me.


    > > However, Aegis offers a happy medium: by using the change set meta-data,
    > > it is possible to form a correlation between source files and tests,
    > > allowing Aegis to suggest tests in the regression test suite you may
    > > want to run based on the source files in your change set.  Not as fine
    > > grained as code coverage analysis, but usually very useful.

That is, in my humble opinion, an exceedingly cool meme.

How about a feedback loop there: keep historical data on which files
tend to affect which tests?  Or is that what you mean in the first
place?

Thanks for describing this idea.
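
For concreteness, the feedback loop I'm picturing could be as crude as
counting, across past change sets, which touched files co-occur with
which failing tests, then ranking tests for a new change set by those
counts.  A Python sketch of my own (none of these names exist in Aegis
or arch):

    from collections import defaultdict

    def build_correlation(history):
        # history: iterable of (changed_files, failed_tests) pairs,
        # one per past change set
        weight = defaultdict(lambda: defaultdict(int))
        for changed_files, failed_tests in history:
            for f in changed_files:
                for t in failed_tests:
                    weight[f][t] += 1
        return weight

    def suggest_tests(weight, change_set, limit=100):
        # rank tests by how often they broke alongside these files
        score = defaultdict(int)
        for f in change_set:
            for t, n in weight.get(f, {}).items():
                score[t] += n
        ranked = sorted(score.items(), key=lambda kv: -kv[1])
        return [t for t, _ in ranked[:limit]]

Not as fine-grained as coverage analysis, as you say, but the history
keeps teaching it which files tend to break which tests.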


    > > Note that "make test" is usually the entire regression test suite.  As
    > > mentioned in the thread by Tom Lord <address@hidden> this is likely to
    > > make the queue grow without bound on any project the size of GCC.

Even running a test subset can screw you, though.  What if my change
is likely to break the PowerPC port (if anything) but I don't have a
PowerPC machine on hand and neither does the repository server?  That's
the kind of situation we're looking at.  (It gets worse if you think
about the combinatorics of PC cards and linux kernel testing.)


    > > Aegis also gives you the ability to run the regression test suite
    > > against the baseline without a change set.  Due to Moore's Law, that 3
    > > CPU week test suite I mentioned would now run in about 40 hours.  With
    > > Aegis, you could have a cron job run the regression test suite every
    > > weekend.

That's where I think things tend to converge for these large
projects.   Again, if it were my call, I'd consider some more radical
process changes on these projects but it doesn't seem to be a
practical option.


    > > By using focused testing as described here, Aegis attempts to balance
    > > the need to commit in a timely fashion with the need to relentlessly
    > > test everything.  Thus test-during-commit is not the bottle-neck it may
    > > first appear to be.

Some very cool ideas which I hope we'll wind up integrating.


    > > 3: Code Reviews:

    > > By default, Aegis requires code reviews of all change sets before they
    > > are committed (actually, as part of the commit process).  My biggest
    > > concern is performing about 66 code reviews per day; humans scale worse
    > > than computers.

Yup.   In other threads and development activity we're working on
streamlining the review process (and parameterizing it so projects can
set their own policy) as far as we can.


    > > However, to more closely simulate the "look mum, no net" experience, you
    > > can configure your Aegis project to skip the code review step.

    > > Somewhere in my filing cabinet are several papers which purport to show
    > > that code reviews are your biggest bug catcher in any process, and this
    > > certainly dovetails with my own experience.  So I prefer not to skip the
    > > code review step.  But Aegis will let you if you want.

For arch itself, we're currently working on moving to a system in
which a few co-maintainers can review any of a pool of third-party
patches.  Roughly speaking, an unambiguous "two maintainers approve"
moves a contribution to the mainline, ambiguous review results raise
flags and call for human intervention, and unambiguous rejects just
reject.
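
As a sketch of the decision rule I have in mind (my own paraphrase, in
Python; the vote values and the two-approval threshold are assumptions,
not settled arch policy):

    def review_decision(votes):
        # votes: "approve" / "reject" strings from co-maintainers
        votes = list(votes)
        approvals = votes.count("approve")
        rejections = votes.count("reject")
        if approvals >= 2 and rejections == 0:
            return "merge"      # unambiguous: two maintainers approve
        if rejections >= 1 and approvals == 0:
            return "reject"     # unambiguous reject
        return "escalate"       # mixed signals: raise a flag for a human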


    > > Of course, the "to enough eyes all bugs are shallow" mantra has the
    > > entire developer population doing the code review, but to my mind this
    > > is too late in the process for a commercial product, 

Agreed.  And you can replace "commercial" with the more general
"critical (to someone)".

    > > particularly when
    > > the PHB is likely to pop into your cubicle with no notice with a
    > > prospective customer or venture capitalist in tow and say "demo, now". 
    > > OSS projects with anonymous CVS access are eerily similar.

I really hate the non-monotonically-increasing-quality nature of free
software mainlines but, for some projects, it's hard to achieve
monotonicity except on a scale of days rather than continuously.



    > > This is my major gripe with BK - it solves the wrong problem incredibly
    > > well.  It allows crap into the repository, and then spreads the crap to
    > > everyone really, really efficiently.  Surely propagating working code a
    > > little slower is more desirable?

Does it seem as messed up to you as it does to me that all these
companies have come to rely on the linux kernel yet there's no formal
public testing infrastructure for it?

That's half the reason the GNU/Linux distribution vendors are in
business: because they fork the damn kernel and put up filters against
crap and provide local fixes.  A few process tweaks and those
companies are largely superfluous.  A few mil spent on testing hw and
software and a modest annual upkeep of a few dozen salaries and the
industry is completely reshaped.  It would be over-the-top to describe
these companies as "exploitative (of volunteer labor) snake oil,
oppressing the natural progress of these projects" so I'll be sure not
to say that.


    > > But see below for more of Tom's wisdom on this subject.


    > > 4: Tagging:

    > > With CVS, in order to recreate the source at a given point in
    > > time, 

[Tom does a spit-take at just that phrase.]

    > > you
    > > have to tag ALL source files.  This doesn't scale - a commit is O(n)
    > > where n is the number of files in the repository, not the number of
    > > files in the change set.

    > > Aegis assigns every change set commit a unique version number (usually
    > > called a "configuration identifier" or similar in the text
    > > books).  

(arch is similar in that regard)

    > > This version string (usually less than a dozen characters) 

(two or three dozen, here --- but then we're also imposing a global
(in the Earth sense) namespace)


    > > can be used at any
    > > future time to reproduce the source of that version.  And it scales - it
    > > doesn't need to tag every file.

    > > Of course, if you want your commits to take longer-and-longer as your
    > > project matures and grows, you can optionally tag every file at every
    > > commit, but it's not the default.


    > > Quoting Tom Lord:
    > > > Imagine yourself wanting to enact the [test before committing]
    > > > policy.   You make a GCC tree.  You add in your changes.  You want to
    > > > commit.   Before you commit, you run the tests.  The tests all pass,
    > > > fine -- but they took so long to run that now your tree is
    > > > out-of-date.

    > > THIS can be a problem.  For any sufficiently large/active project, for
    > > any CM system.

Yup.   The CVS-like systems that don't care about whole-trees have a
slight advantage:  if most of my tree is out of date but the part I'm 
actually committing is not, those systems are happy.   Of course, you
and I probably agree about how desirable that work-around is.

    > > And, as mentioned in the thread, a probabilistic approach is needed. 
    > > Unit test your change set.  Run 20 or 100 likely-to-be-relevant tests
    > > (the greatest number of relevant tests which will take less than 20
    > > minutes to run).  It isn't perfect, and it can't be, not without
    > > changing the process: slowing down the rate of commits, or using
    > > integration branches, or using pessimistic locking rather than
    > > optimistic locking (or no locking), or... etc.


So, perhaps a hybrid: some scattershot tests (time-bound) as a commit
barrier and then the asynchronous, deeper testing behind that.
Certainly a possibility, although, in the context of arch, I think our
priority has to be getting the asynchronous stuff working first.
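
A sketch of the scattershot half (again hypothetical; the runtimes
would come from previous test runs, and the relevance ordering could
come from something like the correlation sketch earlier in this mail):
take the most relevant tests that still fit a 20-minute budget.

    def pick_tests(suggested, runtimes, budget_secs=20 * 60):
        # suggested: tests ordered most-relevant first
        # runtimes: {test: seconds, measured on previous runs}
        chosen, spent = [], 0.0
        for t in suggested:
            cost = runtimes.get(t, 60.0)   # guess a minute for unmeasured tests
            if spent + cost <= budget_secs:
                chosen.append(t)
                spent += cost
        return chosen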


-t






