On 6/24/20 12:36 AM, Jacob Bachmeyer wrote:
Pedro Alves wrote:
On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
I'd think that tracking ERROR and WARNING in the summary would sort
out this "test run blew up but no one noticed" issue.
The problem is that the DejaGnu internals often generate ERROR and WARNING "directly" without using the procedures that update counters.
I understand that that's what DejaGnu does currently, but couldn't it be
changed?
Would it not be desirable?
I am currently considering also adding a '--no-keep-going'/'--no_keep_going' option and
making '--keep-going' the default for the rest of the 1.6 series. Aborting on uncaught
Tcl errors will be the default in 1.7, but I am trying to decide whether changing it in
1.6 counts more as fixing a bug (DejaGnu quietly sweeping Tcl errors under the proverbial
rug) or introducing an environment change (needing '--keep-going' to reach the end of
some testsuites that "worked" before). This is further complicated by the fact
that any testsuite needing that option to reach the end means that, technically, that
testsuite is invalid -- it *did* throw a Tcl error during testing.
It seems your designing this under the assumption that if a testsuite
run "blows up", it's not a big deal to just fix it and rerun the testsuite.
What that misses is that in some environments, particularly with slow embedded
system boards, running the full testsuite can take _days_. This is all
compounded by the fact that the testsuite must be run in a large set of
different modes and boards to cover everything for a single target. Even on
a modern GNU/Linux system, running all combinations of modes for full coverage
can take hours.
Having the testrun stop midway because of some specific testcase
blowing up can waste a lot of time.
The fact that the testsuite can take a while to fully run results in automated
testing systems, like GDB's buildbot, struggling to keep up with testing
all the commits that land in git. On some buildbot setups, it's just not
possible to test all commits. So if some testcase blows up the whole
test run, the next time you run the testsuite, it'll likely be that a
large number of unrelated commits went into the sources. Losing a large
chunk of the testsuite results because of some small typo in one
testcase is thus undesirable, because noticing the typo in the first
place can take a while, and also you may be stuck with a "last good run"
for many of the testcases that gets older and older until
the typo is finally fixed.
Then there's the issue about testsuite order. DejaGnu runs testcases
in alphabetical filename order, in sequence. If the testrun blows up in
the first testcase run, say, a.exp, the whole testrun stops before b.exp,
etc. are run. If the testcase that blows up is the last one, say named
zzz.exp, most of the testsuite runs. This is not "fair". If I rename
a.exp to zzzzz.exp, then suddenly that testcase gains more "importance" in
the sense that it can no longer blow up the whole test run. A
testcase's name should not be significant like that.
On the same vein, consider parallel mode. This is not native to DejaGnu,
but both GDB and GCC have parallel modes, where their makefiles set up a
make check such that they spawn a number of runtest processes running in
parallel, each exercising a subset the whole testsuite. Imagine a testsuite
composed of exactly 32 testcases. And further imagine that you run it
under "make check -j32". That is, you'll end up with 32 runtest instances,
each running exactly one testcase. Now, if one of the testcases blows up, all
the other 31 will still run. Again, this shows that testcase name, and
testrun order should not be important. There is no real reason that a
sequential run in that scenario ends up with different results (testcase a.exp
blew up, so no result for the other tests) compared to a parallel
run (all tests ran successfully but test a.exp). Instead, I advocate
that the testresults summary should indicate that a problem happened,
so that "blow up event" isn't lost. But do let the whole testsuite run.