bug-dejagnu
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in


From: Pedro Alves
Subject: bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in test-case
Date: Thu, 25 Jun 2020 17:01:10 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

On 6/25/20 3:43 AM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/24/20 12:36 AM, Jacob Bachmeyer wrote:
>>  
>>> Pedro Alves wrote:
>>>    
>>>> On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
>>>>       I'd think that tracking ERROR and WARNING in the summary would sort
>>>> out this "test run blew up but no one noticed" issue.
>>>>       
>>> The problem is that the DejaGnu internals often generate ERROR and WARNING 
>>> "directly" without using the procedures that update counters.      
>>
>> I understand that that's what DejaGnu does currently, but couldn't it be 
>> changed?
>> Would it not be desirable?
>>   
> 
> It could be changed, but those are much larger changes to the code and 
> overall behavior than simply adding an UNRESOLVED result to report the 
> failure, and if I understand correctly, the presence of UNRESOLVED results 
> marks the test results as invalid or partially-valid at best, which is 
> correct if a test script failed with a Tcl error.
> 
> Also, the current warning and error counters are not global, but are reset 
> after each test result.  Exceeding thresholds for warnings or errors causes 
> the next test result to be changed to UNRESOLVED before the counters are 
> reset.
> 
>>> I am currently considering also adding a 
>>> '--no-keep-going'/'--no_keep_going' option and making '--keep-going' the 
>>> default for the rest of the 1.6 series.  Aborting on uncaught Tcl errors 
>>> will be the default in 1.7, but I am trying to decide whether changing it 
>>> in 1.6 counts more as fixing a bug (DejaGnu quietly sweeping Tcl errors 
>>> under the proverbial rug) or introducing an environment change (needing 
>>> '--keep-going' to reach the end of some testsuites that "worked" before).  
>>> This is further complicated by the fact that any testsuite needing that 
>>> option to reach the end means that, technically, that testsuite is invalid 
>>> -- it *did* throw a Tcl error during testing.
>>>     
>> It seems your designing this under the assumption that if a testsuite
>> run "blows up", it's not a big deal to just fix it and rerun the testsuite.
>>   
> 
> For most packages, this is correct.  For GDB, perhaps it is a bigger problem.
> 
> On the other hand, "blowing up" at least indicates clearly that something is 
> wrong and prevents "whistling past the graveyard" where tests are not 
> actually running, but everyone thinks the situation is fine.
> 
>> What that misses is that in some environments, particularly with slow 
>> embedded
>> system boards, running the full testsuite can take _days_.   This is all
>> compounded by the fact that the testsuite must be run in a large set of
>> different modes and boards to cover everything for a single target.  Even on
>> a modern GNU/Linux system, running all combinations of modes for full 
>> coverage
>> can take hours.
>>
>> Having the testrun stop midway because of some specific testcase
>> blowing up can waste a lot of time.
>>
>> The fact that the testsuite can take a while to fully run results in 
>> automated
>> testing systems, like GDB's buildbot, struggling to keep up with testing
>> all the commits that land in git.  On some buildbot setups, it's just not
>> possible to test all commits.  So if some testcase blows up the whole
>> test run, the next time you run the testsuite, it'll likely be that a
>> large number of unrelated commits went into the sources.  Losing a large
>> chunk of the testsuite results because of some small typo in one
>> testcase is thus undesirable, because noticing the typo in the first
>> place can take a while, and also you may be stuck with a "last good run"
>> for many of the testcases that gets older and older until
>> the typo is finally fixed.
>>   
> 
> I presume buildbot can be easily configured to pass a '--keep_going' option, 
> perhaps as 'RUNTESTFLAGS=--keep_going' if needed?
> 

What's more likely is we'll tweak GDB's Makefile so that make check
enables that option automatically.  If GCC does the same, which I would
encourage them to, since they also run the testsuite against systems
that can take hours/days to complete, then I'll get to wonder who the
default is for.  :-)

We'll need to somehow detect that the option is supported, though, so
it's of course doable, just will require a configure check or some little
bit of shell to probe the option or something of the sort.

> Still, needing a new option to get old behavior is changing the environment 
> interface, so I will make --keep_going the default, at least for the rest of 
> 1.6.  That default is likely to change for 1.7.  For automated systems like 
> buildbot, passing '--keep_going' should be perfectly reasonable, which is why 
> the option exists.  (While the documented options will be changed to use 
> hyphens to align with the rest of the GNU system in 1.7, the current option 
> names will continue to be accepted, since the aliases are obvious.)
> 

Currently with DejaGnu only the case of calling an unknown function stops
the test run AFAIK.  Any other tcl error like calling a proc with the wrong
number of arguments, or treating a non-array variable as an array, etc. is
caught by the catch in the runtest proc in runtest.exp, and the testrun
continues.  That to me clearly indicates that the original intention was
to catch errors and continue.

That also gives the tool's $(tool)_finish proc a chance to tear down
correctly.

It is just that the "unknown" case wasn't thought of.  So I argue that not
making unknown proc calls abort the run is a bug fix, and making DejaGnu
abort the run for other errors by default is a behavior change that no real
testsuite built around DejaGnu is asking for.

> This has been pushed on the temporary PR41918 branch because the fix for 
> PR41824 has been rolled into the fix for PR41918, as the bugs are closely 
> related.
> 
>> Then there's the issue about testsuite order.  DejaGnu runs testcases
>> in alphabetical filename order, in sequence.  If the testrun blows up in
>> the first testcase run, say, a.exp, the whole testrun stops before b.exp,
>> etc. are run.  If the testcase that blows up is the last one, say named
>> zzz.exp, most of the testsuite runs.  This is not "fair".  If I rename
>> a.exp to zzzzz.exp, then suddenly that testcase gains more "importance" in
>> the sense that it can no longer blow up the whole test run.  A
>> testcase's name should not be significant like that.
>>
>> On the same vein, consider parallel mode.  This is not native to DejaGnu,
>> but both GDB and GCC have parallel modes, where their makefiles set up a
>> make check such that they spawn a number of runtest processes running in
>> parallel, each exercising a subset the whole testsuite.  Imagine a testsuite
>> composed of exactly 32 testcases.  And further imagine that you run it
>> under "make check -j32".  That is, you'll end up with 32 runtest instances,
>> each running exactly one testcase.  Now, if one of the testcases blows up, 
>> all
>> the other 31 will still run.  Again, this shows that testcase name, and
>> testrun order should not be important.  There is no real reason that a
>> sequential run in that scenario ends up with different results (testcase 
>> a.exp
>> blew up, so no result for the other tests) compared to a parallel
>> run (all tests ran successfully but test a.exp).  Instead, I advocate
>> that the testresults summary should indicate that a problem happened,
>> so that "blow up event" isn't lost.  But do let the whole testsuite run.
>>   
> 
> There are plans to eventually implement a native parallel testing mode.  You 
> are correct that the order in which testcases are run should be 
> insignificant, and for valid testsuites that do not crash, it is 
> insignificant -- all tests are run.
> The problems start when a testcase *crashes* and aborts with a Tcl error.  

It's hopefully obvious that I'm aware of the latter.

> 
> The entire reason for stopping the testsuite immediately when a Tcl error 
> occurs is that it is undeniable that something is *very* *wrong* when the 
> testsuite stops early.  Software testing is hard enough when you know your 
> testsuite is good, but it is far worse when you have bugs in the testsuite 
> masking bugs in the actual program because a test script is crashing and part 
> of the program that is supposed to be tested is not being exercised.

I think it's safe to say that we all understand the general principle,
but IMO that applies more to continuing the same testcase (if somehow that
were possible, like "unknown" suppressing the error), rather than continuing
with a different testcase.

Continuing the testrun when one testcase errors out does not mask any
bug in the program that is supposed to be tested.  Why would it?  The
testcase aborts, it doesn't continue.  The remaining testcases start
afresh.

Anyway, given the option, it won't be bad, just a little annoying.

Thanks,
Pedro Alves






reply via email to

[Prev in Thread] Current Thread [Next in Thread]