bug-dejagnu
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in


From: Pedro Alves
Subject: bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in test-case
Date: Sat, 27 Jun 2020 12:40:20 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/25/20 11:39 PM, Jacob Bachmeyer wrote:
>>  
>>> Pedro Alves wrote:
>>>    
>>>> On 6/25/20 3:43 AM, Jacob Bachmeyer wrote:
>>>>  
>>>>      
>>>>> Pedro Alves wrote:
>>>>>           
>>>>>> [...]
>>>>>>           
>>>>> I presume buildbot can be easily configured to pass a '--keep_going' 
>>>>> option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?
>>>>>             
>>>> What's more likely is we'll tweak GDB's Makefile so that make check
>>>> enables that option automatically.  If GCC does the same, which I would
>>>> encourage them to, since they also run the testsuite against systems
>>>> that can take hours/days to complete, then I'll get to wonder who the
>>>> default is for.  :-)
>>>>         
>>> The default is for developers working on testsuites -- and for testing 
>>> release candidates and releases because if your testcases crash in a 
>>> release, you have problems.  :-)
>>>
>>> Are you suggesting withdrawing the --keep-going/--no-keep-going options
>>>     
>>
>> Yes, but if you would like to add it, it's OK with me, though then
>> you can say that I'm more discussing the default.  But really I'm more
>> worried about making sure that the people maintaining the GDB
>> testsuite (me included), and the DejaGnu people understand each
>> other's needs better.
>>   
> 
> My position here stems from the need for reliable testing, and part of that 
> is that crashes in the testsuite scripts need to be made very obvious, where 
> DejaGnu had previously swept them under the proverbial rug.  Sure, there 
> would be a message in the log, but that is easily missed in a testsuite as 
> large as GDB's and nowhere was a need to grep the log for those messages 
> documented.

Sure, but we don't need to drastically change DejaGnu's behavior to address 
that.

> 
>>>>> Still, needing a new option to get old behavior is changing the 
>>>>> environment interface, so I will make --keep_going the default, at least 
>>>>> for the rest of 1.6.  That default is likely to change for 1.7.  For 
>>>>> automated systems like buildbot, passing '--keep_going' should be 
>>>>> perfectly reasonable, which is why the option exists.  (While the 
>>>>> documented options will be changed to use hyphens to align with the rest 
>>>>> of the GNU system in 1.7, the current option names will continue to be 
>>>>> accepted, since the aliases are obvious.)
>>>>>             
>>>> Currently with DejaGnu only the case of calling an unknown function stops
>>>> the test run AFAIK.  Any other tcl error like calling a proc with the wrong
>>>> number of arguments, or treating a non-array variable as an array, etc. is
>>>> caught by the catch in the runtest proc in runtest.exp, and the testrun
>>>> continues.  That to me clearly indicates that the original intention was
>>>> to catch errors and continue.
>>>>
>>>> That also gives the tool's $(tool)_finish proc a chance to tear down
>>>> correctly.
>>>>
>>>> It is just that the "unknown" case wasn't thought of.  So I argue that not
>>>> making unknown proc calls abort the run is a bug fix, and making DejaGnu
>>>> abort the run for other errors by default is a behavior change that no real
>>>> testsuite built around DejaGnu is asking for.
>>>>         
>>> As it was explained to me, it is the other way around -- the catch in 
>>> runtest was overlooked when ::unknown was added to catch bugs by aborting 
>>> if an undefined procedure is called.
>>>
>>> Is there a ${tool}_finish or do you mean ${tool}_exit?  Either way, you 
>>> make a good point about running cleanup procedures when aborting a test run.
>>>     
>>
>> Look at proc runtest in runtest.exp, it calls ${tool}_init, sources the 
>> testcase under catch (using ERROR:, so here you could
>> run a tally of such caught errors), and then calls ${tool}_finish.
>>   
> 
> Well, sure enough, there are ${tool}_init and ${tool}_finish procedures 
> called "around" each script if they exist... and no documentation... another 
> item added to my local TODO list for improving the manual.
> 
>>>>> The entire reason for stopping the testsuite immediately when a Tcl error 
>>>>> occurs is that it is undeniable that something is *very* *wrong* when the 
>>>>> testsuite stops early.  Software testing is hard enough when you know 
>>>>> your testsuite is good, but it is far worse when you have bugs in the 
>>>>> testsuite masking bugs in the actual program because a test script is 
>>>>> crashing and part of the program that is supposed to be tested is not 
>>>>> being exercised.

It's actually deniable that there's something *very* *wrong*.  For one, all 
tests
run under the same runtest process, and are sourced in the global namespace.
So it is easy for a global variable to leak from one testcase to another.  
That's
usually harmless, except when you have

 set foo 1

in testcase a.exp, and then b.exp does:

 set foo(1) 1

Whoops, TCL doesn't like that, so it aborts the testcase.

Could this be solved by putting each testcase in its own address space?
Yes.  Do people actually do that?  No -- it's a lot of make work and the
problem is rare.  Is that reuse global as array really revealing such a MAJOR
issue with the GDB testcase that it warrants stopping the whole run?  No...

I would argue that instead it is the DejaGnu framework that should make
it so that each testcase is run under as much isolation as possible.
So I call that a design issue in DejaGnu itself.  Whether that could
be fixed by something like sourcing each testcase under its own
Tcl slave interpreter, I'm not sure.

Another example of a Tcl bug that sometimes we trip on.  Say we have a
testcase that needs to extract some address from the inferior
in its setup phase, something like this:

 set test "get address"
 gdb_test_multiple "x /i \$pc" $test {
     -re "($hex).*$gdb_prompt $" {
         set addr $expect_out(1,string)
         pass $test
     }
 }
 
 if {$addr == 0} {
   # The above failed, so no use continuing.
   return
 }

Now, gdb_test_multiple is a wrapper around expect, that internally catches
GDB crashes, internal GDB errors, timeouts and other things, so the bug above
is often missed.  The bug is of course that when e.g., a timeout occurs, $addr
is left uninitialized, so you get a Tcl error referencing a nonexisting 
variable.

But the thing is, we already issued a FAIL, and we want to bail out of
the testcase anyway.  Crashing the whole testrun for a bug that typically
happens in already-failing code is too strict.  And these are the
code paths that are not exercised in a normal run, so are 
easy to miss, and are just impossible to have full coverage for.

>>>>>             
>>>> I think it's safe to say that we all understand the general principle,
>>>> but IMO that applies more to continuing the same testcase (if somehow that
>>>> were possible, like "unknown" suppressing the error), rather than 
>>>> continuing
>>>> with a different testcase.
>>>>
>>>> Continuing the testrun when one testcase errors out does not mask any
>>>> bug in the program that is supposed to be tested.  Why would it?  The
>>>> testcase aborts, it doesn't continue.  The remaining testcases start
>>>> afresh.
>>>>         
>>> Consider a hypothetical case where GDB has exec.exp (which tests starting a 
>>> debugged process) and attach.exp (which tests attaching to a process).  
>>> Unknown to the GDB developers, the "attach" command does not work and 
>>> causes GDB to dump core (oops!) but attach.exp fails with a Tcl error 
>>> before reaching the point where "attach" is issued to the inferior GDB.  If 
>>> DejaGnu stops immediately, it is obvious that the testsuite is broken, the 
>>> bug in the testsuite is fixed, and the bug in GDB is found.  If DejaGnu 
>>> sweeps the Tcl error under the proverbial rug (as previous versions did), 
>>> the bug in the testsuite effectively hides the bug in GDB , because the 
>>> developers think the "attach" command is being exercised, but those tests 
>>> are being (almost) silently skipped.
>>>     
>>
>> First, that scenario doesn't normally cause a Tcl error.  If GDB trips
>> an internal error, the GDB testsuite catches it with a FAIL.  If GDB just 
>> dies
>> (whether it dumps core or not) the GDB testsuite catches that
>> with UNRESOLVED.
>>   
> 
> In the hypothetical scenario, attach.exp crashes with a Tcl error *before* 
> issuing the "attach" command to the GDB-under-test.  Previous releases of 
> DejaGnu mention a Tcl error in the log and carry on.  With this patch, 
> DejaGnu catches that attach.exp crashed and itself inserts an UNRESOLVED 
> result.
> 
> I also take this as more support for using the UNRESOLVED status here:  the 
> test script crashing is little different from GDB exiting unexpectedly.

Well, I'm actually not sure why doesn't GDB FAIL in the crash case instead of
UNRESOLVED.  To me a GDB internal error or the GDB process crashing doesn't
look very different.  Both require pretty much the same level of attention
from a developer.  But that decision predates me.  Who knows, maybe GDB
internal error support was only added after the testsuite was originally
written, and thus the crash handling predates the internal error handling.

> 
>> Also, even if those weren't already considered, the developer won't think the
>> testcase is just passing correctly, because being a good GDB developer,
>> he knows he needs to diff the new gdb.sum file against a previous run's, and
>> sees that some PASSes are missing, and some ERROR: tcl error showed up
>> in their stead.  Nobody should rely on just looking at the summary.
>> Everyone _must_ diff between a previous run and the current run.  Now some
>> people may use test result diffing scripts that forget to look
>> at ERROR: lines in .sum.  Those would be buggy, but then again, if
>> those ERRORs were accounted for in the DejaGnu summary, they would be
>> harder to miss.
>>   
> 
> If this is the case, then what is the problem with aborting the test run upon 
> error?  "Good GDB developers" will always diff the .sum files, notice that 
> the expected new PASS results are not there, and fix their code before 
> checking it in, so they will never have a problem with DejaGnu aborting upon 
> encountering a Tcl error.  :-)

As I've explained, the problem is that one testcase prevented a large number
of others from running, potentially substantially all of the testsuite.  So 
you've
now wasted a lot of time and maybe prevented other people from using the
target system meanwhile.  All because of needless interference between
testcases.

> 
>> And again, people run the GDB testsuite in parallel mode, so your
>> assumption that the testsuite would stop immediately anyway doesn't
>> hold.  One of the parallel runtest invocations bombs
>> out, while the others keep running, so by the time you get
>> back the prompt, other testcases already executed, and you don't
>> see the ERROR in the current terminal anymore.
> 
> This is a fundamental problem with slicing up the testsuite external to 
> DejaGnu and running multiple instances of runtest -- output that is supposed 
> to be "last at the end of the run" can be hidden away by other ongoing subset 
> runs.

That's like arguing against parallelization, and saying that everyone
should run the testsuite in serial mode, and be OK with it taking 2 hours
instead of 30 mins to complete a run...

The parallel mode aggregates the summaries from the different runtest
invocations as a single summary at the end,

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD

so I don't see what problem here is unsolvable.  Again, the issue is that Tcl
ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary
doesn't contain those either.

> 
>>   You'll need to
>> look at the testsuite .sum .log files anyway.  Only, you will
>> find out that one testcase bombed out, 50000 tests ran, but 1000 others
>> didn't run because of that one testcase.  And now you're missing
>> those 1000 results, which maybe included those which you were
>> actually interested in, because you were doing a builbot try run
>> across a number of systems.
>>   
> 
> This also applies on a smaller scale for which we have no workaround:  
> gdb.foo/bar.exp bombs out and the remaining /N/ tests in that file did not 
> run.  The only real solution is for the test scripts to avoid causing Tcl 
> errors in the first place; anything else is papering over the problem with 
> duct tape.

That's not the same thing.  The remaining /N/ tests in the file depend
on the previous /M/ tests.  E.g.,

 gdb_test "step"
 gdb_test "print foo"

If the "step" fails, the "print foo" test is meaningless.

However, there should be absolutely no dependency between foo.exp and
bar.exp.  It should be possible to run those two different testcases
in any order.  Thus, it should be irrelevant to foo.exp whether bar.exp
bombs out of not.  Consequently, there's no real reason to not let
foo.exp run regardless of what happens to bar.exp.

> 
>> Now, when running the testsuite against slow embedded boards, then
>> you won't normally be using the parallel mode, the board just can't
>> support multiple parallel connections.  In that case, you set the
>> testsuite running, and go on your merry way for a couple hours or 20.
>> When you come back, maybe the next day, you notice that the testsuite
>> actually stopped running 30 mins in, because someone changed
>> the testsuite upstream just before you rebased in way that caused a
>> tcl error for you in some testcase.  Oops.  You just lost a day.
> 
> This is more of an annoyance, and yes, I have come back the next day to find 
> that a long-running command failed shortly after being started, so I know 
> this irritation.

There you go.

> 
>>   Now
>> you wish you had remembered to type --keep-going, but it's not
>> the default, and nobody ever writes Tcl bugs, so you didn't think
>> of it before.
>>   
> 
> Note that the snapshot of GDB I have been using for testing this patch does 
> have such a Tcl bug in its testsuite.  I did not make any particular effort 
> to choose this snapshot, it was just the current state of GDB master when I 
> started.  I have not even looked at *which* commit it is. 

Can you tell me what the error looks like?

> Thinking "nobody ever writes Tcl bugs" is what causes these problems in the 
> first place.

I don't know what you mean by this.  What I meant is that people likely won't
remember to use --keep-going when invoking runtest interactively.  Why force
people to go through the pain of learning the option?

> 
> I would expect that, had DejaGnu always aborted immediately upon catching Tcl 
> errors from testsuites, that issue would have been found and fixed quickly.

As I've explained, that "aborted immediately" is just impossible by design with
parallel mode.  You could approximate it perhaps, by making make abort all the
parallel runtests when one of the runtests bombs out.  It won't be immediately.
And I'd of course think that adding such feature wouldn't help anyone in 
reality.

And with your "quickly", you're assuming that a Tcl errors typically happen in 
the
normal code paths.  If it happens on some timeout path as I explained, then it 
may be
that it's not that easy to trigger the error, unless you happen to run the
testsuite on a "lucky" system with a "wrong" combination of environment
(compiler, glibc, kernel, etc.).

Thanks,
Pedro Alves






reply via email to

[Prev in Thread] Current Thread [Next in Thread]