bug#47795: utils.test fails intermittently

bug-dejagnu

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#47795: utils.test fails intermittently

From:	Jacob Bachmeyer
Subject:	bug#47795: utils.test fails intermittently
Date:	Thu, 29 Apr 2021 22:15:53 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0

Rainer Orth wrote:

I just noticed that one part of the runtest testsuite fails
intermittently on both Solaris 10 and 11 on the 1.6.3 branch.  It's easy
to miss because it only manifests itself in runtest.{sum,log} as

[...]
PASS: testsuite file creates new implied object directory
ERROR: 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
 did not complete

and

                === runtest Summary ===

-# of expected passes 268

+# of expected passes           294

Unless you know how many passes to expect, you likely miss the problem.

This is a long-standing issue with DejaGnu: the framework does not knowhow many tests to expect, so missing tests just go, well, /missing/.

This is an architectural limitation in DejaGnu, so the closest to anactual solution while preserving backwards compatibility is to improveerror handling to more reliably report that /something/ went wrong, evenif we have no way to state how many tests were not run. Optional "testmanifest" features, allowing testsuites to provide the framework with alist of expected tests, are on my local TODO list, but I have not yetfound a good way to generalize them to DejaGnu proper. It is possiblethat some other code consequent to the test grouping features that willaccompany the new XML output format in 1.7 will pave the way to a cleanAPI for this, but I am not yet sure of that.

Even when running runtest -v -v instead, neither the runtest output

@@ -157,8 +157,7 @@
 Processing test 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/testsuite_file.test
 Running 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
 ...
 Processing test 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
-WARNED: diff1.txt doesn't exist
-
+ERROR: 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
 did not complete

nor runtest.log show much:

Running 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
 ...
Processing test 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test

interp delete test_case
interp create test_case
interp create test_case
expect1.30> test_case eval {foreach { n v }}  [list [array get default_vars]] 
{{ set $n $v }}
test_case eval {foreach { n v }}  [list [array get default_vars]] {{ set $n $v 
}}
test_case^M^M
expect1.31> test_case eval source 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
test_case eval source 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
expect1.32> ERROR: 
/vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test 
did not complete

The test does succeed when running

runtest --tool runtest

in a row about 2 out of 10 times.

There is a possible clue here: it appears that we may be losingsynchronization: the "interp create test_case" line should be emittedin response to an expect1.*> prompt which seems to be missing in yourlog, yet the preceding "interp delete test_case" is included. Do Iguess correctly that the line preceding that snippet was another "interpdelete test_case"? Particularly interesting in that your log seems toshow "test_case" returned for the "test_case eval ... foreach ..." line,but that line should not produce a response -- the "test_case" responseshould be from the previous "interp create test_case" command.

In fact, examining process_test in testsuite/runtest.libs/libs.exp whichdrives these tests, the error that you are seeing is exactly what isproduced when an Expect prompt arrives when the driver is expecting tosee test results. Losing synchronization could cause this, althoughthat check was added to detect the case where the slave interpreteraborts due to a Tcl error in the inner test script.

Assuming the "lost sync" hypothesis, when the failure occurs, theprevious command is still "in-flight" somewhere... and the "1.32" promptis when the "test_case eval source ..." command is given in my logs...but your logs show it being issued at the "1.31" prompt... is an"expect1./N/>" prompt duplicated somewhere earlier in the log?

It is also possible that an earlier "expect" (command in process_test)is timing out while waiting for an "expect*>" prompt from the childExpect. My development box is relatively lightly loaded; what are thetypical load averages on the system where the problem occurs?

The code in testsuite/runtest.libs/libs.exp is mostly my doing at thispoint, and is probably some of the earlier Expect code that I havewritten. If there is a bug in there that only affects heavily-loadedsystems, it is probably my then-inexperience showing. :-)

I'm a bit at a loss how to investigate this further.

I have only now found a possible explanation, after sitting on this forabout two weeks.



-- Jacob

[Prev in Thread]

Current Thread

[Next in Thread]

bug#47795: utils.test fails intermittently, Rainer Orth, 2021/04/15
- bug#47795: utils.test fails intermittently, Jacob Bachmeyer <=

Prev by Date: bug#47382: runtest doesn't work with Solaris 10 /bin/sh
Previous by thread: bug#47795: utils.test fails intermittently
Index(es):
- Date
- Thread