bug-coreutils

bug#10430: coreutils-8.14.116-1e18d: "make distcheck" failure on Debian


From: Pádraig Brady
Subject: bug#10430: coreutils-8.14.116-1e18d: "make distcheck" failure on Debian (one test failed)
Date: Wed, 04 Jan 2012 20:07:25 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 01/04/2012 06:31 PM, Stefano Lattarini wrote:
> On 01/04/2012 05:35 PM, Pádraig Brady wrote:
>> On 01/04/2012 02:12 PM, Stefano Lattarini wrote:
>>
>>> The only failed test was `misc/timeout-group'.
>>
>> This is either a race in the test or a bug in timeout, neither of which I 
>> can see.
>> Your system is running 2.6.30-2-686 SMP
>>
>> Does this fail all the time?
>> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
>>
> No, it only fails ~ 6% of the time:

>> If you change timeout.c to fprintf(stderr) that the first
>> send_sig (monitored_pid) call is made, does that happen in the failing case?
>>
> OK, so, in 'timeout.c:cleanup()' I've added this line:
> 
>    fprintf (stderr, "^^^ send_sig (%lu, %u)\n", monitored_pid, sig);
> 
> just before this line:
> 
>    send_sig (monitored_pid, sig);
> 
> The logs of a failing and a passing test run after this modification are
> attached.

Great, thanks.
The fprintf (stderr) output should be unbuffered, and so should be displayed
before the signal is sent.
I still don't see exactly what's going on though :(
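
To make the ordering of the trace line independent of any buffering assumption, the debug line could be wrapped as in the sketch below. trace_send_sig() is a hypothetical helper, not something in timeout.c, and the casts are there only because the exact types of monitored_pid and sig are assumed rather than checked.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical debugging helper (not part of timeout.c): print the trace
   line and flush it, so it reaches the log before the signal is sent.
   Returns 0 on success, -1 on a write or flush error.  */
int
trace_send_sig (pid_t pid, int sig)
{
  /* pid_t is a signed integer type; cast to long for a portable format.  */
  if (fprintf (stderr, "^^^ send_sig (%ld, %d)\n", (long) pid, sig) < 0)
    return -1;

  /* Redundant when stderr is unbuffered (the usual default), but harmless.  */
  return fflush (stderr) == 0 ? 0 : -1;
}
```

It would be called just before send_sig (monitored_pid, sig). Note that fprintf() is not async-signal-safe, so if cleanup() runs as a signal handler this is only suitable as a temporary debugging aid.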

In the non-working case we have:

+ env kill -INT -- -4854
+ wait
^^^ send_sig (4856, 2)
+ test -e int.received
+ rm -f int.received timeout.running

Notice that the signal was reported as sent.
Now if the signal wasn't in fact propagated,
the script would wait 20s and return without touching the file,
and hence cause the test to fail.

However, I don't think that's what's happening, as the second
part of the test completes in less than a second.
I.e., the signal is reported as sent by the first timeout in the chain,
and no other timeout reports receiving it, yet wait returns immediately,
suggesting there may also be an issue with the wait system call.

I'm also confused by the log of the working run.
The first "send_sig" is not reported, even though
the test seems to complete as expected?

I've added various sleeps locally to try to trigger a race, but can't reproduce one.

For my reference, something to try if I get access to such a system:
One thing that could break signal propagation is if, on exec(), the kernel
sometimes left the dispositions of caught signals at SIG_IGN rather than
resetting them to SIG_DFL (in contradiction to POSIX). One could test that
hypothesis by explicitly setting the signals listed in install_signal_handlers()
to SIG_DFL just before the execvp().
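
A minimal sketch of that experiment follows. The signal list is an assumption about what install_signal_handlers() catches and should be checked against the real function (any user-specified termination signal is left out). Both reset_signals_to_default() and exec_with_default_signals() are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed list: the signals install_signal_handlers() is thought to catch.
   Verify against timeout.c before relying on it.  */
static const int monitored_signals[] =
  { SIGALRM, SIGINT, SIGQUIT, SIGHUP, SIGTERM };

/* Hypothetical helper: explicitly reset each listed signal to SIG_DFL.
   Returns 0 on success, -1 if any sigaction() call fails.  */
int
reset_signals_to_default (void)
{
  struct sigaction sa;
  size_t i;

  sa.sa_handler = SIG_DFL;
  sa.sa_flags = 0;
  sigemptyset (&sa.sa_mask);

  for (i = 0; i < sizeof monitored_signals / sizeof monitored_signals[0]; i++)
    if (sigaction (monitored_signals[i], &sa, NULL) != 0)
      return -1;
  return 0;
}

/* Hypothetical wrapper for the child side of the fork: reset, then exec.
   Returns (with -1) only if the reset or execvp() itself fails.  */
int
exec_with_default_signals (char *const argv[])
{
  if (reset_signals_to_default () != 0)
    return -1;
  execvp (argv[0], argv);
  return -1;
}
```

If the test stops failing with the explicit reset in place, that would point at the dispositions inherited across exec rather than at a race in the test itself.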

cheers,
Pádraig.




