dejagnu
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: PID-reuse races fix, introduced GCC validation brakage


From: Yvan Roux
Subject: Re: PID-reuse races fix, introduced GCC validation brakage
Date: Thu, 31 Mar 2016 17:25:20 +0200

Hi Pedro,

On 31 March 2016 at 16:20, Pedro Alves <address@hidden> wrote:
> On 03/30/2016 04:03 PM, Yvan Roux wrote:
>> Hi,
>>
>> https://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html
>>
>> this patch introduced  a new failure related to GDB testing, but this
>> time in GCC guality part of the testsuite.  When
>> gcc/testsuite/gcc.dg/guality/example.c is executed we have this set of
>> processes started
>>
>> PID  PPID     command
>> 100  99          ./example.exe
>> 101  99          cat
>> 102  100        sh -c gdb -nx -nw --quiet > /dev/null 2>&1 ./example.exe
>> 103  102        gdb -nx -nw --quiet ./example.exe
>>
>> The issue is that when close_wait_program is called pid="100 101" and
>> when the "wait" returns none of these 4 processes is actually killed,
>> and the killing of the "kill pipeline" made these pids untouched and
>> the validation hangs forever.
>
> The questions I'm not seeing answered are:
>
>  - How did this work before?

We just discovered this issue in a new target we added to our
validation (native armv8l), and on this target this guality test is in
timeout (which triggers the call to close_and_wait routine).

>  - What exactly in the patch causes the regression?

The issue is not related to pid being used as a number instead of a
list (some cleanup are needed, but you're right this is here for a
very long time and works I just discovered during my debugging session
that it can be a list).

The regression is due to this part:

+    # Reap it.
+    set res [catch "wait -i $program_id" wres]
+    if {$exec_pid != -1} {
+       # We reaped the process, so cancel the pending force-kills, as
+       # otherwise if the PID is reused for some other unrelated
+       # process, we'd kill the wrong process.
+       exec sh -c "exec > /dev/null 2>&1 && kill -9 $exec_pid"
+    }

In our case the "wait -i" command ends without any effect on the
running processes and then it kills the command which was about to
kill them, and the validation is stuck.

> AFAICS, before my patch we had:
>
> - if { $pid > 0 } {
> -    verbose "doing kill, pid is $pid"
> -    # This is very, very nasty. SH, instead of EXPECT, is used
> -    # to run this in the background since, on older CYGWINs, a
> -    # strange file I/O error occures.
> -    set pgid "-[join $pid { -}]"
>
> while my patch changed it to:
>
> +    if { $pid > 0 } {
> +       # Tcl has no kill primitive, so we have to execute an external
> +       # command in order to kill the process.
> +       verbose "doing kill, pid is $pid"
> +       # Prepend "-" to generate the "process group ID" needed by
> +       # kill.
> +       set pgid "-$pid"
>
> I probably did that change just assuming pid is a number, given
> the pid > 0 check.
>
> So if you replace:
>
>  + set pgid "-$pid"
>
> with:
>
>  - set pgid "-[join $pid { -}]"
>
> I imagine it would get things back to how they were for you?
>
> Thanks,
> Pedro Alves
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]