Re: Alternate termination sequence option --term-seq


From: Martin d'Anjou
Subject: Re: Alternate termination sequence option --term-seq
Date: Wed, 29 Apr 2015 20:36:13 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 15-04-29 03:32 PM, Ole Tange wrote:
On Wed, Apr 29, 2015 at 2:07 PM, Rasmus Villemoes <address@hidden> wrote:
On Wed, Apr 29 2015, Ole Tange <address@hidden> wrote:

This still has the risk of killing an innocent PID and its children.

Killing (in the sense of sending any signal whatsoever) an
innocent/unrelated PID is completely unacceptable, IMO. On a reasonably
busy system, PID reuse within 10 seconds is far from unlikely.

I agree. It is wrong to try to kill a PID that was not spawned from the same process lead, especially if the attempt succeeds. Can this ever succeed?

My reference on the matter is: http://mywiki.wooledge.org/ProcessManagement

On my system this gives PID reuse after 3.1 secs, but that is a very
extreme case, and I will accept if GNU Parallel deals wrongly with that case:

perl -e 'while(1) { $a=(fork()|| exit); if(not $a %1000) {print "$a\n";}  } '

Mapping the tree even before signalling the immediate children is not enough;
some of the grand^nchildren may vanish in the meantime and their PIDs
reused before one can use the gathered information.

I doubt that is true in practice. Mapping takes less than 100 ms, so I
would find it very unlikely that the PID will be reused that fast. I
understand that this could in theory happen, but I would like to see
this demonstrated before I consider this a real problem.

Since GNU Parallel will be sleeping (and not doing anything else) we
could simply send signal 0 (kill 0, which only tests whether the PID
exists) to all the (grand*)children every second and recompute the
family tree of those still alive. If a child has died, remove it from
the list to be killed later.

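# family_tree() is assumed to return the given PIDs plus all their descendants
@children = family_tree(@job_pids);
for $signal (@the_signals) {
   kill $signal, @job_pids;
   $sleep_time = shift @sleep_times;
   $time_slept = 0;
   while($time_slept < $sleep_time and @children) {
     # kill 0 sends no signal; it only tests whether the PID still exists
     @children = family_tree(grep { kill( 0, $_) } @children);
     sleep $a_while;
     $time_slept += $a_while;
   }
}
# Whatever survived every signal gets SIGKILL
kill 'KILL', @children;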

Rasmus: Can you find a situation in which the above will fail?
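
The family_tree helper used in the loop above is not defined anywhere in the thread. As a rough illustration only, here is one way such a helper might look. It is sketched in Python rather than Perl and assumes a Linux /proc filesystem. The names read_ppid and family_tree (as implemented here) are hypothetical and are not taken from GNU Parallel.

```python
import os
from collections import defaultdict


def read_ppid(pid):
    """Return the parent PID of pid, or None if the process has vanished."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            for line in fh:
                if line.startswith("PPid:"):
                    return int(line.split()[1])
    except OSError:
        # The process exited between listing /proc and reading its status.
        return None
    return None


def family_tree(root_pids):
    """Return root_pids plus every descendant found in one pass over /proc."""
    # Build a parent -> children map from a single scan (assumption: Linux).
    children = defaultdict(list)
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            ppid = read_ppid(int(entry))
            if ppid is not None:
                children[ppid].append(int(entry))
    # Walk down from the roots, collecting everything reachable.
    seen = set()
    stack = list(root_pids)
    while stack:
        pid = stack.pop()
        if pid not in seen:
            seen.add(pid)
            stack.extend(children[pid])
    return sorted(seen)


if __name__ == "__main__":
    print(family_tree([os.getpid()]))
```

The result is only a snapshot: a process listed here may exit, and its PID may be handed out again, before the caller acts on the list. That is the window being debated in this thread.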

I would like to answer too:
Killing the process tree is not an atomic operation, so I am sure anything can happen with some amount of effort on the part of the user.

Also, we do not know whether the kill loop kills from parent to child or from child to parent. It matters. Kill the child first, and the parent thinks something went wrong with the child, when in fact nothing was wrong with the child: the problem was higher up, where a termination condition was met. Kill the parent first, then the child, and the parent tries to run its signal trap only to find that the child is also gone. What report does the parent write? Does it write that the child went missing? I feel this gets complicated for reporting.


I think the only way to do this right is for GNU Parallel to make each
immediate child a process group leader (setpgrp 0,0 immediately after
fork).

GNU Parallel uses open3 to spawn children. According to strace -ff
that does not do a setpgrp.
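
To make the process-group suggestion concrete, here is a minimal sketch of putting a child in its own group right after fork and then signalling the whole group at once. It is written in Python rather than Perl, and the function names spawn_in_own_group and signal_group are hypothetical. It is not how GNU Parallel spawns jobs (the thread says it uses open3).

```python
import os
import signal
import sys


def spawn_in_own_group(argv):
    """Fork, make the child a process group leader, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            os.setpgid(0, 0)          # child: become leader of a new group
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)
    try:
        # Parent repeats the call so the group exists before it signals,
        # regardless of which side runs first after fork.
        os.setpgid(pid, pid)
    except OSError:
        pass  # child has already exec'ed or exited; its own call took effect
    return pid


def signal_group(pgid, sig):
    """Send sig to every member of group pgid; False if the group is empty."""
    try:
        os.killpg(pgid, sig)
        return True
    except ProcessLookupError:
        return False


if __name__ == "__main__":
    job = spawn_in_own_group([sys.executable, "-c", "import time; time.sleep(30)"])
    signal_group(job, signal.SIGTERM)
    os.waitpid(job, 0)
```

A group signal reaches descendants without first mapping the tree, but only those that stayed in the group. A descendant that changes its own group or session is out of reach.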

Do note that one can never clean up all descendants that may have been
spawned: A dance consisting of double fork() and some setpgid/setsid
yoga will create a process which cannot be tied to GNU Parallel or any
of its immediate children. So one has to rely on the children not doing
such things.

Yes. GNU Parallel should do the right thing in most cases and not
cause a problem in the rest.

To me the "right thing" is to give well-written programs, those that obey all the rules about cleaning up their child processes, a chance to execute without spurious errors such as "kill (...) - No such process". Send the signals to the immediate child first; then, after a delay, if that has failed, run the process group kill loop. For those who care, it is a huge improvement; those who do not care will not notice either way.
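
The two-stage approach described above could be sketched roughly as follows, in Python rather than Perl. The function name terminate_job and its grace parameter are hypothetical. The sketch assumes the job was started with start_new_session=True, so that its PID is also its process group ID. That is a stand-in for the setpgrp call discussed earlier, not something GNU Parallel is known to do.

```python
import os
import signal
import subprocess
import sys


def terminate_job(proc, grace=2.0):
    """Signal the immediate child first; escalate to its group only on failure.

    Returns "child" if the child exited on its own after SIGTERM, or
    "group" if the whole process group had to be killed.
    """
    # Stage 1: only the immediate child, so it can clean up its own children.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
        return "child"
    except subprocess.TimeoutExpired:
        pass
    # Stage 2: the child is still unreaped, so its PID (and therefore the
    # group ID) cannot have been reused by an unrelated process.
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    proc.wait()
    return "group"


if __name__ == "__main__":
    job = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,
    )
    print(terminate_job(job))
```

In this sketch, a child that exits within the grace period is never followed by a group signal, so a program that cleans up after itself sees no second signal. Descendants left behind by such a child are not chased here; whether they should be is a separate decision.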

Cheers,
Martin



