Re: Revision of GNU Parallel's processing of SIGTERM


From: Martin d'Anjou
Subject: Re: Revision of GNU Parallel's processing of SIGTERM
Date: Tue, 14 Apr 2015 00:30:31 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 15-04-13 03:46 PM, Ole Tange wrote:
> On Mon, Apr 13, 2015 at 3:44 PM, Martin d'Anjou
> <address@hidden> wrote:
>> On 15-04-12 07:14 AM, Ole Tange wrote:
>>> :
>> I propose two solutions:
>> - Change the code to only kill the parent process (existing users will have
>>   to change their application code to kill children processes themselves)
>> - Do not change the default behaviour, but add an option so that only the
>>   parent job is killed (existing users see no change)
> The more I think about it the more I like the:
>
> 'kill TERM', wait, 'kill TERM', wait, 'kill KILL', 'kill KILL
> @grandchildren_pid'
>
> When will that fail?

To make this work, GNU parallel would have to wait for the whole process tree, not just the immediate children. Today it waits only for family_pids[0], I think, so this change may affect existing users.

Also, it will fail for the user when the delay is not long enough or when the software wants INT instead of TERM. I am also thinking... what if a user is stuck with a bad parent process (one that does not propagate TERM) but a good grand-child process? The grand-child of a "crappy coded" parent process will only get kill -9, which might not be what the user wants. Maybe the user would like SIGTERM to get to the grand-child. It's certainly not easy to find a way that works for all scenarios. There could be scenarios that I cannot imagine. This does not mean they do not exist.

Perhaps a generalized solution is an option that "broadcasts" the termination sequence to the entire process tree once the initial attempt has failed. It could be called --broadcast-term-seq.

But I have a question: do you want to keep the current behaviour intact for the existing users, or do you want to change it and force the users to come along with the change?

> GNU Parallel by design tries to do The Right Thing and not assume the
> world is perfect. In a perfect world the parent will kill its
> children. But if the parent is crappy code (which might be proprietary
> or impossible to fix for other reasons), then the parent might not do
> so. And would it then not be The Right Thing to kill -9 the
> (grand*)children after we kill -9 the parent?

I am okay with hard-coding any termination sequence inside GNU parallel, as long as the user has an option to change it for an unforeseen scenario. It gives more freedom to the user.

> The only situation I can think of where this behaviour is wrong, is in
> the weird situation where the parent wants the children to live on,
> without explicitly daemonizing the child. And I cannot come up with a
> real life scenario where that will ever happen. Can you?

Users have surprised me in the past, and will surprise me in the future. I try to write programs that stay open, and only close the opportunities when necessary.

I would like to propose the following:

1) Add an option (--term-seq TERM_SEQ) to specify the termination sequence as a comma-separated list of signals and delays.
2) Add an option (--propagate-term-seq) to propagate the termination sequence to the immediate child processes after the first kill -SIGTERM (but not to the rest of the process tree).
3) When a timeout occurs and --term-seq is used, the timeout uses that termination sequence; otherwise it uses the built-in termination sequence.
4) Add an option (--broadcast-term-seq) to broadcast the termination sequence to the entire process tree. This happens only after the usual termination sequence has been used, and only if processes remain.

The default termination sequence remains the same:
TERM,200ms,TERM,200ms,KILL
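
For illustration only, here is a minimal Python sketch of how a comma-separated sequence like the default above could be parsed. The --term-seq option is just a proposal at this point; the parse_term_seq helper, the assumption that delays are always written in "ms", and the tuple output format are all hypothetical and are not GNU parallel code.

```python
import re
import signal


def parse_term_seq(spec):
    """Parse e.g. 'TERM,200ms,TERM,200ms,KILL' into a list of steps.

    Each step is ('signal', signal.Signals member) or ('wait', seconds).
    Hypothetical helper: only '<digits>ms' delays are understood.
    """
    steps = []
    for token in spec.split(","):
        token = token.strip()
        delay = re.fullmatch(r"(\d+)ms", token)
        if delay:
            # Convert milliseconds to seconds for use with time.sleep().
            steps.append(("wait", int(delay.group(1)) / 1000))
            continue
        # Accept both 'TERM' and 'SIGTERM' spellings.
        name = token.upper()
        if not name.startswith("SIG"):
            name = "SIG" + name
        try:
            steps.append(("signal", signal.Signals[name]))
        except KeyError:
            raise ValueError(f"unknown signal or delay: {token!r}") from None
    return steps


if __name__ == "__main__":
    for step in parse_term_seq("TERM,200ms,TERM,200ms,KILL"):
        print(step)
```

Rejecting unknown tokens with an error, rather than skipping them, seems the safer choice here: a silently dropped KILL would leave jobs running.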

Unless you want to change how the existing termination sequence is applied, the existing method stays the same: broadcast the signals to the process tree and wait for the immediate children. If the user specifies any of the new options, the new method kicks in: apply the termination sequence to the immediate children first, then broadcast it if processes remain (which means waiting for the entire process tree, not just the immediate children).
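
To make the proposed "children first, broadcast afterwards" method concrete, here is a rough Python sketch. It is not how GNU parallel is implemented; the function names, the ('signal', sig) / ('wait', seconds) step format, and the idea of passing in the pid lists are assumptions for illustration. The send, alive, and sleep hooks are injectable so the logic can be tried without real processes.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX only).

    Caveat: a zombie that has not been reaped still counts as alive.
    """
    try:
        os.kill(pid, 0)  # signal 0 only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def run_sequence(pids, steps, send=os.kill, alive=pid_alive, sleep=time.sleep):
    """Apply the steps to pids; return the pids still alive afterwards."""
    remaining = [p for p in pids if alive(p)]
    for kind, value in steps:
        if not remaining:
            break  # everything exited, skip the rest of the sequence
        if kind == "signal":
            for pid in remaining:
                try:
                    send(pid, value)
                except ProcessLookupError:
                    pass  # exited between the check and the kill
        else:
            sleep(value)
        remaining = [p for p in remaining if alive(p)]
    return remaining


def terminate(children, tree, steps, **hooks):
    """Signal the immediate children first; broadcast to the rest of the
    process tree only if some process in it is still alive."""
    survivors = run_sequence(children, steps, **hooks)
    rest = [p for p in tree if p not in children]
    return run_sequence(survivors + rest, steps, **hooks)


if __name__ == "__main__":
    default = [("signal", signal.SIGTERM), ("wait", 0.2),
               ("signal", signal.SIGTERM), ("wait", 0.2),
               ("signal", signal.SIGKILL)]
    # Pids 0 and below are never real processes here; use a made-up tree.
    print(terminate([], [], default))
```

With this ordering, a grand-child of a parent that does not propagate TERM still receives TERM during the broadcast phase before it receives KILL, which is the scenario raised earlier in this message.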

Meditate on this I will. It is a lot to absorb.

Cheers,
Martin


