Re: Revision of GNU Parallel's processing of SIGTERM

From:	Martin d'Anjou
Subject:	Re: Revision of GNU Parallel's processing of SIGTERM
Date:	Tue, 14 Apr 2015 00:30:31 -0400
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 15-04-13 03:46 PM, Ole Tange wrote:

On Mon, Apr 13, 2015 at 3:44 PM, Martin d'Anjou
<martin.danjou14@gmail.com> wrote:

On 15-04-12 07:14 AM, Ole Tange wrote:

I propose two solutions:
- Change the code to only kill the parent process (existing users will have
to change their application code to kill children processes themselves)
- Do not change the default behaviour, but add an option so that only the
parent job is killed (existing users see no change)

The more I think about it the more I like the:

'kill TERM', wait, 'kill TERM', wait, 'kill KILL', 'kill KILL
@grandchildren_pid'

When will that fail?

To make this work, GNU parallel would have to wait for the whole process tree, not just the first children. Today it waits only for family_pids[0] I think. So this change may affect existing users.
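To illustrate what "the whole process tree" means here, the following is a minimal Python sketch, not GNU Parallel's own code. It assumes a snapshot of the process table is already available as a {pid: parent_pid} map; the function name descendants and the sample pids are hypothetical.

```python
from collections import defaultdict


def descendants(root_pid, parent_of):
    """Return every pid below root_pid, given a {pid: parent_pid} map."""
    # Invert the map so each parent points at its direct children.
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)

    # Walk the tree iteratively, collecting children, grandchildren, and so on.
    found, stack = [], [root_pid]
    while stack:
        for child in children[stack.pop()]:
            found.append(child)
            stack.append(child)
    return found


# Hypothetical snapshot: 100 is the job, 101 and 102 are its children,
# and 103 is a grandchild started by 101.
snapshot = {100: 1, 101: 100, 102: 100, 103: 101}
print(sorted(descendants(100, snapshot)))
```

Waiting only for the first pid would miss 103 in this snapshot, which is the case discussed above.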

Also, it will fail for the user when the delay is not long enough or when the software wants INT instead of TERM. I am also thinking... what if a user is stuck with a bad parent process (one that does not propagate TERM) but a good grand-child process? The grand-child of a "crappy coded" parent process will only get kill -9, which might not be what the user wants. Maybe the user would like SIGTERM to get to the grand-child. It's certainly not easy to find a way that works for all scenarios. There could be scenarios that I cannot imagine. This does not mean they do not exist.

Perhaps a way to offer a generalized solution is to have an option to "broadcast" the termination sequence to the entire process tree once the initial attempt has failed, it could be called --broadcast-term-seq.

But I have a question: do you want to keep the current behaviour intact for the existing users, or do you want to change it and force the users to come along with the change?

GNU Parallel by design tries to do The Right Thing and not assume the
world is perfect. In a perfect world the parent will kill its
children. But if the parent is crappy code (which might be proprietary
or impossible to fix for other reasons), then the parent might not do
so. And would it then not be The Right Thing to kill -9 the
(grand*)children after we kill -9 the parent?

I am okay with hard coding any termination sequence inside GNU parallel, as long as the user has an option to change it in the case of an unforeseen scenario, it's like giving more freedom to the user.

The only situation I can think of where this behaviour is wrong, is in
the weird situation where the parent wants the children to live on,
without explicitly daemonizing the child. And I cannot come up with a
real life scenario where that will ever happen. Can you?

Users have surprised me in the past, and will surprise me in the future. I try to write programs that stay open, and only close the opportunities when necessary.


I would like to propose the following:

1) Add an option (--term-seq TERM_SEQ) to specify the termination sequence in the form of a comma separated list of signals and delays
2) Add an option (--propagate-term-seq) to cause the termination sequence to be propagated to the immediate child processes after the first kill -SIGTERM (but do not send to the process tree)
3) When a timeout occurs and --term-seq is used, the timeout uses that termination sequence, otherwise it uses the built-in termination sequence
4) Add an option (--broadcast-term-seq) to broadcast the termination sequence to the entire process tree, this happens only after the usual termination sequence has been used and if there are remaining processes.


The default termination sequence remains the same:
TERM,200ms,TERM,200ms,KILL
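
A sequence string of this shape could be parsed and applied roughly as follows. This is a Python sketch under assumptions, not GNU Parallel's implementation: the helper names parse_term_seq, is_alive and apply_term_seq are hypothetical, delays are assumed to carry an "ms" suffix as in the line above, and a POSIX system is assumed.

```python
import os
import signal
import time


def parse_term_seq(spec):
    """Turn 'TERM,200ms,KILL' into a list of ('signal', sig) / ('wait', seconds) steps."""
    steps = []
    for token in spec.split(","):
        token = token.strip()
        if token.endswith("ms"):
            # A delay in milliseconds, stored as seconds.
            steps.append(("wait", int(token[:-2]) / 1000.0))
        else:
            # A signal name without the SIG prefix, e.g. TERM or INT.
            steps.append(("signal", getattr(signal, "SIG" + token)))
    return steps


def is_alive(pid):
    """Probe with signal 0; note that an unreaped zombie still counts as alive."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def apply_term_seq(pid, steps, send=os.kill, sleep=time.sleep, alive=is_alive):
    """Walk the sequence, stopping early once the process is gone.

    Returns True if the process was gone at the end, False otherwise.
    """
    for kind, value in steps:
        if not alive(pid):
            return True
        if kind == "signal":
            send(pid, value)
        else:
            sleep(value)
    return not alive(pid)


steps = parse_term_seq("TERM,200ms,TERM,200ms,KILL")
```

Because the signals are looked up by name, a user-supplied string such as INT,500ms,KILL would cover software that wants INT instead of TERM.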

And unless you want to change the existing method of applying the termination sequence: the existing method, which broadcasts the signals to the process tree and waits for the immediate children, remains the same. If the user specifies any of the new options, the new method kicks in: apply the termination sequence to the immediate children first, then broadcast if there are remaining processes (this means waiting for the entire process tree, not just the immediate children).
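
The proposed new method could look roughly like this in Python. It is only a sketch of the two phases described above: terminate_job, run_sequence and still_running are hypothetical names, and the callbacks stand in for whatever actually sends the signals and checks the process table.

```python
def terminate_job(children, tree, run_sequence, still_running):
    """Apply the termination sequence in two phases.

    children:      pids of the immediate children
    tree:          all pids in the process tree (children and their descendants)
    run_sequence:  callable that applies the full termination sequence to one pid
    still_running: callable pid -> bool, consulted after phase 1
    Returns the pids that needed the broadcast.
    """
    # Phase 1: immediate children only, so well-behaved parents can clean up.
    for pid in children:
        run_sequence(pid)

    # Phase 2: broadcast to whatever is still running anywhere in the tree.
    survivors = [pid for pid in tree if still_running(pid)]
    for pid in survivors:
        run_sequence(pid)
    return survivors


# Simulated run: parent 101 exits but does not take grandchild 103 with it.
alive = {101, 103}
log = []


def fake_sequence(pid):
    log.append(pid)
    alive.discard(pid)


survivors = terminate_job([101], [101, 103], fake_sequence, lambda pid: pid in alive)
print(log, survivors)
```

In the simulated run the grandchild receives the full sequence, starting with TERM, instead of only kill -9.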


Meditate on this I will. It is a lot to absorb.

Cheers,
Martin
