bug-bash

Re: 'wait -n' with and without id arguments


From: Chet Ramey
Subject: Re: 'wait -n' with and without id arguments
Date: Thu, 5 Sep 2024 16:18:30 -0400
User-agent: Mozilla Thunderbird

On 8/30/24 11:06 PM, Zachary Santer wrote:

New wait-n-failure attached.

I'll look at this. These always take a long time.

(Apparently ${SECONDS} can't be declared local and still work.)

Making a variable local removes the special behavior on assignment and
reference.
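A minimal bash sketch of one way to time a function given that behavior: leave SECONDS global and keep only the captured start value in a local. The helper name report_elapsed is hypothetical and does not come from wait-n-failure.

```shell
# Hypothetical helper: report whole seconds elapsed since a saved start value.
report_elapsed() {
    # Only the saved start time is local. SECONDS itself is never declared
    # local, so referencing it still yields the shell's running counter.
    local start=$1
    printf '%d seconds\n' "$(( SECONDS - start ))"
}

start=$SECONDS          # capture the counter before the work begins
sleep 1                 # stand-in for the real work
report_elapsed "$start"
```

The same subtraction works inline; the function only shows that a local copy of the start value is enough and that SECONDS never needs to be local.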

$ source ~/random/wait-n-failure run monitor
run true
explicit_pids false
monitor true
notify false
posix false
bash 5.3.0(1)-alpha
[5]+  Done                       wait-n-failure_random_sleep
[1]   Done                       wait-n-failure_random_sleep
[2]   Done                       wait-n-failure_random_sleep
[3]   Done                       wait-n-failure_random_sleep
[4]-  Done                       wait-n-failure_random_sleep
[5]-  Done                       wait-n-failure_random_sleep
[6]-  Done                       wait-n-failure_random_sleep
[7]-  Done                       wait-n-failure_random_sleep
[8]   Done                       wait-n-failure_random_sleep
[9]   Done                       wait-n-failure_random_sleep
[10]+  Done                       wait-n-failure_random_sleep
[1]+  Done                       wait-n-failure_random_sleep
[1]+  Done                       wait-n-failure_random_sleep
[1]+  Done                       wait-n-failure_random_sleep
[... All following "Done" notifications are for jobs with job id 1.]
96 processes waited / 100 processes forked
11 seconds

I did not expect to see job notifications here. The changelog seems pretty clear that there shouldn't be any.

There are other code paths where the shell notifies, and this is probably
one of them. I will take a look at where. Certainly the conditions under
which the shell notifies about changed job status should be consistent.


On Mon, Aug 26, 2024 at 10:57 AM Chet Ramey <chet.ramey@case.edu> wrote:
 >
 > On 8/14/24 11:22 PM, Zachary Santer wrote:
 > > On Wed, Aug 14, 2024 at 3:22 PM Chet Ramey <chet.ramey@case.edu> wrote:
 > >>
 > >> On 8/7/24 2:47 PM, Zachary Santer wrote:
 > >
 > >>> If you want the behavior of 'wait -n' to be
 > >>> consistent between scripts and the interactive shell, then it should
 > >>> choose one terminated child process from the list of those that is
 > >>> maintained in the interactive shell, if it's nonempty, to report to
 > >>> the user and to clear from that list, any time it is called.
 > >>
 > >> I'm not sure returning the status of some random process from some
 > >> arbitrary point in the past is going to be valuable.
 > >
 > > I think the value is in the consistent behavior of 'wait -n', which
 > > this would provide. If the user is intent on running 'wait -n' without
 > > id arguments in the interactive shell, they can ensure that child
 > > processes forked long ago are ignored by simply calling 'wait' without
 > > -n before moving on to what they're trying to do.
 >
 > Sure, they can do that. That's a new requirement, though.

I've seen you point out "I can't imagine why a person would do X, so it must never happen" as being fallacious.

I'm sure I've been guilty of it myself.

However, I think the benefit to consistent behavior far outweighs the hardship caused to whoever would write a script intended for use within the interactive shell that depends on 'wait -n' without id arguments ignoring background processes that the user has already been notified of via the 'jobs' output.

Consider programmable completion frameworks, commands executed via
`bind -x', or traps (e.g., DEBUG) intended to provide enhancements to
the standard behavior, all of which exist and have generated reports or
requests for features.

People put pretty complicated stuff into PROMPT_COMMAND and other prompts,
too.

We don't know how the existing uses would be affected by changes until I
make them.

If the behavior here isn't modified, the man page really should note that 'wait -n' without id arguments won't return the termination status of a child process that has already been notified through the 'jobs' output.

That is exactly the behavior posix seems to require (`wait -n' aside, but
see below): once you notify the user, you delete the job and it disappears
forever. Bash doesn't delete jobs like that in all cases, and that's the
source of at least kre's objections (as you note below).

The only real question is when (and under what circumstances) you do the
notification, and that's what we're trying to hash out.


 > > On Wed, Aug 14, 2024 at 4:44 PM Robert Elz <kre@munnari.oz.au> wrote:
 > >>
 > >>    | Maybe the thing to do is to retain jobs in the job list, even after
 > >>    | they're marked as notified,
 > >>
 > >> I'd do the opposite, once they're notified, they should be deleted
 > >> from the jobs table, and everywhere else.   But "notified" only happens
 > >> when the script explicitly asks (in a non-interactive shell, never because
 > >> of any other event than an appropriate command issued by the script, and
 > >> in an interactive shell, the same, or the implicit "jobs" before each PS1).
 > >
 > > The implicit 'jobs' isn't happening before each PS1,
 >
 > This isn't what POSIX says to do, anyway.

https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_11

 >
 >   but after each
 > > command completes. Thus, all the
 > >> [1]   Done                    random_sleep
 > > notifications when sourcing wait-n-failure, before it prints
 > >> 3 processes waited / 8 processes forked
 > >> 1 seconds
 > > and exits.
 >
 > Kind of. The `interactive shell' isn't interactive while it's not reading
 > input from the terminal, so the shell prints notifications when a job
 > terminates. This is what happens when you source a file.

So my initial understanding of what 'set -o posix' was supposed to do now was wrong?

What was your initial understanding?


 > > So, actually only doing the implicit 'jobs' work and moving things
 > > from the jobs table to the list of saved pids and statuses before each
 > > PS1 *would* be a solution here.
 >
 > Before the next prompt, you probably mean.
 >
 > > When sourcing wait-n-failure, it's
 > > going to do all its work before any PS1 prompt.
 >
 > The behavior of performing notifications and removing jobs from the table
 > is long-standing: it's been this way since 1999, and is a mechanism to
 > prevent long-running sourced scripts from filling up the jobs list (which
 > was a lot smaller in '99). So you need to accommodate those backwards
 > compatibility issues somehow.

'wait -n' without id arguments reporting the termination status of a child process that has already been reported to the user through the 'jobs' output and clearing that information from the list of saved ids and statuses would then be less of a disruption.

To whom?


 > >>> So basically, 'wait -n' should be implemented such that sourcing the
 > >>> script with a false argument gives the same behavior as you've seen
 > >>> when sourcing it with a true argument: the infinite loop.
 > >>
 > >> How long should notification be deferred? Until the script completes?
 > >
 > > That's more or less the solution I presented above. 'wait -n' without
 > > id arguments returning the termination status of a child process that
 > > the user has already been informed of through the implicit 'jobs'
 > > output would also work, and might be less of a weird behavior change
 > > for users to get over.
 >
 > OK. How would you reconcile the backwards compatibility issue?

There's always ${BASH_COMPAT},

Yes, that's partly what I chose to go with.

but considering the surprising and arguably undesirable nature of 'wait -n' without id arguments not returning the termination status of a child process that has already been reported to the user through the 'jobs' output,

Again, POSIX appears to require this. Bash doesn't do it consistently,
though that status is always available to `wait pid' uses.

I would really question why someone would write code dependent on that behavior in the first place. And again, this issue has never come up in a script intended to be called normally (without it calling 'jobs').

You might be surprised at the complexity of code people run with
programmable completion or `bind -x' -- often code they didn't write
themselves.
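For the earlier remark that a child's status remains available to `wait pid' uses, here is a minimal bash sketch of the basic mechanism. The exit status 3 and the variable names are arbitrary choices for illustration, and the sketch does not try to reproduce the notification race discussed in this thread.

```shell
# Start a background child that exits with a known, arbitrary status.
( exit 3 ) &
pid=$!                  # pid of the most recently started background job

# Waiting on that specific pid returns the child's own exit status,
# whether or not the child has already terminated by this point.
wait "$pid"
status=$?

printf 'child %d exited with status %d\n' "$pid" "$status"
```

Saving $! right after the fork is what makes the later lookup possible; 'wait -n' without id arguments gives no such handle on a particular child.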


This whole issue is such a corner case, though it seems like an easily-solved problem.

 > There are only three approaches.

And those are?

Don't change (or change unconditionally); change based on a shell option;
change based on the shell compatibility level.

 > > This breaks down with 'set -b'/'set -o notify'. Short of 'wait -n'
 > > printing a warning message or erroring out when it is invoked while
 > > 'set -b' is active, this isn't a complete solution.
 >
 > If you enable the notify option, which is not the default, you should be
 > responsible for managing the consequences. notify is always going to result
 > in different behavior; see
 >
 > https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_11

It's not clear from the bash manual that there's a relationship between printed 'jobs' notifications and what 'wait -n' without id arguments will report. Under the (fair) assumption that there is none, one would think that 'set -b' would also have no effect.

So we're back to notification again.

Maybe it's one of those things that everyone knows how it should behave (or
thinks they do), but it's not actually written down anywhere. POSIX is
definite about `wait' removing a job from the list, and definite that
`jobs' when run as a command removes reported jobs from the list, but is
ambiguous about whether or not the notification messages it specifies in
2.11 have the same effect. No shell documents it explicitly, but everyone
does it.


 > > I really think the solution here is for 'wait -n' to return the
 > > termination status of a child process that has already terminated and
 > > that the user has already been informed of. Ultimately, whatever set
 > > of commands is being invoked together and the user who is being
 > > informed of terminated child processes are two different things.
 > > Informing the user does nothing for the set of commands.
 >
 > No, that counts as notification. After the user is notified, the shell
 > is free to remove the job from the list. Bash happens to keep the status
 > around for a while;

Bash does that because that behavior is more useful. The user might want to call 'wait' with an id argument and find that process's termination status programmatically, despite the 'jobs' output having already informed them. In the same vein, it's more useful for 'wait -n' to be able to guarantee a one-to-one relationship of forked child process to 'wait -n'-returned termination status.
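The one-to-one bookkeeping described above can be sketched as a counting loop of the kind that produces the "N processes waited / M processes forked" lines earlier in this thread. This is a hypothetical reconstruction in bash, not the attached wait-n-failure script: the function name count_waited is made up, and the loop assumes no child exits with status 127 itself.

```shell
# Hypothetical helper: fork some children, then count how many termination
# statuses 'wait -n' without id arguments hands back.
count_waited() {
    local forked=$1 waited=0 status i

    for (( i = 0; i < forked; i++ )); do
        ( exit 0 ) &            # trivial child; a real test would sleep randomly
    done

    while true; do
        wait -n
        status=$?
        # 'wait -n' returns 127 when there is no child left to wait for.
        # Assumption: no child in this sketch exits with 127 on its own.
        if (( status == 127 )); then
            break
        fi
        waited=$(( waited + 1 ))
    done

    printf '%d processes waited / %d processes forked\n' "$waited" "$forked"
}

count_waited 10
```

If the two counts differ, some child's status was consumed by something other than this loop, which is the kind of mismatch the interactive runs in this thread are reporting.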

 > kre, for instance, advocates removing it entirely.

That would preclude what he was asking for earlier, wouldn't it?

Which part? He says the shell shouldn't notify unless it's about to
display $PS1, and since the jobs will still be in the jobs list and
not marked as notified, `wait -n' will be free to return and subsequently
delete them.


On Fri, Jul 12, 2024 at 8:41 PM Robert Elz <kre@munnari.oz.au> wrote:
 >
 > [U]se the first definition of "next job to
 > finish" - and in the case when there are already several of them,
 > pick one, any one - you could order them by the time that bash reaped
 > the jobs internally, but there's no real reason to do so, as that
 > isn't necessarily the order the actual processes terminated, just
 > the order the kernel picked to answer the wait() sys call, when
 > there are several child zombies ready to be reaped.

Removing the status entirely after 'jobs'-output notification would prevent the above from working, right? Or maybe he was then under the same impression that I was: that 'wait -n' would fail to report the termination status of child processes that had terminated prior to the call to 'wait -n' in all circumstances. When it's the result of a race between the 'jobs' output and the call to 'wait -n', it's okay?

You mean job status notifications again, I'm sure. I think kre's position
can be summed up as: the jobs output for notification wins since it removes
jobs from the jobs list, set -b messes things up accordingly; you never
notify unless you're about to print $PS1; and the jobs and wait builtins
remove jobs from the list.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
