bug-bash

Re: 'wait -n' with and without id arguments


From: Zachary Santer
Subject: Re: 'wait -n' with and without id arguments
Date: Sun, 8 Sep 2024 20:35:24 -0400

Slightly improved wait-n-failure attached.

On Thu, Sep 5, 2024 at 12:10 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 8/30/24 11:06 PM, Zachary Santer wrote:
> > CWRU/CWRU.chlog:
> >  >    8/26
> >  >    ----
> >
> >  > execute_cmd.c
> >  > [...]
> >  > - execute_connection: in default mode, bash performs job notifications
> >  >   in an interactive shell between commands separated by ';' or '\n'.
> >  >   It shouldn't do this in posix mode, since posix now specifies when
> >  >   notifications can take place.
> >
> > I forgot your comment below about the shell not being interactive any time
> > it's not accepting input from the user and took this to mean that 'jobs'
> > notifications would only ever be printed immediately prior to a prompt when
> > bash is in posix mode. I don't understand what posix mode changes relative
> > to the existing behavior if not that.
>
> In default mode, bash prints job notifications when executing a list (it
> calls lists `connections' internally). Commands in lists can be delimited
> by a `;' or newline; bash performs notifications between executing, say,
> the left and right sides of `command 1; command 2'. Now it doesn't do that
> in posix mode. It still does job notifications in other places that aren't
> strictly posix conformant.
>
> >
> >  > jobs.c
> >  > - notify_and_cleanup: make interactive shells notifying during sourced
> >  >   scripts dependent on the shell compatibility level and inactive in
> >  >   versions beyond bash-5.2
> >  >   Inspired by report from Zachary Santer <zsanter@gmail.com>
> >
> > Making 'jobs' notifications not happen while the interactive shell is
> > sourcing a script misses the cases where a function is executed
> > directly from the command line and, of course, where a whole bunch of
> > commands separated by semicolons is entered on one command line.
>
> What behavior do you want from the command lists that differs from what I
> described above? Since shell functions are essentially lists, you should
> get the same behavior from both.

You'd have to restrict job status notification to only ever occur
immediately prior to a prompt, in both posix and default mode, and
then you'd still need a blurb in the BUGS section of the manual saying
that 'set -b' has a potentially surprising impact on 'wait -n' in the
interactive shell.

For this to work, you'd have to choose one of the following new behaviors:

1) Background jobs that are both forked and cleared from the jobs
table by a call to 'wait' between an accept-line and the following
prompt would never receive a job id or be reported to the user in any
way. No
[1] 16053
when it's forked, and no
[1]+  Done                       wait-n-failure::random_sleep
when it terminates. Only jobs that were forked and have not been
cleared from the jobs table when it's time to display the next prompt
would receive a job id or print the
[1] 16053
line. (That line isn't very useful anyway when you don't know what was
forked, i.e., when it wasn't simply the contents of the prior line
with an ampersand appended.)

2) Job ids assigned to background jobs continue to increase
monotonically, between accept-line and prompt, even as some of those
jobs are removed from the jobs table by calls to 'wait'. In this
scenario, the
[1] 16053
line can be printed when each background job is forked or immediately
prior to the
[1]+  Done                       wait-n-failure::random_sleep
lines. If, instead, freed job ids can be reassigned to new jobs
before the following prompt, as is the case now, multiple jobs with
the same job id would show up as "Done" at the same time.

If the 'jobs' builtin is called in the midst of a command list being
run under either behavior, it would cause the same updates to the
jobs table and to the list of saved pids and statuses as would occur
immediately prior to a prompt. The user would have to know that
calling the 'jobs' builtin affects which processes' termination
statuses 'wait -n' without id arguments will return. That would have
to be documented in the man page.

Of the above options, I prefer 1). In either case, though, the change
in shell behavior evident to the user would extend beyond the behavior
of 'wait -n'. By contrast, making 'wait -n' without id arguments
return the termination status of a background job that has already
been reported to the user, and clear that job from the list of saved
pids and statuses, wouldn't necessitate any change to notification
behavior, although it might cause problems for code intended for use
within the interactive shell.


On Thu, Sep 5, 2024 at 4:18 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 8/30/24 11:06 PM, Zachary Santer wrote:

> > On Mon, Aug 26, 2024 at 10:57 AM Chet Ramey <chet.ramey@case.edu> wrote:

> > However, I think the benefit to
> > consistent behavior far outweighs the hardship caused to whoever would
> > write a script intended for use within the interactive shell that depends
> > on 'wait -n' without id arguments ignoring background processes that the
> > user has already been notified of via the 'jobs' output.
>
> Consider programmable completion frameworks, commands executed via
> `bind -x', or traps (e.g., DEBUG) intended to provide enhancements to
> the standard behavior, all of which exist and have generated reports or
> requests for features.
>
> People put pretty complicated stuff into PROMPT_COMMAND and other prompts,
> too.
>
> We don't know how the existing uses would be affected by changes until I
> make them.

Could changes to job status notifications cause issues for these users as well?

Generally speaking, I would expect most functions defined in any of
the above that call 'wait' or 'wait -n' from an interactive
environment to track and use explicit pid arguments, to avoid waiting
on other background jobs the user forked themselves. In that case, the
behavior they would see, using 'wait -n', has already changed for the
better. The use of 'wait -n' without pid arguments in an interactive
environment is more likely to be something that a user just typed on
the command line themselves.

There are a lot of helpful people on this email list testing the devel
branch against their existing scripts, but I know there are still
plenty of valid bug reports arising from bash release versions.

> > If the behavior here isn't modified, the man page really should note that
> > 'wait -n' without id arguments won't return the termination status of a
> > child process that has already been notified through the 'jobs' output.
>
> That is exactly the behavior posix seems to require (`wait -n' aside, but
> see below): once you notify the user, you delete the job and it disappears
> forever.

It should still be in the man page. Very few shell programmers read
the POSIX standard.

Does POSIX provide a rationale for this requirement? I imagine 'set -o
posix' exists partly to allow bash's default behavior to differ from
the POSIX specification, when POSIX specifies behavior that isn't
particularly helpful. I'd be curious to know if the Austin Group
people have considered the implications of a feature like 'wait -n'.

If you go the route of changing job notification behavior, would that
be the end of the list of saved pids and statuses? Maintaining that
list is more useful than simply following POSIX to the letter. What
would be the benefit to the user of making the termination status of
notified jobs unavailable to the 'wait' builtin?

'wait -n' with pid arguments now has access to this list, which is
good. It wouldn't be going much further to allow 'wait -n' without pid
arguments to act on the list as well. Meanwhile, POSIX is telling you
the list shouldn't even exist to begin with.

From some very simple testing, I see that 'wait' in both mksh and
zsh will return the termination status of a background job that has
already been reported to the user as terminated in the interactive
shell, but only once. Subsequent calls to 'wait' with that same pid
argument return status 127. By contrast, bash in default mode will
continue to return the termination status of the same background job
upon subsequent calls to 'wait'.
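A minimal sketch of the repeated-wait check, assuming a non-interactive bash script in default (non-posix) mode; the variable names are illustrative and not taken from the attached wait-n-failure. As described above, bash in default mode is expected to report the saved status again on the second call, while a shell that discards the status after reporting it once would return 127 instead.

```shell
#!/usr/bin/env bash
# Sketch: wait on the same background pid twice from a script.
# Assumes bash in default (non-posix) mode.

( exit 3 ) &          # background job that terminates with status 3
pid=$!

wait "$pid"
first=$?              # status from the first wait on $pid

wait "$pid" 2>/dev/null
second=$?             # bash default mode: 3 again; 127 if the status was discarded

printf 'first: %s, second: %s\n' "$first" "$second"
```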

Making 'wait -n' without id arguments work the way I'd like it to
would likely entail making 'wait' with id arguments work the way it
does in mksh and zsh. It would be no good for 'wait -n' without id
arguments to keep returning the termination status of the same
terminated background job over and over again. With access to the list
of saved pids and statuses, it would have to remove the entry for the
terminated background job it reports, so as not to report it again
upon the next invocation. It would then be more consistent for 'wait'
with a pid argument to also remove the entry for the id argument it
was passed.

So yes, now I'm asking for more changes to behavior with potential
unforeseen consequences. I didn't know you could explicitly wait on
the same pid multiple times in a script run normally, or that only
a call to 'wait' without id arguments clears the list of saved ids and
statuses. So clearing the entry of this list that was explicitly
waited on would be new behavior in any context.
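The clearing behavior can be sketched the same way, again assuming a non-interactive bash script in default mode: after a plain 'wait' with no id arguments, the saved status is discarded, so a further 'wait' on the same pid is expected to fail with status 127.

```shell
#!/usr/bin/env bash
# Sketch: 'wait' without id arguments discards saved statuses.
# Assumes bash in default (non-posix) mode.

( exit 4 ) &
pid=$!

wait "$pid"
before=$?             # 4: the job's own termination status

wait                  # no id arguments: waits for all children, clears saved statuses

wait "$pid" 2>/dev/null
after=$?              # expected 127: the pid is no longer known to the shell

printf 'before: %s, after: %s\n' "$before" "$after"
```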

Getting back to the POSIX point, if bash, mksh, and zsh all violate
POSIX in this way, maybe it's POSIX that should change.

> Bash doesn't delete jobs like that in all cases, and that's the
> source of at least kre's objections (as you note below).
>
> The only real question is when (and under what circumstances) you do the
> notification, and that's what we're trying to hash out.

> >  > Kind of. The `interactive shell' isn't interactive while it's not reading
> >  > input from the terminal, so the shell prints notifications when a job
> >  > terminates. This is what happens when you source a file.

Would making job status notifications only ever occur immediately
prior to a prompt (or when calling 'jobs') violate POSIX too? At the
end of the day, 'wait -n' without id arguments should be able to
guarantee that one-to-one relationship of forked child process to
'wait -n'-returned termination status, one way or the other.
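For comparison, a minimal sketch of that one-to-one relationship as it is expected to hold in a non-interactive bash script, where no job status notification removes a terminated job before 'wait -n' sees it. It assumes a reasonably recent bash (5.x); the job count and exit statuses are arbitrary. Each call should report exactly one job's status, no status should be reported twice, and once every job has been reported, 'wait -n' has no unwaited-for children left and returns 127.

```shell
#!/usr/bin/env bash
# Sketch: each 'wait -n' without id arguments reports one background
# job's status, and each job only once (non-interactive script assumed).

# Fork three background jobs with distinct exit statuses.
for status in 1 2 3; do
  ( sleep "0.$status"; exit "$status" ) &
done

# Collect exactly one status per forked job.
collected=()
for _ in 1 2 3; do
  wait -n
  collected+=("$?")
done

# Sort so the result does not depend on completion order.
sorted=$(printf '%s\n' "${collected[@]}" | sort -n | tr '\n' ' ')
sorted=${sorted% }
echo "statuses reported: $sorted"

# A fourth call has no unwaited-for children left to report on.
wait -n
extra=$?
echo "extra wait -n returned: $extra"
```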

> > So my initial understanding of what 'set -o posix' was supposed to do now
> > was wrong?
>
> What was your initial understanding?

No job status notification at any point except immediately prior to a
prompt (or when calling 'jobs').

> >  > The behavior of performing notifications and removing jobs from the table
> >  > is long-standing: it's been this way since 1999, and is a mechanism to
> >  > prevent long-running sourced scripts from filling up the jobs list (which
> >  > was a lot smaller in '99). So you need to accommodate those backwards
> >  > compatibility issues somehow.
> >
> > 'wait -n' without id arguments reporting the termination status of a child
> > process that has already been reported to the user through the 'jobs'
> > output and clearing that information from the list of saved ids and
> > statuses would then be less of a disruption.
>
> To whom?

The users. As I said above, changing the behavior of 'wait -n' without
id arguments wouldn't necessitate any change to job status
notifications.

Attachment: wait-n-failure
Description: Binary data

