bug-bash

Re: AIX and Interix also do early PID recycling.


From: Chet Ramey
Subject: Re: AIX and Interix also do early PID recycling.
Date: Wed, 25 Jul 2012 10:50:44 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:13.0) Gecko/20120614 Thunderbird/13.0.1

On 7/25/12 3:59 AM, Michael Haubenwallner wrote:
> 
> On 07/25/2012 03:05 AM, Chet Ramey wrote:
>> Bash assumes that there's a PID space at least as
>> large as CHILD_MAX, and that the kernel will use all of it before reusing
>> any PID in the space.  Posix says that shells must remember up to CHILD_MAX
>> statuses of terminated asynchronous children (the description of `wait'),
>> so implicitly the kernel is not allowed to reuse process IDs until it has
>> exhausted CHILD_MAX PIDs.
> 
> What about grandchildren?
> They do count for the kernel, but not for the toplevel shell...

Yes, that's another problem.

>> The description of fork() doesn't mention this,
>> however.  The Posix fork() requirement that the PID returned can't
>> correspond to an existing process or process group is not sufficient to
>> satisfy the requirement on `wait'.
> 
> OTOH, AFAICT, as long as a PID isn't waitpid()ed for, it isn't reused by
> fork().
> However, I'm unable to find that in the POSIX spec.

If the process hasn't been reaped by its parent, it's still technically
active, and its PID is not supposed to be eligible for reuse.

> 
>> Bash holds on to the status of all terminated processes, not just
>> background ones, and only checks for the presence of a newly-forked PID
>> in that list if the list size exceeds CHILD_MAX.  One of the results of
>> defining RECYCLES_PIDS is that the check is performed on every created
>> process.
> 
> What if the shell does not do waitpid(-1), but waitpid(known-child-PID)?
> That would mean to waitpid(synchronous-child-PID) immediately, and
> waitpid(asynchronous-child-PID) upon some "wait $!" shell command, falling
> back to waitpid(-1) when there's no PID passed to "wait".

That's not how the shell is architected.  In your scenario, the shell
would ignore process status changes until and unless the script asked
for them.  You'd never reap processes that were begun to run command
substitutions, for example, and async processes would linger until the
script happened to ask for some status (which few do).
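
To make the contrast concrete, here is a rough sketch of the "collect
whatever has changed" style, built on waitpid(-1).  It is an illustration
under assumptions, not the shell's actual code: spawn_exiting_children()
and reap_children() are hypothetical helpers, and a real shell would also
record each status instead of just counting.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts up to n children that exit immediately; returns how many started. */
static int spawn_exiting_children(int n)
{
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(i);
        started++;
    }
    return started;
}

/*
 * Reaps every child whose status is available and returns the count.
 * Pass WNOHANG to poll without blocking, or 0 to block until no children
 * remain.  No child PID has to be known in advance.
 */
static int reap_children(int options)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, options);

        if (pid > 0) {
            reaped++;               /* a real shell would save `status` here */
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        break;  /* 0: nothing ready (WNOHANG); -1 with ECHILD: no children */
    }
    return reaped;
}
```

With waitpid(known-child-PID) instead, children nobody asks about (command
substitutions, forgotten background jobs) would stay zombies until the
script named them.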

> 
>> I'd be interested in knowing the value of CHILD_MAX (or even `ulimit -c')
>> on the system where you're seeing this problem.
> 
> The AIX 6.1 I've debugged on has:
>   #define CHILD_MAX 128
>   #define _POSIX_CHILD_MAX 25
>   sysconf(_SC_CHILD_MAX) = 1024
> 
>   $ ulimit -H -c -u
>   core file size          (blocks, -c) unlimited
>   max user processes              (-u) unlimited
> 
>   $ ulimit -S -c -u
>   core file size          (blocks, -c) 1048575
>   max user processes              (-u) unlimited

Bash prefers sysconf(_SC_CHILD_MAX) and will use it over the other
defines (lib/sh/oslib.c:getmaxchild()).  I don't know why AIX chooses
to return a different value via sysconf than it defines for CHILD_MAX,
especially when it seems to use the CHILD_MAX value to decide when it
can recycle the PID space.
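
For illustration, the preference order can be sketched like this.  It is
not a copy of getmaxchild(); the function name max_children() and the
final fallback of 25 (the _POSIX_CHILD_MAX value quoted above) are
assumptions made for the example.

```c
#include <limits.h>
#include <unistd.h>

/*
 * Returns an estimate of the maximum number of simultaneous child
 * processes, preferring the run-time answer over compile-time constants.
 */
static long max_children(void)
{
#if defined(_SC_CHILD_MAX)
    long n = sysconf(_SC_CHILD_MAX);

    if (n > 0)
        return n;                   /* run-time value wins when available */
#endif
#if defined(CHILD_MAX)
    return CHILD_MAX;               /* compile-time limit from <limits.h> */
#elif defined(_POSIX_CHILD_MAX)
    return _POSIX_CHILD_MAX;        /* minimum that POSIX guarantees */
#else
    return 25;                      /* assumed last-resort default */
#endif
}
```

On the AIX 6.1 system described above, a function like this would return
1024 rather than 128, which is the mismatch in question.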


>> The case where last_made_pid is equal to last_pid is a problem only when
>> the PID space is extremely small -- on the order of, say, 4 -- as long as
>> the kernel behaves as described above.
> 
> I'm going to run this build job with 'truss -t kfork' again, to eventually
> find some too small count of different PIDs before PID-recycling by the
> kernel...
> 
> Anyway - defining RECYCLES_PIDS for that AIX 6.1 has reduced the error rate
> for this one build job from ~37 to 0 when run 50 times.

And I suspect that the single change of significance is to not check
against the childmax value when deciding whether or not to look for and
remove this pid from the list of saved termination status values.
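
A toy model of that decision, to show why it matters.  Everything here is
hypothetical (the struct, the fixed-size array, and the function names);
it only mirrors the logic described in this thread, not the real data
structures in bash.

```c
#include <stddef.h>

#define SAVED_MAX 16    /* arbitrary capacity for this sketch */

/* Saved termination statuses, keyed by PID. */
struct saved_statuses {
    int pids[SAVED_MAX];
    int status[SAVED_MAX];
    size_t count;
};

/* Remembers the exit status of a terminated child; -1 if the list is full. */
static int save_status(struct saved_statuses *s, int pid, int status)
{
    if (s->count >= SAVED_MAX)
        return -1;
    s->pids[s->count] = pid;
    s->status[s->count] = status;
    s->count++;
    return 0;
}

/* Returns the saved status for pid, or -1 if none is recorded. */
static int lookup_status(const struct saved_statuses *s, int pid)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->pids[i] == pid)
            return s->status[i];
    return -1;
}

/*
 * Called after fork() returns new_pid.  Without recycles_pids the list is
 * scanned only once it has grown past childmax; with recycles_pids it is
 * scanned on every fork.  Returns 1 if a stale entry was removed.
 */
static int forget_stale_pid(struct saved_statuses *s, int new_pid,
                            size_t childmax, int recycles_pids)
{
    if (!recycles_pids && s->count <= childmax)
        return 0;                   /* assume the kernel has not wrapped yet */

    for (size_t i = 0; i < s->count; i++) {
        if (s->pids[i] == new_pid) {
            /* Drop the stale entry by moving the last one into its slot. */
            s->count--;
            s->pids[i] = s->pids[s->count];
            s->status[i] = s->status[s->count];
            return 1;
        }
    }
    return 0;
}
```

With childmax at 1024 and a kernel that wraps much earlier, the first
branch keeps returning without a scan, so a stale status for a reused PID
stays in the list and can be mistaken for the new process's status.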

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
