findutils-4.1.20: a comment on xargs.c arg_max

From: Nelson H. F. Beebe
Subject: findutils-4.1.20: a comment on xargs.c arg_max
Date: Thu, 9 Dec 2004 08:00:44 -0700 (MST)

The code for findutils-4.1.20/xargs/xargs.c contains this fragment:

  /* Sanity check for systems with huge ARG_MAX defines (e.g., Suns which
     have it at 1 meg).  Things will work fine with a large ARG_MAX but it
     will probably hurt the system more than it needs to; an array of this
     size is allocated.  */
  if (arg_max > 20 * 1024)
    arg_max = 20 * 1024;

In earlier releases of GNU findutils, I had made a local modification
to comment out that statement.

We no longer live in a PDP-11 world, and modern systems often have
many gigabytes of main memory.  It seems utterly draconian to limit
arg_max to 20KB, and I'm very sceptical that larger values will "hurt
the system".  The only computer systems where memory space is likely
to be sharply limited today are embedded systems, programmed by a
small number of people under carefully controlled resource limits.

Computers are supposed to work for people, not the other way around.
When the environment size is restricted, users suffer from nonsense
like this (from an SGI IRIX 6.5 system):

        % find /usr/include/ -type f | xargs grep frobnitz
        xargs: environment is too large for exec

        % which xargs

        % /usr/local/bin/xargs --version
        /usr/local/bin/xargs: environment is too large for exec

        # How big is the environment?
        % env | wc -c

        # What is the POSIX minimum?
        % getconf _POSIX_ARG_MAX

Most GNU packages follow an important design principle of "no
arbitrary limits" on the size of objects.  findutils should too.
Please consider removing the 20KB limit, and making the code more
robust against large environment areas, as shown below.
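
The two shell measurements above (environment size and the argument
limit) can also be taken from C.  This is only a minimal sketch,
assuming a POSIX system that provides sysconf(_SC_ARG_MAX); the helper
names environment_bytes and runtime_arg_max are hypothetical and are
not taken from findutils.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from findutils): bytes occupied by the
   environment strings, counting one terminating null byte per entry.
   Pointer and alignment overhead is not included. */
size_t environment_bytes(char *const *envp)
{
    size_t total = 0;

    if (envp == NULL)
        return 0;
    for (; *envp != NULL; envp++)
        total += strlen(*envp) + 1;
    return total;
}

/* Hypothetical helper: the run-time limit reported by the system.
   sysconf() returns -1 when the limit is indeterminate or on error. */
long runtime_arg_max(void)
{
    return sysconf(_SC_ARG_MAX);
}
```

Calling environment_bytes(environ), with "extern char **environ;"
declared, should give roughly what "env | wc -c" reports, since each
newline printed by env corresponds to one null terminator.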

As an experiment, I've just rebuilt findutils-4.1.20 on that system,
and changed the xargs code like this:

        % diff xargs.c.~1~ xargs.c
        >   (void)fprintf(stderr,"DEBUG: xargs: ARG_MAX           = %18ld\n", 
        >   (void)fprintf(stderr,"DEBUG: xargs: LONG_MAX          = %18ld\n", 
        >   (void)fprintf(stderr,"DEBUG: xargs: orig_arg_max      = %18ld\n", 
        >   (void)fprintf(stderr,"DEBUG: xargs: env_size(environ) = %18ld\n", 
        >   (void)fprintf(stderr,"DEBUG: xargs: capped arg_max    = %18ld\n", 
        >   (void)fprintf(stderr,"DEBUG: xargs: reduced arg_max   = %18ld\n", 
        >   if (arg_max < 1024 * 1024)
        >       arg_max = 1024 * 1024;
        >   (void)fprintf(stderr,"DEBUG: xargs: expanded arg_max  = %18ld\n", 

Here is what happens when I run it:

        % find /usr/include -type f | ./xargs cat | wc -l
        DEBUG: xargs: ARG_MAX           =               5120
        DEBUG: xargs: LONG_MAX          =         2147483647
        DEBUG: xargs: orig_arg_max      =               3072
        DEBUG: xargs: env_size(environ) =               5120
        DEBUG: xargs: capped arg_max    =               3072
        DEBUG: xargs: reduced arg_max   =              -2724
        DEBUG: xargs: expanded arg_max  =            1048576

The origin of the "environment is too large for exec" diagnostic and
immediate exit is now clear: the reduced arg_max is negative.
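
The arithmetic behind that negative value can be sketched as follows.
This is a hypothetical reconstruction, not the findutils source: it
assumes that each DEBUG line reports the value after one step, namely
subtracting the 2048 bytes that POSIX reserves, applying the 20KB cap,
subtracting the environment size, and (in my experiment) raising the
result to at least 1MB.  The function and macro names are illustrative
only.

```c
/* Hypothetical reconstruction of the sizing steps; not the findutils source. */
#define POSIX_HEADROOM 2048L             /* from the {ARG_MAX}-2048 rule */
#define HISTORIC_CAP   (20L * 1024L)     /* the 20KB cap in xargs.c */
#define PROPOSED_FLOOR (1024L * 1024L)   /* the 1MB minimum tried in the experiment */

/* Bytes left for arguments after the headroom, the cap, and the
   environment are accounted for.  A result <= 0 corresponds to the
   "environment is too large for exec" case. */
long capped_budget(long sys_arg_max, long env_bytes)
{
    long arg_max = sys_arg_max - POSIX_HEADROOM;

    if (arg_max > HISTORIC_CAP)
        arg_max = HISTORIC_CAP;
    return arg_max - env_bytes;
}

/* Same computation, followed by the experimental 1MB floor. */
long floored_budget(long sys_arg_max, long env_bytes)
{
    long arg_max = capped_budget(sys_arg_max, env_bytes);

    if (arg_max < PROPOSED_FLOOR)
        arg_max = PROPOSED_FLOOR;
    return arg_max;
}
```

For instance, with a limit of 5120 bytes and a 4000-byte environment,
capped_budget() gives 3072 - 4000 = -928, while floored_budget()
returns 1048576.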

Guaranteeing a minimum of 1MB worked around the problem, and xargs ran
correctly.  For comparison, here is what SGI's version does:

        % /bin/find /usr/include -type f | /bin/xargs cat | wc -l

I have a large collection of architectures to test code on, including
all of the major Unix flavors on all of the major CPU types, and will
be happy to assist in any testing that such changes might entail.

Thanks to the simh and Hercules simulator projects, I also now have
several historical Unix releases on simulated historical architectures
(PDP-11, Interdata-32, VAX, and soon, IBM S/360).  On the VAX at
least, I have gcc-2.95, so it should be possible to build most modern
packages on it.

For reference, here are some snippets from POSIX (IEEE Std
1003.1-2001) volumes 1--4 about ARG_MAX:

8694           {ARG_MAX}
8695              Maximum length of argument to the exec functions including environment data.
8696              Minimum Acceptable Value: {_POSIX_ARG_MAX}

8918               {_POSIX_ARG_MAX}
8919                    Maximum length of argument to the exec functions including environment data.
8920                    Value: 4096

9565               The number of bytes available for the new process' combined argument and environment lists is
9566               {ARG_MAX}. It is implementation-defined whether null terminators, pointers, and/or any
9567               alignment bytes are included in this total.

9862            [E2BIG]                The limit {ARG_MAX} applies not just to the size of the argument list, but to
9863                                   the sum of that and the size of the environment list.

28305           The number of bytes available for the child process' combined argument and environment lists
28306           is {ARG_MAX}. The implementation shall specify in the system documentation (see the Base
28307           Definitions volume of IEEE Std 1003.1-2001, Chapter 2, Conformance) whether any list
28308           overhead, such as length words, null terminators, pointers, or alignment bytes, is included in
28309           this total.

39989               The standard developers considered requiring that setenv( ) indicate an error when a call to it
39990               would result in exceeding {ARG_MAX}. The requirement was rejected since the condition might
39991               be temporary, with the application eventually reducing the environment size. The ultimate
39992               success or failure depends on the size at the time of a call to exec, which returns an indication of
39993               this error condition.

40395             The generated command line length shall be the sum of the size in bytes of the utility name and
40396             each argument treated as strings, including a null byte terminator for each of these strings. The
40397             xargs utility shall limit the command line length such that when the command line is invoked,
40398             the combined argument and environment lists (see the exec family of functions in the System
40399             Interfaces volume of IEEE Std 1003.1-2001) shall not exceed {ARG_MAX}-2048 bytes. Within
40400             this constraint, if neither the -n nor the -s option is specified, the default command line length
40401             shall be at least {LINE_MAX}.
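
The length rule quoted above can be sketched in C.  This is an
illustration under stated assumptions, not how xargs implements it:
the helper names command_line_bytes and within_posix_limit are
hypothetical, argv is assumed to be a NULL-terminated vector with the
utility name first, and list overhead such as pointers (which POSIX
leaves implementation-defined) is ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: command line length as POSIX defines it for
   xargs, that is, the utility name plus each argument, each counted
   with its terminating null byte.  argv must be NULL-terminated. */
size_t command_line_bytes(char *const *argv)
{
    size_t total = 0;

    for (; argv != NULL && *argv != NULL; argv++)
        total += strlen(*argv) + 1;
    return total;
}

/* Hypothetical helper: nonzero if the combined argument and
   environment bytes stay within {ARG_MAX}-2048. */
int within_posix_limit(size_t cmd_bytes, size_t env_bytes, long arg_max)
{
    if (arg_max < 2048)
        return 0;
    return cmd_bytes + env_bytes <= (size_t)(arg_max - 2048);
}
```

With the POSIX minimum of 4096 for {ARG_MAX}, only 2048 bytes remain
for the command line and the environment together, so a 3000-byte
environment already fails the check before any argument is added.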

40526                 On implementations with a large value for {ARG_MAX}, xargs may produce command lines
40527                 longer than {LINE_MAX}. For invocation of utilities, this is not a problem. If xargs is being used
40528                 to create a text file, users should explicitly set the maximum command line length with the -s
40529                 option.

40579             The requirement that xargs never produces command lines such that invocation of utility is
40580             within 2048 bytes of hitting the POSIX exec {ARG_MAX} limitations is intended to guarantee
40581             that the invoked utility has room to modify its environment variables and command line
40582             arguments and still be able to invoke another utility. Note that the minimum {ARG_MAX}
40583             allowed by the System Interfaces volume of IEEE Std 1003.1-2001 is 4096 bytes and the
40584             minimum value allowed by this volume of IEEE Std 1003.1-2001 is 2048 bytes; therefore, the
40585             2048 bytes difference seems reasonable. Note, however, that xargs may never be able to invoke a
40586             utility if the environment passed in to xargs comes close to using {ARG_MAX} bytes.

829              There are no explicit limits in IEEE Std 1003.1-2001 on the sizes of names, words (see the
830              definition of word in the Base Definitions volume of IEEE Std 1003.1-2001), lines, or other
831              objects. However, other implicit limits do apply: shell script lines produced by many of the
832              standard utilities cannot exceed {LINE_MAX} and the sum of exported variables comes under
833              the {ARG_MAX} limit. Historical shells dynamically allocate memory for names and words and
834              parse incoming lines a character at a time. Lines cannot have an arbitrary {LINE_MAX} limit
835              because of historical practice, such as makefiles, where make removes the <newline>s associated
836              with the commands for a target and presents the shell with one very long line. The text on
837              INPUT FILES in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 1.11, Utility
838              Description Defaults does allow a shell to run out of memory, but it cannot have arbitrary
839              programming limits.

9170           {ARG_MAX}
9171               This is defined by the System Interfaces volume of IEEE Std 1003.1-2001. Unfortunately, it is
9172               very difficult for a conforming application to deal with this value, as it does not know how
9173               much of its argument space is being consumed by the environment variables of the user.

9228           There are different limits associated with command lines and input to utilities, depending on the
9229           method of invocation. In the case of a C program exec-ing a utility, {ARG_MAX} is the
9230           underlying limit. In the case of the shell reading a script and exec-ing a utility, {LINE_MAX}
9231           limits the length of lines the shell is required to process, and {ARG_MAX} will still be a limit. If a
9232           user is entering a command on a terminal to the shell, requesting that it invoke the utility,
9233           {MAX_INPUT} may restrict the length of the line that can be given to the shell to a value below
9234           {LINE_MAX}.

11574               {ARG_MAX}
11575                    The current minimum is likely to need to be increased for profiles, particularly as larger
11576                    amounts of information are passed through the environment. Many implementations are
11577                    believed to support larger values.

- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: address@hidden  -
- 155 S 1400 E RM 233                       address@hidden  address@hidden -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
