Re: wait-process API limitation

bug-gnulib

From:	Bruno Haible
Subject:	Re: wait-process API limitation
Date:	Sun, 27 Sep 2009 10:05:08 +0200
User-agent:	KMail/1.9.9

Eric Blake wrote 2009-07-20:
> Right now, wait_subprocess assumes that an exit status of 127 from the
> child process implies failure, and prints a message unless null_stderr is
> set.  However, this led to a regression in m4 1.4.13 - an intentional
> status of 127 is ambiguous with failure only if the child had no output,
> and although m4 knows there was output, wait-process does not:
> 
> $ echo 'esyscmd(echo `dnl'\''; exit 127)sysval' | m4-1.4.11
> 127
> $ echo 'esyscmd(echo `dnl'\''; exit 127)sysval' | m4-1.4.13
> m4-1.4.13: esyscmd subprocess failed
> 127
> 
> The fix from m4's point of view is to pass null_stderr=false to
> create_pipe_in, but null_stderr=true to wait_subprocess, but this feels
> like a bit of a hack, because of the inconsistency in the named parameter.
>  Maybe it would be worth an API change to wait_subprocess to add an
> additional bool parameter, status_127_ok, which silences this particular
> error message if the calling process knows for certain that the child
> process produced output

I'm unsure what to do about this.

In the POSIX fork()/exec() calls, code 127 has no special meaning. But
when posix_spawn() is used instead of fork()/exec(), code 127 is the specific
way of signalling "unable to exec() after fork()". Also, in a POSIX sh,
127 is the standardized exit code of command, time, env, nice, nohup, and
xargs for "the specified command could not be found". See the POSIX rationale:

  "The command, env, nohup, time, and xargs utilities have been specified
   to use exit code 127 if an error occurs so that applications can distinguish
   "failure to find a utility" from "invoked utility exited with an error
   indication". The value 127 was chosen because it is not commonly used for
   other meanings; most utilities use small values for "normal error conditions"
   and the values above 128 can be confused with termination due to receipt
   of a signal. The value 126 was chosen in a similar manner to indicate that
   the utility could be found, but not invoked. Some scripts produce meaningful
   error messages differentiating the 126 and 127 cases. The distinction
   between exit codes 126 and 127 is based on KornShell practice that uses 127
   when all attempts to exec the utility fail with [ENOENT], and uses 126 when
   any attempt to exec the utility fails for any other reason."
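
To make the 126/127 convention from the rationale concrete, here is a small
sketch in POSIX sh. The helper name status_of and the temporary file are
made up for illustration, and mktemp is assumed to be available (it is
common but not required by POSIX):

```sh
# Print the exit status that a POSIX shell yields for a given command line.
# (status_of is a hypothetical helper, not part of any library.)
status_of() {
  sh -c "$1" >/dev/null 2>&1
  echo $?
}

tmpdir=$(mktemp -d)
# A file that exists but has no execute permission: found, but not invocable.
printf '#!/bin/sh\nexit 0\n' > "$tmpdir/not-executable"
chmod 644 "$tmpdir/not-executable"

status_of 'exit 3'                        # ordinary error: prints 3
status_of '/nonexistent/no-such-utility'  # utility not found: prints 127
status_of "$tmpdir/not-executable"        # found but not invocable: prints 126
rm -rf "$tmpdir"
```

Small values stay available for "normal error conditions", while 126 and 127
are reserved by convention for failures to start the utility at all.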

Should wait_subprocess take an argument that tells it to ignore code 127? That
would make it system-dependent: on systems where posix_spawn is used and behaves
in a certain way, error 127 will be generated; on others, wait_subprocess will
signal the error itself.

Should wait_subprocess take an argument that tells it whether the subprocess
was created with posix_spawn? I don't want to tie process creation and
termination handling too closely together. Keep it flexible.

Should wait_subprocess be given a means to detect whether the subprocess
has already produced output? That creates unreliability: a subprocess that has
not yet produced output but would soon do so will be handled differently from
a subprocess that has already produced output.

I'm more inclined to say: Code 127 means a failure to launch a subprocess,
period. If you write shell code such as
  echo `dnl'\''; exit 127
you are violating the semantics of code 127. Yes, it is true that it worked
fine as long as no posix_spawn call was involved, but it is the wrong thing
to do in the bigger picture.
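
The ambiguity can be sketched in POSIX sh as follows. This only illustrates
what a caller sees when it has nothing but the exit status; classify_status
is a hypothetical helper and not the logic of wait_subprocess:

```sh
# Classify an exit status the way a caller that sees only the status must.
# (classify_status is a made-up name for this illustration.)
classify_status() {
  case "$1" in
    0)   echo "success" ;;
    126) echo "found but could not be invoked" ;;
    127) echo "could not be launched" ;;
    *)   echo "exited with status $1" ;;
  esac
}

# Case A: the command really cannot be found.
sh -c '/nonexistent/no-such-utility' >/dev/null 2>&1
classify_status $?   # prints: could not be launched

# Case B: the command runs, produces output, then deliberately exits 127.
sh -c 'echo output; exit 127' >/dev/null 2>&1
classify_status $?   # prints: could not be launched
```

Both cases yield the same status, so a child that exits with 127 on purpose
cannot be told apart from one that never started.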

Bruno
