guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

What 'sh' should 'system' use?


From: Philip McGrath
Subject: What 'sh' should 'system' use?
Date: Sun, 18 Sep 2022 20:13:44 -0400

Hi Guix,

The C standard library includes a function 'system' to run a string as a shell 
command. Other languages provide similar functions, including Guile and many 
other Schemes and the Standard ML Basis Library.[1][2] Even without a 
dedicated library function, a program might want to run 'sh' using a general-
purpose mechanism for launching subprocesses.

How should the implementation of a function like 'system' find an 'sh' 
executable?

This question came up most recently with the patch series updating to Racket 
8.6 [3]: we already had patched in a workaround for Racket, but Chez Scheme 
and Zuo also needed workarounds to find 'sh'. I'm going to try to summarize the 
context I found in the course of that discussion and explain my current 
thinking, which changed over the course of that thread.

I think Guix should decide on an approach for functions like 'system' that can 
be applied consistently across languages. In particular, I don't think what 
our 'glibc' package is currently doing makes sense under any sort of approach.

First, an overview of three ways of trying to answer this question:

1) Many programs assume 'sh' can be found at '/bin/sh', but this is not true 
in Guix build environments, and it is not portable in general. Some systems 
have historically had a non-POSIX shell at '/bin/sh' and a POSIX shell at '/
usr/xpg4/bin/sh'.

More significantly, on Android, the shell is never at '/bin/sh'! (It is 
typically at '/system/bin/sh', except it is '/vendor/bin/sh' for "vendor 
code", whatever that is.[4]) That may be relevant for upstreams' willingness 
to reconsider this assumption.

In recent Python, the implementation of 'subprocess.Popen' uses [5]:

                unix_shell = ('/system/bin/sh' if
                          hasattr(sys, 'getandroidapilevel') else '/bin/sh')

which accounts for Android while being completely non-general.

On the other hand, even Nix puts '/bin/sh' at its usual path: we are really 
quite an outlier in this respect. (IIUC, Nix also has '/usr/bin/env', but no 
other utilities at FHS paths.)

In Glibc, 'sysdeps/posix/system.c' assumes '/bin/sh', and Guix currently 
patches it to refer to a store path as a string constant (but see below for 
issues).

2) There is a non-standard but ubiquitous macro '_PATH_BSHELL' in 'paths.h' 
which is supposed to give the path of a Bourne-like shell. In Guix, we patch 
this to refer to a store path as a string constant (but again, see below for 
issues). Notablty, on Android, it is not a compile-time constant: it is

    #define _PATH_BSHELL __bionic_get_shell_path()

where the function returns '/system/bin/sh' or '/vendor/bin/sh' as appropriate 
(but, in any case, it returns a `const char*` to a compile-time constant, so 
no manual memory management is needed).

3) POSIX actually has an answer to the question of how to find 'sh', but, 
unfortunately, its answer doesn't work in Guix build environments.

The POSIX spec for 'system' [6] says, in the informative section "Application 
Usage":

> There is no defined way for an application to find the specific path for the
> shell. However, confstr() can provide a value for PATH that is guaranteed
> to find the sh utility.

Likewise, the spec for 'sh' [7] says in the corresponding section:

> Applications should note that the standard PATH to the shell cannot be
> assumed to be either /bin/sh or /usr/bin/sh, and should be determined by
> interrogation of the PATH returned by getconf PATH, ensuring that the
> returned pathname is an absolute pathname and not a shell built-in.

Most emphatically, the spec for 'confstr' [8] says in the normative section 
"Description":

> If the implementation supports the POSIX shell option, the string stored in
> buf after a call to:
> 
>     confstr(_CS_PATH, buf, sizeof(buf))
> 
> can be used as a value of the PATH environment variable that accesses all of
> the standard utilities of POSIX.1-2017, that are provided in a manner
> accessible via the exec family of functions, if the return value is less
> than or equal to sizeof(buf).

It's worth noting here that 'PATH' is explicitly not consulted. Likewise, from 
the rationale section of [6]:

> One reviewer suggested that an implementation of system() might want to use
> an environment variable such as SHELL to determine which command
> interpreter to use. The supposed implementation would use the default
> command interpreter if the one specified by the environment variable was
> not available. This would allow a user, when using an application that
> prompts for command lines to be processed using system(), to specify a
> different command interpreter. Such an implementation is discouraged. If
> the alternate command interpreter did not follow the command line syntax
> specified in the Shell and Utilities volume of POSIX.1-2017, then changing
> SHELL would render system() non-conforming. This would affect applications
> that expected the specified behavior from system(), and since the Shell and
> Utilities volume of POSIX.1-2017 does not mention that SHELL affects
> system(), the application would not know that it needed to unset SHELL.

It seems that 'confstr' is supposed to access "configuration-defined string 
values", i.e. with possible configuration applied runtime, in contrast to the 
compile-time 'CS_PATH' (without an underscore).

Unfortunately, AFAICT, Glibc's 'confstr' implementation for '_CS_PATH' doesn't 
have any mechanism for configuring the search path: it simply returns the 
compile-time constant, which is:

    #define     CS_PATH "/bin:/usr/bin"

and neither of those directories exist in Guix build environments.

So, with that context in mind, what are the problems with Guix's Glibc, and 
with existing solutions more generally?

First, a test program I tried in [9] seemed to indicate that '_PATH_BSHELL' 
refered to 'bash-static', but 'system("echo $BASH")' referred to 'bash-
minimal'. It's possible that my test gave an incorrect answer: I just tried 
'guix size glibc' (I hadn't thought of that earlier), and it doesn't list a 
reference to 'bash-minimal'. But, if we are embedding references in libc to 
two different copies of Bash, that seems clearly bad.

More broadly, I now think it would be better in we embedded zero references to 
copies of Bash in libc.

I have changed my mind on this before, and I could be persuaded otherwise. 
When I wrote the Racket patch for '/bin/sh' that had been in place before the 
latest change, I initially was going to use a hard-coded Bash only when '/bin/
sh' did not exist, but the discussion persuaded me it would make more sense to 
always use the 'sh' from the package's inputs.[10] For Racket, a dependency on 
'sh' didn't seem too unreasonable.

However, giving every program using Glibc a hard dependency on Bash—and on a 
particular Bash executable—seems like a much bigger imposition.

I now think it would be better to find 'sh' dynamically at run time rather than 
embed a reference to a specific shell at compile time. When 'sh' is needed, it 
can be provided by a build system or as an explicit input. When 'sh' isn't 
needed and perhaps isn't wanted, we should be able to create environments and 
programs without it, without libc pulling it along.

I found this note from the Linux man-pages project [11] interesting in that 
regard:

>        In versions of glibc before 2.1.3, the check for the availability
>        of /bin/sh was not actually performed if command was NULL;
>        instead it was always assumed to be available, and system()
>        always returned 1 in this case.  Since glibc 2.1.3, this check is
>        performed because, even though POSIX.1-2001 requires a conforming
>        implementation to provide a shell, that shell may not be
>        available or executable if the calling program has previously
>        called chroot(2) (which is not specified by POSIX.1-2001).

Finally, some possible courses of action:

1) If we want to continue to hard-code a specific shell into Glibc, I think we 
should document the decision (for example, why 'bash-static' vs. 'bash-
minimal'?) and recommendations for how packages should use it: '_PATH_BSHELL' 
is the best mechanism I've heard of so far, though I wish it were 
standardized, and the fact that it can't be portably assumed to be a string 
constant could be surprising.

2) If we want to make 'sh' a weak/dynamic reference, I think we should 
strongly consider arranging to make it available at '/bin/sh' when present. I 
expect this option would require less patching of other packages *by far* than 
any other approach.

3) If we want a dynamic 'sh' not located at '/bin/sh', I think we should 
implement a function similar to '__bionic_get_shell_path()' and use it for 
'_PATH_BSHELL', 'system', etc. That begs the question of how the function 
should find 'sh', and I don't have an answer for that. In principle, we could 
design a configuration mechanism for 'confstr(_CS_PATH, buf, sizeof(buf))' and 
use it to find the shell: that has some appeal, but making the mechanism 
extensible enough to support "all of the standard utilities of POSIX.1-2017" 
seems like a challenge.

What do you think?

-Philip

[1]: https://smlfamily.github.io/Basis/os-process.html#SIG:OS_PROCESS.system:VAL
[2]: https://lists.gnu.org/archive/html/help-guix/2021-11/msg00036.html
[3]: https://issues.guix.gnu.org/57050
[4]: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/
__bionic_get_shell_path.cpp
[5]: https://github.com/python/cpython/blob/
8184f0fce3b734413e3d3a282f1425d3cb8507fd/Lib/subprocess.py#L1760-L1762
[6]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html
[7]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html
[8]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html
[9]: https://issues.guix.gnu.org/57050#63
[10]: https://issues.guix.gnu.org/47180
[11]: https://man7.org/linux/man-pages/man3/system.3.html

Attachment: signature.asc
Description: This is a digitally signed message part.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]