Re: [PATCH] env: support encoding of args into command.
From: Kaz Kylheku (Coreutils)
Date: Mon, 29 May 2017 19:37:39 -0700
User-agent: Roundcube Webmail/0.9.2

On 29.05.2017 04:29, Eric Blake wrote:
> On 05/27/2017 07:30 PM, Kaz Kylheku (Coreutils) wrote:
>> Basically I'm completely against almost every aspect of this
>> -S design; and I suspect the POSIX standardization people
>> (Austin Group) won't adopt it, either, so it will forever
>> remain just a FreeBSD feature (and we can help keep it that
>> way by not copying it).
>
> The Austin Group has already declared that #! is non-portable, and that
> portable scripts can't use it, BECAUSE of the wide variety in how
> kernels handle it and the small limits on how much you can cram in that
> line.

Gentlemen, please disregard the patch.

I don't care about it any more because I have discovered a
hack which makes it pointless.

With excellent language-level backward compatibility, a given
scripting language interpreter "interp" can provide support
for being invoked in the following manner:

  #!/usr/bin/env interp\000trailing material

Here \000 represents a literal embedded null byte.

So, of course, env receives argv[1] as "interp"
and finds the interpreter properly. This is the case
whether the kernel stops reading the string after the null,
or whether the kernel passes the character array
"interp\000trailing material" as argv[1] to env.
Either way, env only sees "interp".

The interpreter can then open the script and read the full line,
look for the null byte, and give a meaning to "trailing material".
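A minimal sketch of that interpreter-side step in C, assuming the first bytes of the script have already been read into a buffer with a length-reporting call such as `fread` (`fgets` does not report how many bytes it read, so an embedded null would be indistinguishable from the end of the string). The helper name `hashbang_trailing` is hypothetical; it only locates the trailing material and leaves its meaning to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the start of a script (len bytes, possibly
   containing NULs), return a pointer to the "trailing material" that
   follows the first NUL on the #! line, and store its length in *out_len.
   Returns NULL if the buffer does not start with "#!" or if the first
   line contains no NUL (an ordinary hash bang line). */
static const char *hashbang_trailing(const char *line, size_t len,
                                     size_t *out_len)
{
    if (len < 2 || line[0] != '#' || line[1] != '!')
        return NULL;

    /* Only search the first line: stop at the newline if there is one. */
    const char *nl = memchr(line, '\n', len);
    size_t line_len = nl ? (size_t)(nl - line) : len;

    const char *nul = memchr(line, '\0', line_len);
    if (nul == NULL)
        return NULL;

    const char *start = nul + 1;
    *out_len = line_len - (size_t)(start - line);
    return start;
}
```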

The interpreter can, in that space, implement the equivalent
of my argument delimiting approach, or the more elaborate one
taken in BSD's env -S.

The notation is very space-efficient: just one delimiting character,
which positively requires no escaping.

It doesn't rely on adding a second line to the script for encoding
the material, which could change the meaning of existing scripts.

It also potentially sidesteps limits on hash bang line size.
Why? Because the only requirement which has to be met is that the
null byte occurs within the header size limit! Not the entire hash
bang line.

The programmer is not relying on the hash bang mechanism to pass
anything after the null byte through the command line, so if any
of it is cut off, that is immaterial.
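To make the size argument concrete, here is a sketch of a check a script author or linter might run. It assumes the kernel reads at most `limit` bytes of the file header; the actual value is kernel-specific (on Linux kernels of that era it is `BINPRM_BUF_SIZE`, 128 bytes), so the caller supplies it. The function name `nul_within_header_limit` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: does the #! line of a script (buf, len bytes) put its
   NUL delimiter within the first `limit` bytes, the only part the kernel is
   assumed to read? Bytes after that NUL may be cut off by the kernel without
   harm, because the interpreter rereads them from the file itself. */
static bool nul_within_header_limit(const char *buf, size_t len, size_t limit)
{
    size_t n = len < limit ? len : limit;
    const char *nl = memchr(buf, '\n', n);
    if (nl != NULL)
        n = (size_t)(nl - buf);   /* the NUL must be on the #! line itself */
    return memchr(buf, '\0', n) != NULL;
}
```

Note that the length of the whole first line plays no part in the check; only the position of the NUL does.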

So far, I have tried this on Darwin, Linux, Solaris and Cygwin:
works fine!

A possible objection is that every interpreter has to implement its
own hack for recognizing the material after the null byte and
doing something with it. The solution for that, of course, is to provide
a library function for dealing with it: a function which takes
(argc, argv) and the index of the argv[] element that holds the script
name, and returns a transformed (argc, argv).
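A minimal sketch of such a drop-in function in C. Several details are assumptions rather than part of the proposal: the name `hashbang_args` and the `HB_MAX_LINE` read limit are hypothetical; the trailing material is split on spaces and tabs with `strtok`, with no quoting or escaping (neither the author's delimiting scheme nor env -S); and the extra words are spliced in just before the script name, where options from a #! line would normally land.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HB_MAX_LINE 4096   /* hypothetical: how much of the script to read */

/* Hypothetical drop-in: read the #! line of argv[script_idx]; if it contains
   a NUL, split the bytes after it on spaces/tabs and splice those words into
   a new argv just before the script name. Returns 0 on success, storing the
   (possibly unchanged) vector in *out_argc / *out_argv, or -1 on error.
   The new vector and the words it points to are heap-allocated and are
   deliberately never freed: they live as long as the interpreter. */
static int hashbang_args(int argc, char **argv, int script_idx,
                         int *out_argc, char ***out_argv)
{
    *out_argc = argc;
    *out_argv = argv;

    FILE *f = fopen(argv[script_idx], "rb");
    if (f == NULL)
        return -1;
    char *buf = malloc(HB_MAX_LINE + 1);   /* +1 for a terminator */
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    size_t len = fread(buf, 1, HB_MAX_LINE, f);
    fclose(f);

    /* Restrict to the first line, then look for the NUL delimiter. */
    char *nl = memchr(buf, '\n', len);
    size_t line_len = nl ? (size_t)(nl - buf) : len;
    char *nul = (len >= 2 && buf[0] == '#' && buf[1] == '!')
                ? memchr(buf, '\0', line_len) : NULL;
    if (nul == NULL) {
        free(buf);
        return 0;                          /* plain #! line: nothing to do */
    }
    buf[line_len] = '\0';                  /* terminate the trailing material */

    /* Upper bound on words: at most one per two bytes, plus slack. */
    size_t max_words = (line_len + 1) / 2 + 1;
    char **nv = malloc((argc + max_words + 1) * sizeof *nv);
    if (nv == NULL) {
        free(buf);
        return -1;
    }

    int n = 0;
    for (int i = 0; i < script_idx; i++)
        nv[n++] = argv[i];
    for (char *tok = strtok(nul + 1, " \t"); tok; tok = strtok(NULL, " \t"))
        nv[n++] = tok;                     /* words point into buf */
    for (int i = script_idx; i < argc; i++)
        nv[n++] = argv[i];
    nv[n] = NULL;

    *out_argc = n;
    *out_argv = nv;
    return 0;
}
```

An interpreter whose script name arrives in argv[1] would call it once, early in main: `hashbang_args(argc, argv, 1, &argc, &argv)`. Keeping the words inside the read buffer avoids a copy per argument, at the cost of never freeing that buffer.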

The thing to do is to develop that library function, to make
it easy for interpreter writers to just "drop in".