Re: [bug-gawk] Percent Signs in External Commands on Windows

From: David Millis
Subject: Re: [bug-gawk] Percent Signs in External Commands on Windows
Date: Tue, 10 Apr 2012 23:46:07 -0700 (PDT)

Oh, disclaimer so you don't get exasperated thinking I'm beating last year's 
horse: I'm no longer advocating a change in gawk itself. Percents aren't that 
This is housekeeping.

--- On Tue, 4/10/12, Eli Zaretskii <address@hidden> wrote:

>> And I have difficulty imagining a scenario where an unintended
>> percent would sneak into a filename or compiler/archiver args,
>> so it's not surprising make's doing okay.
> Then why wouldn't Gawk do fine as well?
For make, $(FOO)'s values are explicit: hardcoded, or args to make itself, or 
env vars (which shouldn't contain percents in the circumstances where you'd use make).

Gawk can build command strings at runtime from arbitrary files/pipes... though 
all the examples I can think of where funky strings are filtered into/through 
external apps could instead have the strings fed into STDIN from a pipe or 
redirected from a temporary text file.

Hrm, the missing two-way pipe (|&) feature was what _originally_ motivated me 
to use args:
"\"path\\wget.exe\" -O - -q \"...\"" | getline
I can pipe in, or out, but not both without a coprocess or temp file. Note 
that, to be runnable, this example string needs either concatenated fodder 
quotes (see below, CMD's help: #2) or doubled percents in the url.
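
A minimal sketch of the doubled-percent option, written in Python purely for illustration. It assumes the command string ends up parsed as a line of a batch file, where %% collapses to a single %. The helper name and the URL are hypothetical; in gawk the same substitution would be gsub(/%/, "%%", url).

```python
def double_percents(text):
    """Double every percent sign so batch-file parsing yields the original text."""
    return text.replace("%", "%%")


# Hypothetical URL containing percent-encoded characters.
url = "http://example.com/a%20b?q=50%25"

# Same shape as the wget command string above, with the URL made batch-safe.
command = '"path\\wget.exe" -O - -q "' + double_percents(url) + '"'
print(command)
```

If the build instead passes the string to CMD as an argument after /C, the doubled percents would most likely reach the program still doubled, which is why the choice of fix depends on the build.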

> First, CMD doesn't do with quotes what you think it does,
> please read the Microsoft documentation about that.  And you
> _do_ need to escape inner quotes

From cmd /? help...
If /C or /K is specified, then the remainder of the command line after
the switch is processed as a command line, where the following logic is
used to process quote (") characters:

    1.  If all of the following conditions are met, then quote characters
        on the command line are preserved:

        - no /S switch
        - exactly two quote characters
        - no special characters between the two quote characters,
          where special is one of: &<>()@^|
        - there are one or more whitespace characters between the
          the two quote characters
        - the string between the two quote characters is the name
          of an executable file.

    2.  Otherwise, old behavior is to see if the first character is
        a quote character and if so, strip the leading character and
        remove the last quote character on the command line, preserving
        any text after the last quote character.
MSDN repeats this.
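
The two rules quoted above can be modeled roughly as follows. This is a Python sketch for illustration only, not CMD's actual implementation: the function name is made up, `is_executable` is a hypothetical stand-in for CMD's own "names an executable file" check, and "first character" is taken literally (no skipping of leading whitespace).

```python
SPECIAL = set("&<>()@^|")


def cmd_c_quote_handling(rest, s_switch=False, is_executable=lambda name: False):
    """Approximate what CMD does with quotes in the text following /C or /K.

    `is_executable` is a hypothetical predicate standing in for CMD's check
    that the quoted text is the name of an executable file.
    """
    # Rule 1: every condition must hold for the quotes to be preserved.
    if not s_switch and rest.count('"') == 2:
        first = rest.index('"')
        last = rest.rindex('"')
        inner = rest[first + 1:last]
        if (not any(ch in SPECIAL for ch in inner)
                and any(ch.isspace() for ch in inner)
                and is_executable(inner)):
            return rest

    # Rule 2: strip the leading quote and the last quote on the line,
    # keeping any text that follows the last quote.
    if rest.startswith('"'):
        last = rest.rindex('"')
        return rest[1:last] + rest[last + 1:]

    return rest


# A quoted program with a quoted argument loses its outer pair of quotes...
print(cmd_c_quote_handling('"C:\\Program Files\\app.exe" "some arg"'))
# ...unless an extra "fodder" pair is wrapped around the whole string.
print(cmd_c_quote_handling('""C:\\Program Files\\app.exe" "some arg""'))
```

Under this model, the fodder quotes are the pair that rule 2 strips, leaving the inner quoting intact.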

Is this not basically what would happen?
1) Gawk code: system("WHATEVER"); # or "\"WHATEVER\""
2a) C code: popen/system/etc("WHATEVER");
2b) A shell, CMD, is tracked down and the func becomes...
2c) func-that-execs("path\\to\\cmd.exe", "/C", "WHATEVER");
3a) CMD then decides whether to eat a pair of quotes (no unescaping happens),
3b) and replaces percented words that match known variables, leaving others be.
3c) Then, as it would in batch, it tokenizes the command line based on 
un-careted quotes/pipes/redirects/etc into programs and args (sans such carets),
3d) and spawns programs with their args otherwise unaltered.
4) Each program (or shell built-in) individually globs its own array of args, 
if desired.
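
Step 3b can be sketched as a single left-to-right scan. This is a rough Python model under the assumptions stated in that step (known variables are replaced, anything else is left alone); it ignores substring and replacement syntax such as %FOO:~0,3%, and the function name is hypothetical.

```python
def expand_like_cmd_line(command, variables):
    """Replace %NAME% when NAME is a known variable; leave all other text alone."""
    # Variable names are matched case-insensitively, as on Windows.
    known = {name.upper(): value for name, value in variables.items()}
    out = []
    i = 0
    while i < len(command):
        if command[i] == "%":
            end = command.find("%", i + 1)
            if end > i + 1:  # there is a non-empty name between two percents
                name = command[i + 1:end].upper()
                if name in known:
                    out.append(known[name])
                    i = end + 1  # resume after the closing percent
                    continue
        # Not a known variable: keep this character and move on by one.
        out.append(command[i])
        i += 1
    return "".join(out)


print(expand_like_cmd_line("echo %FOO% %BAR%", {"FOO": "x"}))
```

Because inserted values are appended to the output and never rescanned, a value that itself contains %NAME% is not expanded a second time in this model.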

It sounds like you're saying APIs are messing with the contents of WHATEVER at 

> The question is, should we advise users to go this way as a
> workaround for possible tricky issues with percent signs.
Concatenating ENVIRON["FOO"] among strings where %FOO% was previously, while 
longer, is _definitely_ the sanest approach regardless of platform and what's 
under the hood.

With ENVIRON[], percent doubling should indeed be adequate when temp batching. 
Anyone doing something elaborate like reading verbatim command strings from a 
text file should be capable of manually substituting %FOO% with gensub or, to 
pedantically avoid accidental recursive var expansion, a loop.
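
To illustrate the recursive-expansion pitfall, here is a small Python sketch (illustrative only; the function names and sample values are hypothetical, and names are matched case-sensitively for simplicity). Replacing variables one at a time rescans text that an earlier replacement inserted; a single pass does not.

```python
import re


def substitute_naive(text, env):
    """Replace each variable in turn; inserted values get rescanned later."""
    for name, value in env.items():
        text = text.replace("%" + name + "%", value)
    return text


def substitute_once(text, env):
    """Substitute known %NAME% tokens in one pass; values are never rescanned."""
    if not env:
        return text
    # Match only the known names, so stray percents elsewhere are skipped.
    pattern = "%(" + "|".join(re.escape(name) for name in env) + ")%"
    return re.sub(pattern, lambda match: env[match.group(1)], text)


# Hypothetical environment where one value happens to contain %BAR%.
env = {"FOO": "100%BAR%", "BAR": "oops"}
print(substitute_naive("echo %FOO%", env))
print(substitute_once("echo %FOO%", env))
```

The single-pass version leaves the %BAR% inside FOO's value untouched, which is the behavior the loop approach is meant to guarantee.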

But since a distributed script can't assume the behavior of its interpreter 
(unless bundled by the author), to be agnostic, it'd have to run a ~60-line 
quote-vs-percent check anyway to determine which 1-line fix to apply. Too late 
for PROCINFO[] hints regarding port quirks...

As blanket advice, I'd just suggest experimenting with both fixes during 
development and warning that builds vary: parsing WHATEVER either as an 
argument to the shell, or as a line in a temporary shell script. The relevant 
situations on Windows being [when a quoted path\program is given a quoted arg] 
or [when percents appear]. For ambitious users, I don't know where an optimized 
example of that check would be best offered up.

*chuckle* Abusing the temp bat as shorthand for building batches:
system(":top\nECHO hello\nPAUSE\nGOTO top");

Side note: In my attachment earlier, initGlobals() doesn't need
the two "CR-LF vs LF" gensubs. Troubleshooting litter, before I realized the 
invisible char I needed to account for in the string comparisons was a space.

David Millis
