Re: GNU Parallel Bug Reports why is parallel invoking a shell **by default** and associated bugs
Sun, 13 Nov 2016 19:15:38 +0100
On Wed, Nov 9, 2016 at 11:32 PM, Stephane Chazelas wrote:
> 2015-05-23 21:50:51 +0100, Stephane Chazelas:
>> You intend it to run ["cmd", "1"], ["cmd", "2"]... in parallel,
>> At the moment, depending on the shell (and it's not always clear
>> which one you'll get)
It should be much clearer now:
>> there are a few bugs.
>> For instance with zsh:
>> $ printf '=z\n' | PARALLEL_SHELL=zsh parallel 'printf "<%s>\n"'
>> zsh:1: z not found
>> In zsh, a leading = is a globbing operator that is not currently
>> escaped by parallel.
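A minimal Python sketch of why an escaper needs to know about `=`. The `backslash_quote()` function below is hypothetical and is not GNU parallel's actual `shell_quote_scalar` code. It escapes every byte outside a small allowlist of known-safe ASCII characters, so a leading `=` gets escaped along with the usual metacharacters. It assumes a Bourne-style shell or zsh. Newlines are single-quoted instead, because backslash-newline is a line continuation in those shells. Escaping byte by byte has its own problems in some multi-byte locales, as discussed further down the thread.

```python
# Hypothetical allowlist quoter (not GNU parallel's implementation):
# every byte outside a small set of known-safe ASCII characters is escaped.
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_./-")

def backslash_quote(arg: bytes) -> bytes:
    out = bytearray()
    for b in arg:
        if b == 0x0A:
            # Backslash-newline is a line continuation, so quote the newline instead.
            out += b"'\n'"
        elif b in SAFE:
            out.append(b)
        else:
            # Anything not known to be safe, including zsh's leading '=', is escaped.
            out += b"\\"
            out.append(b)
    return bytes(out)

print(backslash_quote(b"=z"))
```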
>> With csh/tcsh:
>> $ printf 'a\nb\0' | PARALLEL_SHELL=tcsh parallel -0 'printf "<%s>\n"'
>> Unmatched '.
>> Unmatched '.
These were fixed in 20150622.
> To add to the list:
> $ printf '\243`/tmp/xxx\243`\n' |
> LC_ALL=zh_HK.big5hkscs PARALLEL_SHELL=bash parallel echo
> bash: /tmp/xxxα: No such file or directory
> $ parallel --version
> GNU parallel 20161022
> $ LC_ALL=zh_HK.big5hkscs locale charmap
> (that α is only rendered as α if your terminal's charset is
> BIG5-HKSCS. In a UTF-8 terminal, you'd probably see something
> like �\ instead)
> In that Hong Kong character set, ε (U+03B5) is encoded as 0xa3
> 0x60. 0x60 also happens to be ` (backtick, U+0060) in ASCII (and
> in BIG5-HKSCS when it stands alone, for that matter).
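You can check the byte values above with Python's built-in `big5hkscs` codec. The `describe()` function is only an illustrative helper. It prints each byte and, for bytes below 0x80, the ASCII character that byte would be on its own.

```python
# Inspect how BIG5-HKSCS encodes the two Greek letters from this thread,
# using Python's built-in "big5hkscs" codec.
def describe(ch: str) -> str:
    raw = ch.encode("big5hkscs")
    parts = []
    for b in raw:
        # Bytes below 0x80 are printable ASCII in this example, so show that reading too.
        ascii_view = f" ({chr(b)})" if b < 0x80 else ""
        parts.append(f"0x{b:02x}{ascii_view}")
    return f"U+{ord(ch):04X} {ch}: " + " ".join(parts)

for ch in "εα":
    print(describe(ch))
```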
> parallel thinks that 0x60 is a backtick that it needs to escape,
> instead of the second byte of that ε character. It also escapes
> the \243 byte; both get a backslash.
> so it actually runs
> exec("bash", "-c", "echo
> 0x5c is backslash in ASCII and BIG5-HKSCS but 0xa3 0x5c is α
> (U+03b1) in BIG5-HKSCS, so bash which is multi-byte aware sees:
> echo \α`/tmp/xxxα`
> instead of the intended
> echo ε/tmp/xxxε
> And tries to run the xxxα command.
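Here is a sketch of that failure. It uses a hypothetical byte-wise escaper, `naive_escape()`, that behaves as described above: it puts a backslash before the backtick and before every byte >= 0x80. It only covers the characters this example needs, and newline handling is left out. Decoding its output as BIG5-HKSCS shows what a multi-byte-aware shell sees. Each backslash meant to protect a backtick is absorbed into α together with the preceding 0xa3 byte, so the backticks end up unprotected.

```python
# Hypothetical byte-wise escaper, illustrating the behaviour described above:
# backslash before a few shell-special bytes and before every byte >= 0x80,
# with no knowledge of multi-byte characters (newline handling omitted).
SPECIAL = set(b"`$\\\"' ;&|<>()*?[]{}~#!^=")

def naive_escape(arg: bytes) -> bytes:
    out = bytearray()
    for b in arg:
        if b in SPECIAL or b >= 0x80:
            out += b"\\"
        out.append(b)
    return bytes(out)

arg = "ε/tmp/xxxε".encode("big5hkscs")   # b'\xa3`/tmp/xxx\xa3`'
escaped = naive_escape(arg)
# A multi-byte-aware shell decodes the line as BIG5-HKSCS: each 0xa3 byte
# pairs with the backslash that follows it (forming α), so the backticks
# are no longer escaped.
print(escaped.decode("big5hkscs"))
```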
> Note that it's not the only charset with this kind of problem.
> BIG5, GB18030 and GBK (at least) are affected as well. And it's not
> only those characters: it's any charset that has multi-byte
> characters where some of the component bytes also happen to be
> ASCII characters that are special to the shell.
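To get a feel for how widespread this is, the sketch below scans the Basic Multilingual Plane for characters whose multi-byte encoding contains an ASCII byte from a shell-special set. `SHELL_SPECIAL` is an illustrative choice, not an exhaustive list for every shell. Only the Python codecs named in the loop are checked.

```python
# Find which shell-special ASCII bytes occur inside multi-byte characters
# of some East Asian encodings, using Python's built-in codecs.
# SHELL_SPECIAL is an illustrative set, not a complete list for every shell.
SHELL_SPECIAL = set(b"`$\\\"'|&;<>()[]{}*?~#!^=")

def special_bytes_in_multibyte(codec: str) -> set:
    found = set()
    for cp in range(0x80, 0x10000):
        try:
            raw = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue  # code point not representable in this charset
        if len(raw) > 1:
            found.update(b for b in raw if b in SHELL_SPECIAL)
    return found

for codec in ("big5", "big5hkscs", "gbk", "gb18030"):
    chars = "".join(sorted(chr(b) for b in special_bytes_in_multibyte(codec)))
    print(f"{codec:10} {chars}")
```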
I can confirm that.
> For those charsets above, using single quotes instead of
> backslash for quoting helps as 0x27 is not part of any
> multi-byte character in those charsets.
> echo 'ε/tmp/xxxε'
> would not be a problem.
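Here is a sketch of that approach using a hypothetical `single_quote()` helper that works on bytes. It wraps the argument in single quotes and writes each embedded `'` as `'\''`. Bytes are only inserted around existing 0x27 bytes. Since 0x27 never occurs inside a multi-byte character in these charsets, no character gets split. Python's `shlex.quote()` applies the same single-quote idea to `str` values.

```python
# Hypothetical single-quote quoter working on raw bytes: wrap the whole
# argument in '...' and write each embedded ' as '\'' (close, escaped quote, reopen).
def single_quote(arg: bytes) -> bytes:
    return b"'" + arg.replace(b"'", b"'\\''") + b"'"

arg = "ε/tmp/xxxε".encode("big5hkscs")
quoted = single_quote(arg)
# Extra bytes only appear around existing 0x27 bytes, which in these charsets
# are always a standalone apostrophe, so the result still decodes cleanly.
print(quoted.decode("big5hkscs"))
```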
> echo '<0xa3>''<0x60>'/tmp/xxx'<0xa3>''<0x60>'
> would not be a problem with bash, but would be with yash, which
> chokes on byte sequences that don't form valid characters.
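The yash problem can be illustrated with a hypothetical `quote_each_byte()` function. It wraps each byte >= 0x80, and each backtick, in its own pair of single quotes, as in the example above. Once a 0xa3 lead byte is followed by a 0x27 quote, the string is no longer valid BIG5-HKSCS text. That is the kind of sequence a shell that insists on valid characters rejects, while bash tolerates it and simply concatenates the quoted pieces.

```python
# Hypothetical per-byte quoter: wrap each byte >= 0x80 and each backtick
# in its own pair of single quotes, as in '<0xa3>''<0x60>'.
def quote_each_byte(arg: bytes) -> bytes:
    out = bytearray()
    for b in arg:
        if b >= 0x80 or b == 0x60:
            out += b"'" + bytes([b]) + b"'"
        else:
            out.append(b)
    return bytes(out)

def is_valid(data: bytes, codec: str = "big5hkscs") -> bool:
    """Return True if data is well-formed text in the given charset."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

quoted = quote_each_byte("ε/tmp/xxxε".encode("big5hkscs"))
# 0xa3 is now followed by 0x27, which is not a valid trail byte.
print(quoted, is_valid(quoted))
```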
> echo \ε/tmp/xxx\ε
> would be OK in bash, but not in shells that are not multi-byte
> aware like dash.
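The dash case comes down to how the command line is tokenised. The hypothetical `unescaped_backticks()` function below counts backticks that are not protected by a backslash. Feeding it the BIG5-HKSCS-decoded text models a multi-byte-aware shell. Decoding as latin-1 instead gives one character per byte, which models a byte-oriented shell. In that byte view, each backslash protects only the 0xa3 byte, and the 0x60 after it is a bare backtick.

```python
def unescaped_backticks(chars) -> int:
    """Count backticks not protected by a preceding backslash.

    `chars` is the command line as a shell would see it: either decoded
    text (multi-byte aware) or one character per byte (byte-oriented).
    """
    count, escaped = 0, False
    for c in chars:
        if escaped:
            escaped = False      # this character is protected by the backslash
        elif c == "\\":
            escaped = True
        elif c == "`":
            count += 1
    return count

cmd = "echo \\ε/tmp/xxx\\ε".encode("big5hkscs")
print("multi-byte aware:", unescaped_backticks(cmd.decode("big5hkscs")))
print("byte-oriented:   ", unescaped_backticks(cmd.decode("latin-1")))
```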
Having tried your examples in dash and zsh, it seems they work without
problems. Maybe that is because those shells do not understand
multi-byte characters.
> Quoting in shell is a tricky business. It's best not to
> invoke a shell in the first place if it can be at all avoided.
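For comparison, here is a sketch of running the thread's `printf` example with no shell at all, using Python's `subprocess` with an argument list. The argument bytes go into argv unchanged, so no quoting and no charset knowledge is needed. The sketch assumes a `printf` executable on PATH.

```python
import subprocess

# The raw argument bytes from the thread (BIG5-HKSCS for ε/tmp/xxxε).
arg = b"\xa3`/tmp/xxx\xa3`"

# With an argument list and no shell, subprocess passes each element
# directly to exec(), so arg reaches printf byte for byte.
result = subprocess.run(["printf", "<%s>\\n", arg], capture_output=True, check=True)
print(result.stdout)
```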
You cannot use composed commands, functions, and redirection without
wrapping them in a shell. A lot of the helper functions also use a shell:
To me this by far outweighs the problem that you may have by having to
But it will be even better if we can find a way to adapt the quoting,
so it will work correctly in both dash/zsh and bash with both LC_ALL=C
It will probably involve changing shell_quote_scalar_*() to take this
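Sometimes a shell is genuinely needed, for composed commands, functions or redirection. One common pattern in that case is to keep the script text constant and pass the data as positional parameters. The shell then expands `$1` but never parses its bytes as syntax. This is shown here as an assumption, not as anything GNU parallel is stated to do. A sketch, assuming `sh` is on PATH:

```python
import subprocess

arg = b"\xa3`/tmp/xxx\xa3`"

# The script is a fixed string that still uses a shell feature (redirection).
# The data arrives as "$1", so it is expanded but never re-parsed as code.
script = 'printf "<%s>\\n" "$1" >&2'
result = subprocess.run(["sh", "-c", script, "sh", arg], capture_output=True, check=True)
print(result.stderr)
```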