From: Ole Tange
Subject: Re: GNU Parallel Bug Reports why is parallel invoking a shell **by default** and associated bugs
Date: Mon, 25 May 2015 22:08:03 +0200

On Sat, May 23, 2015 at 10:50 PM, Stephane Chazelas
<address@hidden> wrote:
> Hello,
> When you do:
> seq 10 | parallel cmd
> It looks like parallel goes to all the trouble of spawning a shell,
> (taking extra trouble deciding which one to use), build a command
> line that looks alright for that shell with the cmd and
> arguments on stdin properly quoted.
> Why?

For consistency. I prefer that 'parallel cmd {} ::: 1' behaves almost
the same as 'parallel cmd {} ">" {} ::: 1'

The only situation in which you do not have to spawn a shell is when
all of these hold:

* There is only 1 command (i.e. no ; && || () or |)
* There is no redirection (i.e. no > < or |)
* GNU Parallel does not generate any wrapping (and wrapping is
increasingly common these days).
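
The first two conditions can be approximated with a simple pattern
check. This is only a sketch: needs_shell is a hypothetical helper, not
a function in GNU Parallel, and it ignores quoting, so it also reports
metacharacters that appear inside quotes.

```shell
# needs_shell is a hypothetical helper (not part of GNU Parallel).
# It succeeds if the command string contains ; & | ( ) < or >
needs_shell() {
    case $1 in
        *\;*|*\&*|*\|*|*\(*|*\)*|*\<*|*\>*) return 0 ;;
        *) return 1 ;;
    esac
}

for c in 'cmd 1' 'cmd 1 > out' 'a && b'; do
    if needs_shell "$c"; then
        echo "shell needed: $c"
    else
        echo "direct exec possible: $c"
    fi
done
```

The third condition (wrapping) cannot be seen from the command string
at all, because it depends on the options given to parallel.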

> Most of the time, when you do:
> seq 10 | parallel cmd
> You intend it to run ["cmd", "1"], ["cmd", "2"]... in parallel,
> not ["myshell", "-c", "cmd 1"], ["myshell", "-c", "cmd 2"],
> hoping that the shell will eventually (after initialisation,
> loading libraries, startup files...) run ["cmd", "1"]...
> I understand, there are cases where you may want to use shell
> constructs in there, but that's not the common case.

Are you aware that your commands are often wrapped? Any of these
options will wrap your command:

    # * --shellquote
    # * --nice
    # * --cat
    # * --fifo
    # * --sshlogin
    # * --pipepart
    # * --pipe
    # * --tmux

> I'd be more than happy to do things like:
> seq 10 | parallel sh -c 'cmd "$1" > "$1.out"' sh
> for instance

Changing the current behaviour to the above would break backward compatibility.

And the above is actually one of the reasons why I wrote GNU Parallel:
You are _forced_ to use the above syntax when using xargs. I use
composed commands and redirection all the time and would find it very
annoying if I had to quote the commands and then wrap the commands
with a shell.
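
For comparison, this is roughly what the explicit-wrapper style looks
like when the jobs are written out by hand. It is a sketch using only
sh and mktemp; the loop stands in for xargs, and the directory and file
names are made up for the illustration.

```shell
# Each "job" spells out the shell explicitly. The input arrives as $1
# (and the output directory as $2), so it is never pasted into the
# command string and needs no quoting for the shell.
dir=$(mktemp -d)
for arg in 1 'two words'; do
    sh -c 'echo "got $1" > "$2/$1.out"' sh "$arg" "$dir"
done
cat "$dir/1.out" "$dir/two words.out"
```

The price is the extra layer of quoting around the command, which is
the annoyance described above.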

> At the moment, depending on the shell (and it's not always clear
> which one you'll get)

Please read: 

If it is still not clear then please explain a situation which is not
covered by that.

> there are a few bugs.

And bugs should be fixed.

> For instance with zsh:
> $ printf '=z\n'  | PARALLEL_SHELL=zsh parallel  'printf "<%s>\n"'
> zsh:1: z not found
> In zsh, a leading = is a globbing operator that is not currently
> escaped by parallel.

I did not know that. It seems there is no problem in quoting = in
bash, tcsh and csh, so I have now added = to the characters that
should be \'ed.
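
The idea of backslash-quoting can be sketched in a few lines of sh.
bs_quote is a hypothetical helper, not GNU Parallel's actual quoting
code: it puts a backslash in front of every character outside a small
safe set, so = gets quoted along with the usual suspects. It does not
handle newlines, because sed works line by line.

```shell
# bs_quote is a hypothetical helper (not GNU Parallel's implementation).
# Every character that is not a letter, digit, _ . / or - gets a
# backslash in front of it.
bs_quote() {
    printf '%s' "$1" | sed 's|[^A-Za-z0-9_./-]|\\&|g'
}

bs_quote '=z'; echo
bs_quote 'a b*'; echo
```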

> With csh/tcsh:
> $ printf 'a\nb\0'  | PARALLEL_SHELL=tcsh parallel -0 'printf "<%s>\n"'
> Unmatched '.
> Unmatched '.

That is now fixed.

> (good luck to get the quoting right with csh)

Yep, that is a bitch and it seems it cannot be done for every case.

> With rc/es/akanga:
> $ printf "'"  | PARALLEL_SHELL=rc parallel -0 "printf '<%s>\n'"
> line 1: eof in quoted string near eof
> rc/es have only one kind of quotes: single quotes

I have never worked on a system with rc/es/akanga, so it is very
likely that GNU Parallel has bugs for these shells.

> Even with POSIX shells, the quoting will only be right in the
> main context.:
> ~$ printf '%s\n' '\' '\x' | PARALLEL_SHELL=sh parallel 'printf "<%s>\n" "`printf \"<%s>\n\" {}`"'
> <<\>>
> <<x>>

This is the expected result.

> ~$ printf '%s\n' '\' '\x' | PARALLEL_SHELL=bash parallel 'printf "<%s>\n" "`printf \"<%s>\n\" {}`"'
> <<>>
> <<x>>
> (inside backtick, you need another level of escaping for \ (and
> `, $, ")).

It works fine for characters that do not need quoting. And if you ran
the command without parallel, it would not work either:

  $ printf "<%s>\n" "`printf \"<%s>\n\" \\`"

so the user would already be aware that the arguments need special
treatment. With GNU Parallel you can write:

printf '%s\n' '\' '\x' | PARALLEL_SHELL=bash parallel 'printf "<%s>\n" "`printf \"<%s>\n\" {=$_=::shell_quote_scalar($_)=}`"'

It is a bit long to write, so Q() is now an alias for
::shell_quote_scalar(), so the syntax in the future can be:

printf '%s\n' '\' '\x' | PARALLEL_SHELL=bash parallel 'printf "<%s>\n" "`printf \"<%s>\n\" {=$_=Q($_)=}`"'

> There are of course contexts that parallel can't always get
> right like typeset -A a; a[{1}]=1... (though of course one can
> find work-arounds).

Let us find the contexts that work without GNU Parallel and document
them. Your backtick example above does not qualify, because if the
user runs the command outside GNU Parallel it also will not work. It
has to be examples that start to fail when prepending 'parallel'.

> There's the problem of empty arguments:
> $ printf '%s\n' 1 2 3 a '' c A B C | PARALLEL_SHELL=sh parallel -n3 'f() { IFS=,; echo "$# $*";}; f'
> 3 1,2,3
> 2 a,c
> 3 A,B,C

Yep. This is a harder nut to crack.

The reason for the current behaviour is to support people who forget
that {} should be evaluated in the main shell context.

But it might be reasonable to simply quote the empty string with ''.
It will be consistent with the rule that {} should always be evaluated
by the shell. I have now implemented that and it means that a few
tests now differ from xargs:

  echo | xargs -I {} echo {} a
  echo | parallel echo {} a

I think we can live with that.
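
The difference can be reproduced in plain sh, without GNU Parallel.
This is a sketch; count_args is a hypothetical helper and eval stands
in for the shell that would run the generated command line.

```shell
# count_args is a hypothetical helper that prints its argument count.
count_args() { echo "$#"; }

empty=
# The empty value is pasted in unquoted and disappears: 2 arguments.
eval "count_args a $empty c"
# The empty value is written as '' and survives: 3 arguments.
eval "count_args a '' c"
```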

> While it works in most of the common cases, that means a
> significant overhead (when the whole point of using a
> "parallel" command is to improve performance) for little benefit
> (see echo x | strace -fe process parallel echo for instance
> compared to echo x | strace -fe process xargs -P4 echo).

This is true. But GNU Parallel will never be as fast as xargs at
spawning programs, so that battle is already lost. If the runtime of
your jobs is in the millisecond range and performance is key, then
GNU Parallel is probably not the right tool for the job. This can be
alleviated somewhat with the 'parallel --pipe -k parallel' trick,
but it is by no means optimal.

As you can see there is clearly a trend towards being slower:

The reason is not only the wrapping scripts, but also that GNU
Parallel does more testing - e.g. is the disk holding $TMPDIR full?
That would mean that the output could be incomplete and thus unreliable.

Most of the time is spent in open3, so I am thinking of a way in which
open3 can be parallelized to pre-spawn worker processes.

> And that means when you want reliability even in corner cases,
> you need to double-guess what "parallel" will do and try to
> outsmart it which defeats the whole thing.

If {} is evaluated in the main shell context it should always evaluate
to the input string. If it does not, it should either be fixed or at
the very least well documented.

> Sorry, that was a lot of ranting, and not much constructive in
> there.

Ranting (when properly distilled) is useful criticism.

> Now, to sum up, I'd say there are a few things that can
> be corrected without much effort like:
> - escape that = for zsh


> - document that shells of the rc family are not supported

I have instead included '-quoting:

printf '"#&/\n()*=?'"'"
printf '"#&/\n()*=?'"'" | PARALLEL_SHELL=rc parallel -0 echo
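
In rc a single quote inside a quoted string is written by doubling it.
A minimal sketch of that rule in sh follows; rc_quote is a hypothetical
helper, not the code used in GNU Parallel, and the command substitution
drops trailing newlines from the input.

```shell
# rc_quote is a hypothetical helper (not GNU Parallel's implementation).
# It wraps the argument in single quotes and doubles every embedded
# single quote, which is how rc represents a literal '.
rc_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

rc_quote "it's"
rc_quote 'a b'
```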

> - document that multiline arguments are not supported with
>   csh/tcsh (no point in using -0 with csh)

I have instead included \\\n-quoting:

printf '"#&/\n()*=?'"'"
printf '"#&/\n()*=?'"'" | PARALLEL_SHELL=csh parallel -0 echo
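
csh and tcsh complain about an unmatched quote when a quoted string
contains a bare newline, so the newline has to be preceded by a
backslash. The sketch below shows only that one step;
csh_escape_newlines is a hypothetical helper, not the code used in GNU
Parallel, and a complete quoting routine needs to handle the other
special characters too.

```shell
# csh_escape_newlines is a hypothetical helper (not GNU Parallel's
# implementation). It appends a backslash to every line except the
# last, so each embedded newline becomes backslash-newline.
csh_escape_newlines() {
    sed '$!s/$/\\/'
}

printf 'a\nb\n' | csh_escape_newlines
```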

> - escape empty arguments as ''


> - document limitations when using {} in some shell contexts.

I need help with that.

> A feature that I would really welcome would be some --no-shell
> option that skips all that business about running a shell and
> building a correct command-line for it and just executes the
> command (a bit like xargs).

If I receive a patch that does this, I will not rule this out.

