
Re: Replacement string for process number


From: Jay Hacker
Subject: Re: Replacement string for process number
Date: Wed, 12 Jan 2011 10:49:09 -0500

On 12/23/10, Ole Tange <ole@tange.dk> wrote:
> On Wed, Dec 22, 2010 at 3:51 PM, Jay Hacker <jayqhacker@gmail.com> wrote:
>> I'd like to be able to use the number of a process in a GNU parallel
>> command.
>
> From what you describe below it is the number of the slot of the
> process. So if you run -P32 you will get at most 32 values, fewer if
> there are fewer than 32 arguments.

Correct.

> GNU Parallel cannot do that at the moment.
>
> $PARALLEL_PID and $PARALLEL_SEQ are a bit similar to this.
> ...
> So a short hand for something like:
>
> parallel printf '%02d\\t%s\\n' \$\(\(\$PARALLEL_SEQ%32\)\) ::: *gz |
> parallel --colsep '\t' -P32 scp {2} node{1}:/data
>
> but with the added benefit that if one of the files is big one node
> may only get that single file.

Yes.  The main difference (aside from brevity, which can be a big
one!) is that I'd like to be able to guarantee that only process slot
K writes to file K.  Or in general, only process slot K uses resource
K, providing mutual exclusion.  Correct me if I'm wrong, but with
something like the cat example:

> parallel printf '%02d\\t%s\\n' \$\(\(\$PARALLEL_SEQ%16\)\) ::: ~/files/*.txt |
> parallel --colsep '\t' -P16 "cat {2} >> output-file{1}.txt"

It seems possible that, say, process 0 gets input 0, process 1 gets
input 16 (because 1-15 finished quickly), and they both write to
output file 0 at the same time, clobbering the output.  That's what
I'm trying to avoid.  I want a number that only gets used by one
process at a time.
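
To make the collision concrete, here is a minimal sketch of the
modulo labelling on its own, outside GNU Parallel.  The label function
is a made-up helper for illustration, and I am assuming sequence
numbers start at 1.  The label depends only on the sequence number, so
two jobs 16 apart always share a file, regardless of whether the
earlier one is still running:

```shell
#!/bin/sh
# label SEQ [SLOTS]: hypothetical helper that mimics the printf/modulo
# trick from the quoted pipeline. It is NOT part of GNU Parallel.
label() (
  seq=$1
  slots=${2:-16}
  printf '%02d' $(( seq % slots ))
)

# The label is a pure function of the sequence number: nothing here
# checks whether an earlier job with the same label has finished.
for seq in 1 2 16 17; do
  printf 'job %2d -> output-file%s.txt\n' "$seq" "$(label "$seq")"
done
```

Jobs 1 and 17 both map to output-file01.txt, so if job 1 is slow and
jobs 2-16 are fast, both can be appending to it at once.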

> I understand that on the local computer you want {p} to be the job
> slot number with 0's prepended.

Seems most useful, but there could be a switch to toggle zero padding.

> Please describe what {p} will be when some of the jobs are run on
> remote hosts (-S :,server1).

Monotonically increasing numbers across hosts in the order given,
i.e., if host1 has 2 cores, host2 has 1, and host3 has 4, then:

host1 core0: 0
host1 core1: 1
host2 core0: 2
host3 core0: 3
host3 core1: 4
host3 core2: 5
host3 core3: 6
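
The numbering above is just a running counter over the hosts in the
order given.  A rough sketch of that rule, assuming each host is
written as NAME:CORES (my own notation for this example, and
global_slots is a hypothetical helper, not anything GNU Parallel
provides):

```shell
#!/bin/sh
# global_slots HOST:CORES...: hypothetical helper, not a GNU Parallel
# feature. Prints one line per core and numbers the slots
# consecutively across hosts, in the order the hosts are given.
global_slots() (
  next=0
  for spec in "$@"; do
    host=${spec%%:*}    # text before the colon
    cores=${spec##*:}   # text after the colon
    core=0
    while [ "$core" -lt "$cores" ]; do
      printf '%s core%d: %d\n' "$host" "$core" "$next"
      core=$((core + 1))
      next=$((next + 1))
    done
  done
)

global_slots host1:2 host2:1 host3:4
```

Run as written, this should reproduce the seven lines listed above.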

> Please describe what {p} will be when the argument for -P is a
> filename (and thus can be changed during the run). How many 0's should
> be prepended if it is changed from -P9 to -P10?

Hmm, that is a bit messy; but I guess you could either not prepend
zeros in that case, or prepend the number appropriate for when a job
was started.
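
The second option could look something like the sketch below.  It
assumes the padding width is the number of digits in whatever -P value
is in effect when the job starts; pad_slot is a hypothetical helper
for illustration only:

```shell
#!/bin/sh
# pad_slot SLOT JOBS: hypothetical helper. Zero-pads SLOT to as many
# digits as JOBS has, where JOBS is the -P value in effect at the
# moment the job is started.
pad_slot() (
  slot=$1
  jobs=$2
  width=${#jobs}                  # number of digits in the -P value
  printf "%0${width}d\n" "$slot"
)

pad_slot 3 9     # job started while -P was 9
pad_slot 3 10    # job started after -P was raised to 10
```

The catch is that the same slot would then appear as both 3 and 03
within one run, so anything keyed on the padded string (such as an
output file name) would see two different names for one slot.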

> Please describe what {p} will be when the command is retried (--retries >
> 1).

I think the most useful would be to assign the command to a process
slot appropriate to the computer on which it's restarted, with the
slot number as described above for remote hosts.

> {n} seems to be what $PARALLEL_SEQ is today.

I had not seen $PARALLEL_SEQ before.  Maybe environment variables
would be the way to go for what I was calling {p} and {P} as well.
More to type, but it would avoid the replacement string issues, and
you seem to indicate the implementation might be easier.

> Also have a look at https://savannah.gnu.org/bugs/?31678. It is a
> feature that would solve your two examples provided that the number of
> arguments fit a single line (because scp and cat can take more than
> one argument).

But this only works for the case where the command accepts multiple
arguments. :(

Thanks!


