Re: Replacement string for process number
From: Jay Hacker
Subject: Re: Replacement string for process number
Date: Wed, 12 Jan 2011 10:49:09 -0500
On 12/23/10, Ole Tange <ole@tange.dk> wrote:
> On Wed, Dec 22, 2010 at 3:51 PM, Jay Hacker <jayqhacker@gmail.com> wrote:
>> I'd like to be able to use the number of a process in a GNU parallel
>> command.
>
> From what you describe below it is the number of the slot of the
> process. So if you run -P32 you will get at most 32 values, fewer if
> there are fewer than 32 arguments.
Correct.
> GNU Parallel cannot do that at the moment.
>
> $PARALLEL_PID and $PARALLEL_SEQ are a bit similar to this.
> ...
> So a short hand for something like:
>
> parallel printf '%02d\\t%s\\n' \$\(\(\$PARALLEL_SEQ%32\)\) ::: *gz |
> parallel --colsep '\t' -P32 scp {2} node{1}:/data
>
> but with the added benefit that if one of the files is big one node
> may only get that single file.
Yes. The main difference (aside from brevity, which can be a big
one!) is that I'd like to be able to guarantee that only process slot
K writes to file K. Or in general, only process slot K uses resource
K, providing mutual exclusion. Correct me if I'm wrong, but with
something like the cat example:
> parallel printf '%02d\\t%s\\n' \$\(\(\$PARALLEL_SEQ%16\)\) ::: ~/files/*.txt |
> parallel --colsep '\t' -P16 "cat {2} >> output-file{1}.txt"
It seems possible that, say, process 0 gets input 0, process 1 gets
input 16 (because inputs 1-15 finished quickly), and they both write to
output file 0 at the same time, clobbering the output. That's what
I'm trying to avoid. I want a number that only gets used by one
process at a time.
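To make the requirement concrete, here is a minimal sketch of the behavior in plain sh. It is not something GNU Parallel provides. A FIFO acts as a pool of free slot numbers: each job takes a number before it starts and returns it when it finishes, so a number is held by at most one running job. The function name run_with_slots, its arguments, and the use of fd 3 are assumptions made for illustration; opening a FIFO read-write works on Linux but is not guaranteed by POSIX.

```shell
#!/bin/sh
# Usage (hypothetical helper): run_with_slots NSLOTS OUTDIR FILE...
# Appends each FILE to OUTDIR/output-fileK.txt, where K is a slot number
# held by only one running job at a time.
run_with_slots() {
    nslots=$1
    outdir=$2
    shift 2

    tmpdir=$(mktemp -d) || return 1
    mkfifo "$tmpdir/pool" || return 1
    exec 3<>"$tmpdir/pool"      # read-write open, so the open itself never blocks
    rm -rf "$tmpdir"            # fd 3 keeps the FIFO alive

    # Fill the pool with the free slot numbers 0 .. NSLOTS-1.
    k=0
    while [ "$k" -lt "$nslots" ]; do
        echo "$k" >&3
        k=$((k + 1))
    done

    for file in "$@"; do
        read -r slot <&3        # blocks until some job releases its slot
        (
            cat "$file" >> "$outdir/output-file$slot.txt"
            echo "$slot" >&3    # release the slot for the next job
        ) &
    done
    wait
    exec 3>&-
}
```

Something like `run_with_slots 16 . ~/files/*.txt` would then correspond to the cat example, except that two jobs running at the same time never append to the same output file.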
> I understand that on the local computer you want {p} to be the job
> slot number with 0's prepended.
Seems most useful, but there could be a switch to toggle zero padding.
> Please describe what {p} will be when some of the jobs are run on
> remote hosts (-S :,server1).
Monotonically increasing numbers across hosts in the order given,
i.e., if host1 has 2 cores, host2 has 1, and host3 has 4, then:
host1 core0: 0
host1 core1: 1
host2 core0: 2
host3 core0: 3
host3 core1: 4
host3 core2: 5
host3 core3: 6
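The numbering above can be sketched in a few lines of sh. This is only an illustration of the proposed scheme, not existing GNU Parallel behavior; the helper name number_slots and the host:cores argument format are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: number_slots HOST:CORES...
# Prints one line per job slot, numbering the slots consecutively across
# hosts in the order the hosts are given.
number_slots() {
    slot=0
    for spec in "$@"; do
        host=${spec%%:*}        # text before the first colon
        cores=${spec##*:}       # text after the last colon
        core=0
        while [ "$core" -lt "$cores" ]; do
            echo "$host core$core: $slot"
            core=$((core + 1))
            slot=$((slot + 1))
        done
    done
}

number_slots host1:2 host2:1 host3:4
```

Run as written, this prints the same seven lines as the list above.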
> Please describe what {p} will be when the argument for -P is a
> filename (and thus can be changed during the run). How many 0's should
> be prepended if it is changed from -P9 to -P10?
Hmm, that is a bit messy; but I guess you could either not prepend
zeros in that case, or prepend the number appropriate for when a job
was started.
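The second option, padding to the width that applies when a job is started, might look like the sketch below. It assumes the width is the number of digits in the -P value in effect at that moment; the helper name pad_slot and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pad_slot SLOT NJOBS
# Prints SLOT zero-padded to the number of digits in NJOBS, where NJOBS is
# the -P value in effect at the moment the job is started.
pad_slot() {
    slot=$1
    njobs=$2
    width=${#njobs}             # number of digits in NJOBS
    printf "%0${width}d\n" "$slot"
}
```

With this rule, `pad_slot 3 9` gives 3 and `pad_slot 3 10` gives 03, so jobs started before and after the change see different widths, which is exactly the messiness in question.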
> Please describe what {p} will be when the command is retried
> (--retries > 1).
I think the most useful would be to assign the command to a process
slot appropriate to the computer on which it's restarted, with the
slot number assigned as described above for remote hosts.
> {n} seems to be what $PARALLEL_SEQ is today.
I had not seen $PARALLEL_SEQ before. Maybe environment variables
would be the way to go for what I was calling {p} and {P} as well.
More to type, but it would avoid the replacement string issues, and
you seem to indicate the implementation might be easier.
> Also have a look at https://savannah.gnu.org/bugs/?31678. It is a
> feature that would solve your two examples provided that the number of
> arguments fit a single line (because scp and cat can take more than
> one argument).
But this only works for the case where the command accepts multiple
arguments. :(
Thanks!