A suggestion: --shuf and -k

paralleluser
Subject: Re: GNU Parallel Bug Reports A suggestion: --shuf and -k
Fri, 30 Jun 2017 17:58:31 -0400

True, if the input names are easily sortable.  They are in the example I
proposed, but in my real-life case they are not.  With your sort idea, you
could add "--tag", so that each output line is prefixed with its input, and
then sort the output.

I use -k to paste data, as an ordered vector, back into Excel or
Matlab/Octave/R.  So yes, the jobs can be executed in an arbitrary order, but
to keep the vector indices in the same order you need either -k or a sort (if
your data allows that, or if you can force it to).  The only "sort order
rule" is whatever order you have in your pre-defined matrix.  If you can
force that to be ASCII order, then yes, sort works.  If you cannot, or if
doing so would be a pain, then -k has value, I think.
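
To illustrate the point about sortability, here is a minimal Python sketch
(not GNU Parallel itself).  The input names and fake_job() are hypothetical
stand-ins; the idea is only that sorting by name restores the original order
when that order happens to be ASCII order, whereas tagging each input with
its position always works.

```python
import random

# Hypothetical input list, in a hand-chosen order that is NOT ASCII order.
inputs = ["server2/resource10", "server2/resource2", "server1/resource1"]


def fake_job(name):
    """Stand-in for the real per-input command (an assumption)."""
    return len(name)


# Tag every input with its position in the original list.
indexed = list(enumerate(inputs))

# Pretend the jobs finished in some arbitrary order.
finished = indexed[:]
random.Random(0).shuffle(finished)
results = [(i, name, fake_job(name)) for i, name in finished]

# Sorting by name gives ASCII order, which differs from the input order here.
by_name = [name for _, name, _ in sorted(results, key=lambda r: r[1])]

# Sorting by the position tag restores the input order regardless of names.
by_index = [name for _, name, _ in sorted(results)]
```

With these inputs, by_index matches the original list while by_name does
not, which is the case where a plain sort is not enough.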

On Fri, Jun 30, 2017, at 05:46 PM, Rob Sargent wrote:
> -1 
> (If jobs can be started independent of order, so too is the analysis of the 
> output. From your description, the problem is solved with a call to sort.)
On Jun 30, 2017, at 3:11 PM, paralleluser wrote:
> > 
> > Friends
> > 
> > A suggestion that merits your comments and review:
> > 
> > --shuf does exactly what the man page says it does, but when you combine
> > --shuf and -k, the -k does nothing: --shuf overrides -k.
> > 
> > I propose that combining --shuf and -k should do this:
> > 
> >     the jobs are still processed in random order
> >     but the output is in the same order as the original input
> >     
> > When do you use this?  Assume your input to parallel process is:
> > 
> >     server1/resource1
> >     server1/resource2
> >     server1/resource3
> >     ...etc...
> >     server2/resource1
> >     server2/resource2
> >     ...etc...
> >     ...up to server50
> >             
> > For human processing reasons, it is easier to keep all the server/resource 
> > input lines in ASCII sort order
> > 
> > For computer processing reasons, server 1 is going to hate you if you are 
> > hitting it with a lot of requests all at the same time
> > 
> > Thus with the "--shuf -k" combo, the server loads will be spread around,
> > but you will get your data back in the same order as the input.
> > 
> > Comments welcome........thanks
> > 
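
The behaviour proposed above can be simulated outside GNU Parallel.  The
following is a rough Python sketch of the semantics only: jobs are submitted
in shuffled order, but results are returned in input order.  The function
name run_shuffled_keep_order, its parameters, and the sample inputs are all
hypothetical, and this says nothing about how GNU Parallel would implement
the combination.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_shuffled_keep_order(inputs, job, max_workers=4, seed=None):
    """Submit jobs in random order; return results in input order.

    Returns (results, submitted), where submitted lists the inputs in the
    order in which they were handed to the worker pool.
    """
    # Shuffle the positions rather than the inputs, so each result can be
    # stored back in the slot of the input that produced it.
    order = list(range(len(inputs)))
    random.Random(seed).shuffle(order)

    results = [None] * len(inputs)

    def run(i):
        results[i] = job(inputs[i])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the iterator waits for all jobs and re-raises any error.
        list(pool.map(run, order))

    submitted = [inputs[i] for i in order]
    return results, submitted


# Hypothetical inputs in the server/resource layout from the message above.
sample = [f"server{s}/resource{r}" for s in (1, 2, 3) for r in (1, 2, 3)]
output, submitted = run_shuffled_keep_order(sample, str.upper, seed=1)
```

Here output lines up index for index with sample, while submitted shows that
requests to any one server were interleaved with requests to the others.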

