Re: GNU Parallel Bug Reports A suggestion: --shuf and -k

From: Rob Sargent
Subject: Re: GNU Parallel Bug Reports A suggestion: --shuf and -k
Date: Fri, 30 Jun 2017 17:26:02 -0600

I can’t see what you’re actually doing, of course, but if the order in which 
the output records are processed matters, those records had best carry their 
intended order explicitly, or at least carry data from which the order can be 
defined.
If you’re just trying to match each line of output back to a specific input, 
send that input along through your processing.  You’re asking for trouble 
otherwise (dropped records, etc.).
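
As a rough sketch of the "send it along" idea (not a tested recipe for any particular setup): tag each input line with its line number, shuffle, process, then sort on the tag and strip it. The names process_line and shuffled_in_order are hypothetical, and the sequential while loop is only a stand-in for whatever actually runs the jobs, such as a parallel invocation. It assumes each job prints exactly one line and that shuf from GNU coreutils is available.

```shell
#!/bin/sh
# Hypothetical stand-in for the real per-line job (e.g. querying a server).
process_line() {
    printf 'result-for-%s\n' "$1"
}

TAB=$(printf '\t')

# Read input lines on stdin, process them in random order,
# and print the results in the original input order.
shuffled_in_order() {
    # Prefix every line with its input position: "N<TAB>line".
    awk '{ printf "%d\t%s\n", NR, $0 }' |
    # Randomize the order in which lines are processed.
    shuf |
    # Process each line, carrying its index along with the result.
    while IFS="$TAB" read -r idx line; do
        printf '%s\t%s\n' "$idx" "$(process_line "$line")"
    done |
    # Restore the original order using the index, then drop the index.
    sort -t "$TAB" -k1,1n |
    cut -f2-
}
```

For example, `shuffled_in_order < inputs.txt` (inputs.txt being a placeholder file name) should print one result per input line, in the file's original order, however the shuffle came out. Because the index travels with each record, a dropped record shows up as a gap in the numbering instead of silently shifting every later row.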

> On Jun 30, 2017, at 3:58 PM, paralleluser <address@hidden> wrote:
> True, if the input names are easily sortable.  They are in the example I 
> proposed, but in my real life example they are not easily sortable.  With 
> your sort idea, you could throw on a "--tag" and then sort the output.
> I use -k to paste data, as an ordered vector, back into Excel or 
> Matlab/Octave/R.  So yes, the jobs can be executed in arbitrary order, but to 
> keep the vector indices in the same order, you need a -k or a sort (if your 
> data allows that, or if you can force it to be so).  The only "order of sort 
> rule" is whatever you have in your pre-defined matrix.  If you can force it to 
> be ASCII order, yes, sort works.  If you cannot, or if doing so would be a 
> pain, then -k has value, I think.
> On Fri, Jun 30, 2017, at 05:46 PM, Rob Sargent wrote:
>> -1 
>> (If jobs can be started independent of order, so too is the analysis of the 
>> output. From your description, the problem is solved with a call to sort.)
>>> On Jun 30, 2017, at 3:11 PM, paralleluser <address@hidden> wrote:
>>> Friends
>>> A suggestion that merits your comments and review:
>>> --shuf does exactly what the man page says it does, but when you combine 
>>> --shuf and -k, the -k does nothing; --shuf overrides -k.
>>> I propose that when --shuf and -k are combined, the following happens:
>>>     the jobs are still processed in random order
>>>     but the output is in the same order as the original input
>>> When do you use this?  Assume your input to parallel process is:
>>>     server1/resource1
>>>     server1/resource2
>>>     server1/resource3
>>>     ...etc...
>>>     server2/resource1
>>>     server2/resource2
>>>     ...etc...
>>>     ...up to server50
>>> For human processing reasons, it is easier to keep all the server/resource 
>>> input lines in ASCII sort order
>>> For computer processing reasons, server 1 is going to hate you if you are 
>>> hitting it with a lot of requests all at the same time
>>> Thus with the "--shuf -k" combo, the server loads will be spread around, but 
>>> you will get your data back in the same order.
>>> Comments welcome........thanks
