coreutils


Re: [PATCH] shuf: use reservoir-sampling when possible


From: Pádraig Brady
Subject: Re: [PATCH] shuf: use reservoir-sampling when possible
Date: Mon, 25 Mar 2013 20:11:01 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 03/25/2013 04:30 PM, Assaf Gordon wrote:
> Hello Pádraig,
> 
> Pádraig Brady wrote, On 03/24/2013 11:45 PM:
>>>>>> On 03/06/2013 11:50 PM, Assaf Gordon wrote:
>>>>>>> Attached is a suggestion to implement reservoir-sampling in shuf:
>>>>>>> When the expected number of output lines is known, it will not load the
>>>>>>> entire file into memory, allowing very large inputs to be shuffled.
>>
>> I've attached 9 patches to adjust things a bit.
>>
> 
> Looks great, thank you very much.
> 
> One minor improvement: the comment in the test file is wrong (in the early
> stages of the patch I thought I could use a fixed random source and
> precalculate the expected output).
> Attached is a fix.
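For readers unfamiliar with the technique discussed in this thread: with `shuf -n K`, only K lines ever need to be held in memory. Below is a minimal C sketch of reservoir sampling (the classic "Algorithm R"). It is an illustration under stated assumptions, not the code that was pushed. The names reservoir_sample, shuffle and uniform_below, and the fixed-seed xorshift64* generator, are hypothetical. The sketch also relies on POSIX getline().

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical PRNG for this sketch: xorshift64* with a fixed seed, so runs
   are repeatable.  shuf obtains its random numbers differently. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

static uint64_t next_u64 (void)
{
  rng_state ^= rng_state >> 12;
  rng_state ^= rng_state << 25;
  rng_state ^= rng_state >> 27;
  return rng_state * 0x2545F4914F6CDD1DULL;
}

/* Uniform integer in [0, BOUND), rejecting draws that would cause modulo bias. */
static uint64_t uniform_below (uint64_t bound)
{
  assert (bound > 0);
  uint64_t limit = UINT64_MAX - UINT64_MAX % bound; /* a multiple of BOUND */
  uint64_t r;
  do
    r = next_u64 ();
  while (r >= limit);
  return r % bound;
}

/* Keep a uniform random sample of at most K lines from IN in RESERVOIR,
   which must have room for K pointers.  Only K lines are held at once,
   however long the input is.  Returns min (K, lines read); the caller
   frees the kept lines. */
static size_t reservoir_sample (FILE *in, size_t k, char **reservoir)
{
  size_t kept = 0;
  uint64_t seen = 0;            /* 0-based index of the current line */
  char *line = NULL;
  size_t cap = 0;

  while (getline (&line, &cap, in) != -1)
    {
      if (kept < k)
        {
          /* Fill phase: the first K lines go straight into the reservoir. */
          reservoir[kept++] = line;
          line = NULL;
          cap = 0;
        }
      else
        {
          /* Line SEEN replaces a random slot with probability K/(SEEN+1). */
          uint64_t j = uniform_below (seen + 1);
          if (j < k)
            {
              free (reservoir[j]);
              reservoir[j] = line;
              line = NULL;
              cap = 0;
            }
        }
      seen++;
    }
  free (line);
  return kept;
}

/* Fisher-Yates shuffle, so the kept lines are also output in random order. */
static void shuffle (char **v, size_t n)
{
  for (size_t i = n; i > 1; i--)
    {
      size_t j = (size_t) uniform_below (i); /* j in [0, i-1] */
      char *tmp = v[i - 1];
      v[i - 1] = v[j];
      v[j] = tmp;
    }
}
```

A caller would pass stdin and a K-slot array, shuffle the kept lines, then print and free them. The final shuffle matters because Algorithm R picks a uniformly random subset of lines but does not put them in a uniformly random order. For example, with exactly K input lines they come out in input order.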

OK pushed that.

Rather than changing it now, I added a note on how to improve the efficiency
of reading small inputs from a pipe, since that's a fairly invasive change
and more appropriate for a follow-up patch.

thanks!
Pádraig.
