Re: [PATCH] shuf: use reservoir-sampling when possible
From: Pádraig Brady
Subject: Re: [PATCH] shuf: use reservoir-sampling when possible
Date: Mon, 25 Mar 2013 20:11:01 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 03/25/2013 04:30 PM, Assaf Gordon wrote:
> Hello Pádraig,
>
> Pádraig Brady wrote, On 03/24/2013 11:45 PM:
>>>>>> On 03/06/2013 11:50 PM, Assaf Gordon wrote:
>>>>>>> Attached is a suggestion to implement reservoir-sampling in shuf:
>>>>>>> When the expected output of lines is known, it will not load the entire
>>>>>>> file into memory - allowing shuffling very large inputs.
>>
>> I've attached 9 patches to adjust things a bit.
>>
>
> Looks great, thank you very much.
>
> One minor improvement: the comment in the test file is wrong (in early stages
> of the patch I thought I could use a fixed random-source and pre-calculate
> the expected output).
> Attached is a fix.
OK, pushed that.
I added a note on how to improve the efficiency of reading
small inputs from a pipe, rather than changing the code now, as that's
a fairly invasive change and more appropriate for a follow-up patch.
thanks!
Pádraig.
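The patch itself is not included in this message. As a rough illustration of the technique it names, here is a minimal C sketch of reservoir sampling (Algorithm R) applied to `shuf -n K`: at most K lines are held in memory however long the input is. All names below (`struct reservoir`, `reservoir_offer`, `reservoir_shuffle`, etc.) are hypothetical, and a toy xorshift64* generator stands in for a proper random source. This is not the code that was pushed to coreutils.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reservoir holding at most K lines (K must be > 0). */
struct reservoir {
  char **items;   /* K slots; NULL until filled */
  size_t k;       /* capacity, i.e. the -n value */
  size_t seen;    /* number of lines offered so far */
};

/* Toy xorshift64* PRNG: a stand-in, NOT a cryptographic source. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;
static uint64_t rng_next(void) {
  uint64_t x = rng_state;
  x ^= x >> 12; x ^= x << 25; x ^= x >> 27;
  rng_state = x;
  return x * 0x2545F4914F6CDD1DULL;
}

/* Uniform integer in [0, n) using rejection to avoid modulo bias. */
static uint64_t random_below(uint64_t n) {
  assert(n > 0);
  uint64_t threshold = -n % n;   /* equals 2^64 mod n */
  for (;;) {
    uint64_t r = rng_next();
    if (r >= threshold)
      return r % n;
  }
}

static int reservoir_init(struct reservoir *r, size_t k) {
  r->items = calloc(k, sizeof *r->items);
  r->k = k;
  r->seen = 0;
  return r->items ? 0 : -1;
}

/* Offer one input line; the reservoir keeps its own copy if selected. */
static int reservoir_offer(struct reservoir *r, const char *line) {
  size_t slot;
  r->seen++;
  if (r->seen <= r->k) {
    slot = r->seen - 1;                   /* fill phase: keep every line */
  } else {
    uint64_t j = random_below(r->seen);   /* j uniform in [0, seen) */
    if (j >= r->k)
      return 0;                           /* line not selected: drop it */
    slot = (size_t) j;                    /* replace an existing sample */
  }
  size_t len = strlen(line) + 1;
  char *copy = malloc(len);
  if (!copy)
    return -1;
  memcpy(copy, line, len);
  free(r->items[slot]);
  r->items[slot] = copy;
  return 0;
}

static size_t reservoir_count(const struct reservoir *r) {
  return r->seen < r->k ? r->seen : r->k;
}

/* Fisher-Yates shuffle of the kept lines, so output order is random too. */
static void reservoir_shuffle(struct reservoir *r) {
  for (size_t i = reservoir_count(r); i > 1; i--) {
    size_t j = (size_t) random_below(i);  /* j in [0, i) */
    char *tmp = r->items[i - 1];
    r->items[i - 1] = r->items[j];
    r->items[j] = tmp;
  }
}

static void reservoir_free(struct reservoir *r) {
  for (size_t i = 0; i < r->k; i++)
    free(r->items[i]);
  free(r->items);
  r->items = NULL;
}
```

The final Fisher-Yates pass matters: during the fill phase the first K lines sit in input order, so without it an input of K lines or fewer would be printed unshuffled.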