bug-coreutils

Re: sort: Parallel merging


From: Chen Guo
Subject: Re: sort: Parallel merging
Date: Wed, 17 Feb 2010 13:27:18 -0800 (PST)

Hi Shaun,
    Last year someone named Glen Lenker did something like that, and I
think the patch was rejected because the maintainers didn't see enough
speedup as the number of CPUs grew.

    That said, I've been working on this for a while, and it has now
become a group project for a class where we're essentially allowed to
choose our own projects.

    You're welcome to look at our work; it's publicly viewable on GitHub, just
search for cs130coreutils. Feel free to check out a copy and do whatever you
like to it.

    As for buffer size, I highly doubt that using 8 MB, even if we were
magically guaranteed to get 100% of the CPU cache, would work better than a
larger buffer.

    The main reason is that with larger files, you'd have to repeatedly write
temporary files out to disk, then merge those temporary files. Whatever
time you save talking to the cache is more than lost to the extra time
talking to the disk.
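The trade-off can be put in rough numbers. As a back-of-the-envelope model (not sort's actual algorithm; the file size, buffer sizes, and batch size below are made-up illustrative values), count how many full merge passes over the data each buffer size forces:

```python
import math

def merge_passes(file_size, buffer_size, nmerge):
    """Toy model of an external sort: the input is cut into
    ceil(file_size / buffer_size) sorted runs, and each pass
    merges up to nmerge runs at a time, rewriting all data to
    disk once per pass. Returns the number of merge passes."""
    runs = math.ceil(file_size / buffer_size)   # initial sorted runs
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / nmerge)         # one merge pass
        passes += 1
    return passes

# Hypothetical 10 GB input, batch size 16:
small = merge_passes(10 * 2**30, 8 * 2**20, 16)  # 8 MB (cache-sized) buffer
large = merge_passes(10 * 2**30, 1 * 2**30, 16)  # 1 GB buffer
# The 8 MB buffer produces 1280 runs and needs 3 merge passes
# (i.e. the data is rewritten to disk 3 times); the 1 GB buffer
# produces 10 runs and needs only 1.
```

Under this model, any cache-friendliness of the small buffer has to pay for two extra rewrites of the entire file, which is the point above.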


----- Original Message ----
> From: Shaun Jackman <address@hidden>
> To: Coreutils <address@hidden>
> Sent: Wed, February 17, 2010 11:29:35 AM
> Subject: sort: Parallel merging
> 
> Hi,
> 
> Do any patches exist to fork the merging stage of sort and run multiple
> merge processes in parallel? It seems like a relatively straightforward
> improvement, especially since a lot of the fork/wait magic has already
> been tackled by --compress-program. I wonder what the optimal
> --batch-size would be; NMERGE=2 would be the most parallel, but would
> require more I/O.
> 
> Does anyone here know the effect of the CPU cache size on the optimal
> --buffer-size? I was wondering if it's possible that setting it to the
> CPU cache size (say 8 MB) could possibly be faster than a larger buffer.
> 
> Cheers,
> Shaun
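
The pairwise NMERGE=2 scheme the question describes can be sketched in a few lines; this is a toy in-memory illustration (not a patch against sort, and the worker count is an arbitrary assumption), merging each pair of sorted runs concurrently and repeating until one run remains:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_merge(runs, workers=4):
    """Merge sorted lists pairwise (NMERGE=2), one pair per task.
    Each round halves the number of runs, so there are about
    log2(len(runs)) rounds, and every round re-touches all the
    data -- the extra I/O the question alludes to."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(runs) > 1:
            pairs = [runs[i:i + 2] for i in range(0, len(runs), 2)]
            # heapq.merge handles an odd leftover single run too.
            runs = list(pool.map(lambda p: list(heapq.merge(*p)), pairs))
    return runs[0]

merged = parallel_merge([[1, 4], [2, 5], [3, 6], [0, 7]])
# -> [0, 1, 2, 3, 4, 5, 6, 7]
```

In real sort the runs are temporary files rather than in-memory lists, and Python threads don't buy CPU parallelism the way forked merge processes would, but the round structure is the same.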




