coreutils

Re: sort --parallel


From: Pádraig Brady
Subject: Re: sort --parallel
Date: Thu, 18 Aug 2011 14:41:33 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0

On 08/18/2011 01:43 AM, Nathan Watson-Haigh wrote:
> I’ve just discovered that a more recent version of coreutils (8.12) than the 
> one I currently have installed (5.97) has the --parallel option. However, 
> when I try to sort a large file I don’t see any speedup when using 
> --parallel=8 over --parallel=1. In addition, I only see < 100% CPU usage. I’m 
> on a 32 core system with 128GB RAM and would like to sort a stream consisting 
> of several hundred million lines in less time.
>
> I’m also investigating GNU parallel, any comments on pros/cons of each? E.g. 
> does GNU sort parallelise the merge part? My limited experience with GNU 
> parallel is that it only parallelises the sort but then a single thread is 
> used to do the merge across all the smaller sorted files.

It sounds like your bottleneck may be disk I/O rather than CPU.
You might try splitting your input across separate devices first.
Note that recent versions of split have the --number=l/N option,
which might help here: it splits the input into N chunks
without breaking lines.
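
As a rough sketch of that suggestion (not a tuned recipe): split the
input into line-aligned chunks, sort each chunk in its own process,
then merge the sorted chunks with sort -m. The file names, the scratch
directory and the chunk count of 4 are made up for illustration, and
the generated input is only a small stand-in. It assumes GNU split,
sort and seq, and that the input is a regular file rather than a pipe.
On a real system you would point the chunks at separate devices.

```shell
#!/bin/sh
# Sketch: split into line-aligned chunks, sort them concurrently, merge.
set -eu
export LC_ALL=C                      # byte-wise collation, same for every sort

workdir=$(mktemp -d)                 # hypothetical scratch directory
trap 'rm -rf "$workdir"' EXIT

# Small stand-in for the real input: 1000 lines in descending order.
seq 1000 -1 1 > "$workdir/input.txt"

# Split into 4 chunks without breaking any line (chunk.aa .. chunk.ad).
split --number=l/4 "$workdir/input.txt" "$workdir/chunk."

# Sort each chunk in a background process, then wait for all of them.
for chunk in "$workdir"/chunk.??; do
    sort -o "$chunk.sorted" "$chunk" &
done
wait

# Merge the already-sorted chunks; -m only merges, it does not re-sort.
sort -m "$workdir"/chunk.??.sorted > "$workdir/output.txt"

wc -l < "$workdir/output.txt"
```

The final merge is still a single process, so this mainly helps when
sorting the chunks dominates the run time.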

cheers,
Pádraig.


