bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sort: Parallel merging


From: Shaun Jackman
Subject: Re: sort: Parallel merging
Date: Wed, 17 Feb 2010 16:20:17 -0800

Hi Chen,

On Wed, 2010-02-17 at 15:50 -0800, Chen Guo wrote:
> > I'm suggesting setting the buffer size to the size of the CPU cache; the
> > sort process has 100% CPU affinity, i.e. no other processes allowed on
> > that CPU and so exclusive use of the data cache; and the temporary
> > directory is mounted on RAM (i.e. tmpfs) and not magnetic disk.
> > 
> > sort --buffer-size=8M --temporary-directory=/dev/shm
> > 
> > If the merging is parallel, under these circumstances, is it possible
> > that --buffer-size=8M could be faster than a larger value.
> 
> Ahhh I see what you're saying.
> 
> First, assuming 8M CPU cache, we can actually specify buffer-size > 8M.
> The reason is, we read in lines from the input file, and construct structs
> associated with those lines. Both the original lines and these structs live
> in the buffer, but we're only sorting the structs. So really, we can specify
> buffer-size to be bigger as the actual lines dont need to be in cache.

Comparing the structs requires comparing the actual lines, right? Don't
we want the actual lines in cache?

> But back to what you're saying... Under this approach, you'd sort as much
> lines as fit in 8M at a time, spit them to tmpfiles in RAM, then perform
> n-way merges on those files, correct?

Yes.

> If you can somehow guarantee that the cache is filled with the structs we
> need, then we can assume no delays due to page faults (at least until the
> nway merge), so that's a win.
> 
> However, as I recently found out, when doing an n-way merge, that 
> actually uses more compares than using binary merging. That's to say,
> merging 4 files, the 3 binary merges it takes to merge them actually takes
> less compares than the 1 4-way merge.

Very good point. Let's set NMERGE to 2 then, which also allows for the
maximum parallelism:
sort --buffer-size=8M --temporary-directory=/dev/shm --batch-size=2

> This is really a long winded way of saying "I don't know," but intuitively
> I'd say the downsides outweigh the benefits of restricting buffer to cache
> size, if only because of the compares. Another reason might be, it's not like
> currently cache isn't used at all; as work is done on the struct lines
> they are naturally brought into cache of whatever processor is working on
> them.

I don't know either, but once I get your parallel merging patch running,
I'll do some benchmarks. Anybody know the magic incantation to lock a
particular process onto a particular CPU, preventing any other processes
from running on that CPU?

Cheers,
Shaun






reply via email to

[Prev in Thread] Current Thread [Next in Thread]