Re: sort: Parallel merging
From: Shaun Jackman
Subject: Re: sort: Parallel merging
Date: Wed, 17 Feb 2010 15:16:49 -0800
On Wed, 2010-02-17 at 14:57 -0800, Chen Guo wrote:
> > > As for buffer size, I highly doubt using 8 mb, even if we're magically
> > > guaranteed to get 100% of the cpu cache, would work better than a larger
> > > buffer.
> > >
> > > The main reason would be for larger files, you'd have to repeatedly
> > > write
> > > temporary files out to disk, then merge those temporary files. Whatever
> > > time you save talking to cache is more than lost to the extra time talking
> > > to disk.
> >
> > What if the temporary files were stored in RAM (i.e. tmpfs) rather than
> > on magnetic disk?
>
> I think I'm misunderstanding what you're trying to say... But the file
> stored in RAM would be in a buffer. --buffer-size sets the size of this
> buffer, i.e. how much space in RAM you want to allocate to sort.
I'm suggesting setting the buffer size to the size of the CPU cache,
giving the sort process 100% CPU affinity (no other processes allowed on
that CPU, so it has exclusive use of the data cache), and mounting the
temporary directory in RAM (i.e. tmpfs) rather than on magnetic disk:
    sort --buffer-size=8M --temporary-directory=/dev/shm
If the merging is parallel, is it possible that under these
circumstances --buffer-size=8M could be faster than a larger value?
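For what it's worth, the setup above could be sketched as follows.
This is only an illustration, not a measured claim: it assumes GNU sort,
a Linux system with tmpfs at /dev/shm, taskset from util-linux for the
CPU pinning, and that 8M stands in for the actual cache size; the input
file and the 256M comparison value are made up for the example.

```shell
# Sample input: one million shuffled integers (stand-in data).
seq 1000000 | shuf > input.txt

# Pin to CPU 0 if taskset is available, to approximate exclusive
# use of that CPU's data cache.
PIN=""
command -v taskset >/dev/null 2>&1 && PIN="taskset -c 0"

# Keep temporary merge files in RAM (tmpfs) rather than on disk,
# falling back to /tmp if /dev/shm is not mounted.
TMP=/tmp
[ -d /dev/shm ] && TMP=/dev/shm

# Cache-sized buffer vs. a much larger one; timing the two runs
# (e.g. with time(1)) is the experiment being proposed.
$PIN sort --buffer-size=8M   --temporary-directory="$TMP" input.txt > out-8m.txt
$PIN sort --buffer-size=256M --temporary-directory="$TMP" input.txt > out-256m.txt

# Sanity check: both runs must produce identical, sorted output.
cmp out-8m.txt out-256m.txt && sort -c out-8m.txt && echo OK
```

The small buffer forces more temporary-file merge passes, but with the
temporaries in tmpfs those passes hit RAM rather than disk, which is the
premise of the question.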
Cheers,
Shaun