bug-coreutils
RE: fix for inefficient merges in sort.c


From: Lemley James - jlemle
Subject: RE: fix for inefficient merges in sort.c
Date: Mon, 14 Feb 2005 15:48:09 -0600

Hi Paul, thanks for the good words!   

That's an elegant solution: starting the probe at 1 instead of in the middle,
to catch the case where the key we want is already in the first file.  I like
it.  I had set up a test on my machine to try lots of different scenarios
with the original sort.c and my modified version, and ran it overnight.
I'll do the same with the patch below and let you know the results.
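
To make sure I am reading the idea right, here is a minimal sketch of it. This
is not the patch itself. It uses plain int keys instead of lines, the name
reinsert_head is made up for illustration, and it ignores the tie-breaking by
file index that a stable merge would need. Slot 0 holds the key just read from
the file that supplied the last output line, slots 1..n-1 are already in
order, and the binary search takes its first probe at index 1 rather than at
the midpoint:

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   keys[1..n-1] is sorted ascending; keys[0] was just replaced and may be
   out of place.  Move it to its sorted position and return the number of
   comparisons performed.  Requires n >= 1. */
size_t reinsert_head(int *keys, size_t n)
{
    int head = keys[0];
    size_t lo = 1;
    size_t hi = n;
    size_t probe = lo;          /* first probe at index 1, not the middle */
    size_t comparisons = 0;

    while (lo < hi) {
        comparisons++;
        if (head <= keys[probe])
            hi = probe;         /* head belongs at or before probe */
        else
            lo = probe + 1;     /* head belongs after probe */
        probe = lo + (hi - lo) / 2;
    }

    /* lo - 1 keys are smaller than head: shift them left by one slot
       and drop head into the gap. */
    for (size_t j = 0; j + 1 < lo; j++)
        keys[j] = keys[j + 1];
    keys[lo - 1] = head;

    return comparisons;
}
```

When the new key is still the smallest (say {1, 2, 5, 9}), the loop ends after
a single comparison regardless of n, which is the case we care about when
consecutive output lines keep coming from the same file.  The price is at
most about one extra comparison in the other cases.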

As to making NMERGE a command line option and/or a dynamically set value, I
love that idea.  I don't know if there are still systems in use where the
number of open files allowed is low, so I would be conservative with any new
setting unless it is obvious the system is a large one, which leads me to
the next topic:

Regarding large NMERGE and a good heuristic:  I found that performance
suffers as each file's share of the buffer shrinks.  With either too small
a -S on the command line or too large an NMERGE, seeking between merge files
becomes a bottleneck(*).  In practice, on one particular machine, seeking
became a pretty big bottleneck when there was less than 512K of buffer
memory for each file, and it really worked best with 1MB or more per merge
file.  I was using -S 1G with NMERGE=1024, sorting a 100GB file, and it ran
very well indeed.  I am sure it will be machine dependent, but I think a
good start would be to make NMERGE roughly the number of megabytes in the
merge buffer, for merge buffer sizes of 16MB or more.  That would be the
least surprising behavior for most people who don't set -S or the new NMERGE
parameter, and would pleasantly surprise folks who have gobs of RAM and can
probably also support lots of open files for reduced merge passes.  Some
testing is in order; it could be that 512K or 2MB per file would be the
sweet spot, but I think that would be a good way to set NMERGE if it is not
specified.
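
Roughly what I have in mind, as a sketch only.  The function name, the
max_merge_files cap (standing in for whatever open-file limit the system
allows), and the assumption that the current compiled-in NMERGE of 16 stays
as the floor are all mine and would need checking against the real code:

```c
#include <stddef.h>

#define DEFAULT_NMERGE 16               /* assumed compiled-in default */
#define MIB ((size_t) 1024 * 1024)

/* Hypothetical helper, for illustration only.
   Pick a merge width of about one file per megabyte of merge buffer,
   never below the default and never above what the caller says the
   system can keep open at once. */
size_t pick_nmerge(size_t merge_buffer_bytes, size_t max_merge_files)
{
    size_t nmerge = merge_buffer_bytes / MIB;

    if (nmerge < DEFAULT_NMERGE)
        nmerge = DEFAULT_NMERGE;        /* small buffers keep the default */
    if (nmerge > max_merge_files)
        nmerge = max_merge_files;       /* stay conservative on open files */
    if (nmerge < 2)
        nmerge = 2;                     /* a merge needs at least two inputs */

    return nmerge;
}
```

With a 1G buffer and a generous file limit this gives 1024, which is the
combination that ran well for me; anything under 16MB stays at 16.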

(*) Note:  I assume it was seeking.  It could have been that the number of
system calls to read() and write() went way up, and that was the overhead.
Either way, it was badness.
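
Either explanation scales the same way, which a little arithmetic shows.
This is a back-of-the-envelope sketch, not a measurement: it assumes the
merge buffer is split evenly among the input files and that each refill of
a file's share costs one read() (and possibly one seek).  Both function
names are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Each input file's share of the merge buffer.  nmerge must be nonzero. */
uint64_t per_file_buffer_bytes(uint64_t merge_buffer_bytes, uint64_t nmerge)
{
    return merge_buffer_bytes / nmerge;
}

/* Number of buffer refills needed to stream input_bytes through one merge
   pass, rounding up.  per_file_bytes must be nonzero. */
uint64_t refills_per_pass(uint64_t input_bytes, uint64_t per_file_bytes)
{
    return (input_bytes + per_file_bytes - 1) / per_file_bytes;
}
```

Taking 100GB as 100 * 2^30 bytes, a 1G buffer with NMERGE=1024 leaves 1MB
per file and about 102,400 refills per pass.  Shrink the buffer to 64MB with
the same NMERGE and each file gets 64K, for about 1,638,400 refills, sixteen
times as many, whichever of the two costs is the one that hurts.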

--James








