sort default buffer size

From: Pádraig Brady
Subject: sort default buffer size
Date: Tue, 7 Jul 2009 11:54:08 +0100
User-agent: Thunderbird (X11/20071008)

I was surprised to notice sort was accessing the disk on multiple runs on
a 500MB file on my 2GB RAM laptop. Here was my memory situation:

$ free -m | head -n2
             total       used       free     shared    buffers     cached
Mem:          2006        603       1403          0         67        404
$ cat 500MB_access_log > /dev/null
$ free -m | head -n2
             total       used       free     shared    buffers     cached
Mem:          2006       1095        911          0         67        895

So on subsequent runs I had 911MB free but I noticed sort was only using
around half that. In fact looking at the code it was using:

buf_size = MIN(rlimit, MAX(free, total/8))/2

This seems a bit conservative to me, especially since as RAM sizes increase,
more memory will tend to be dedicated to cache, which is safer to use.
In fact my case is a little unusual as I had just booted.
The usual case is for free to tend to 0 over time as more files are cached.
In other words, the rlimits are more important to stay away from than the
other "limits". So might this be better?

buf_size = MIN(rlimit/2, MAX(free, total/8))
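To make the difference concrete, here is a minimal sketch of the two
formulas. It is not the actual default_sort_size() code: the function
names, the use of whole megabytes, and the use of UINT64_MAX to stand for
"no rlimit" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* All sizes are in MB.  Passing UINT64_MAX as rlimit_mb stands for
   "no limit" (an assumption of this sketch, not sort's representation).  */

/* Current formula: halve the result after applying every limit.  */
uint64_t
buf_size_current (uint64_t rlimit_mb, uint64_t free_mb, uint64_t total_mb)
{
  assert (free_mb <= total_mb);
  return MIN (rlimit_mb, MAX (free_mb, total_mb / 8)) / 2;
}

/* Proposed formula: halve only the rlimit, keep the memory estimate whole.  */
uint64_t
buf_size_proposed (uint64_t rlimit_mb, uint64_t free_mb, uint64_t total_mb)
{
  assert (free_mb <= total_mb);
  return MIN (rlimit_mb / 2, MAX (free_mb, total_mb / 8));
}
```

With the numbers from the second `free` output above (free = 911,
total = 2006) and no rlimit, the current formula gives 455MB and the
proposed one gives 911MB. With a hypothetical rlimit of 1024MB they give
455MB and 512MB respectively, so the proposal still stays at half the rlimit.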

I also noticed that the code in default_sort_size() assumes the
rlimit values are unsigned, which may cause portability issues?
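POSIX specifies rlim_t as an unsigned integer type, so the concern is only
for systems that deviate from that. One way to avoid depending on it might
look like the sketch below. The helper name rlimit_to_size is hypothetical
and this is not code from sort.c; it treats a negative limit (possible only
where rlim_t is signed) the same as RLIM_INFINITY.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical helper: convert an rlim_t to a size_t without assuming
   that rlim_t is unsigned.  "Unlimited" is reported as SIZE_MAX.  */
size_t
rlimit_to_size (rlim_t lim)
{
  if (lim == RLIM_INFINITY)
    return SIZE_MAX;
  if (lim < 0)                  /* only reachable where rlim_t is signed */
    return SIZE_MAX;
  if ((uintmax_t) lim > SIZE_MAX)
    return SIZE_MAX;            /* clamp limits that do not fit in size_t */
  return (size_t) lim;
}
```

A caller would pass the rlim_cur field filled in by getrlimit(), for
example for RLIMIT_AS, and then apply the MIN/MAX logic to the result.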

Note the "used" value seen in the above output from `free` is
not used in the equation at present.

p.s. while testing this I noticed that sort from git with default CFLAGS
is about 14% faster than sort from coreutils-7.2 that ships with F11.
Nothing has changed in the sort code as far as I can see, and
also the compiler and glibc were the same.

$ export LANG=C
$ time sort -t ' ' -k4.9n -k4.5M -k4.2n -k4.14,4 --buffer-size=1G access_log > 

real    0m28.631s
user    0m26.866s
sys     0m1.354s

$ time ~/git/coreutils/src/sort -t ' ' -k4.9n -k4.5M -k4.2n -k4.14,4 --buffer-size=1G access_log > /dev/null

real    0m24.199s
user    0m22.707s
sys     0m1.370s

I first suspected compiler flags, but recompiling sort.o
as follows makes no difference:
$ rm sort.o && make CFLAGS="$(rpm -q --qf %{OPTFLAGS} coreutils)" V=1

So I'm now guessing the i18n patch is affecting the speed, even with LANG=C.

p.p.s. recompiling all of coreutils with the above rpm flags fails with
warnings like:
cp.c:358: error: not protecting local variables: variable length buffer
due to the ASSIGN_STRDUPA macro.
