Re: sort default buffer size
From: Jim Meyering
Subject: Re: sort default buffer size
Date: Fri, 17 Jul 2009 06:41:25 +0200
Pádraig Brady wrote:
> I was surprised to notice sort was accessing the disk on multiple runs on
> a 500MB file on my 2GB RAM laptop. Here was my memory situation:
>
> $ free -m | head -n2
>              total       used       free     shared    buffers     cached
> Mem:          2006        603       1403          0         67        404
> $ cat 500MB_access_log > /dev/null
> $ free -m | head -n2
>              total       used       free     shared    buffers     cached
> Mem:          2006       1095        911          0         67        895
>
> So on subsequent runs I had 911MB free but I noticed sort was only using
> around half that. In fact looking at the code it was using:
>
> buf_size = MIN(rlimit, MAX(free, total/8))/2
>
> This seems a bit conservative to me, especially since, as RAM sizes
> increase, more memory will tend to be dedicated to cache and is thus safer
> to use. In fact my case is a little unusual as I had just booted.
> The usual case is for free to tend to 0 over time as more files are cached.
> In other words, the rlimits are more important to stay away from than the
> other "limits". So might this be better?
[Oh! I just discovered this partially written reply.
I was interrupted and almost never made it back to it.
Sorry about that.]
The default is intended to be conservative, e.g., in case multiple
invocations of sort happen to run in parallel, or in a multi-user environment.
> buf_size = MIN(rlimit/2, MAX(free, total/8))
>
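To make the difference between the two formulas concrete, here is a minimal sketch that evaluates both with the numbers from the second `free -m` output above (total 2006 MB, free 911 MB). It models only the formulas as quoted in this thread, not the actual code in default_sort_size(). The function names, the use of whole megabytes, and the convention that `rlimit_mb=None` means "no resource limit" are assumptions made for illustration.

```python
def current_default(free_mb, total_mb, rlimit_mb=None):
    """Quoted current formula: MIN(rlimit, MAX(free, total/8)) / 2."""
    size = max(free_mb, total_mb // 8)
    if rlimit_mb is not None:          # None is an assumed stand-in for "unlimited"
        size = min(rlimit_mb, size)
    return size // 2


def proposed_default(free_mb, total_mb, rlimit_mb=None):
    """Quoted proposed formula: MIN(rlimit/2, MAX(free, total/8))."""
    size = max(free_mb, total_mb // 8)
    if rlimit_mb is not None:          # only the rlimit is halved here
        size = min(rlimit_mb // 2, size)
    return size


if __name__ == "__main__":
    # Values taken from the second `free -m` output in this thread.
    print(current_default(911, 2006))    # 455
    print(proposed_default(911, 2006))   # 911
```

With no rlimit in effect, the current formula yields about half of free memory (455 MB), which matches the "around half" observation, while the proposed one would use all 911 MB reported as free.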
> I also noticed that the code in default_sort_size() assumes the
> rlimit values are unsigned which may cause portability issues?
>
> Note the "used" value seen in the above output from `free` is
> not used in the equation at present.
>
> p.s. while testing this I noticed that sort from git with default CFLAGS
> is about 14% faster than sort from coreutils-7.2 that ships with F11.
Definitely worth investigating.
> Nothing has changed in the sort code as far as I can see, and
> also the compiler and glibc were the same.
>
> $ export LANG=C
> $ time sort -t ' ' -k4.9n -k4.5M -k4.2n -k4.14,4 --buffer-size=1G access_log > /dev/null
>
> real 0m28.631s
> user 0m26.866s
> sys 0m1.354s
>
> $ time ~/git/coreutils/src/sort -t ' ' -k4.9n -k4.5M -k4.2n -k4.14,4 --buffer-size=1G access_log > /dev/null
>
> real 0m24.199s
> user 0m22.707s
> sys 0m1.370s
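For reference, the relative difference between the two runs quoted above can be computed directly from the `time` output. This is only a sketch of the arithmetic. The helper names are hypothetical, and it assumes the `XmY.YYYs` format shown in the timings.

```python
def seconds(t):
    """Convert a `time` value such as '0m28.631s' to seconds."""
    minutes, rest = t.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


def percent_less_time(old, new):
    """Percentage by which `new` is shorter than `old`."""
    return (seconds(old) - seconds(new)) / seconds(old) * 100


if __name__ == "__main__":
    # Timings copied from the two runs quoted in this thread.
    print(round(percent_less_time("0m28.631s", "0m24.199s"), 1))  # real: 15.5
    print(round(percent_less_time("0m26.866s", "0m22.707s"), 1))  # user: 15.5
```

For these two particular runs, the git build takes roughly 15% less real and user time, which is in the same range as the "about 14%" figure mentioned above.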
>
> I first suspected compiler flags; however, recompiling sort.o
> as follows does not make a difference:
> $ rm sort.o && make CFLAGS="$(rpm -q --qf %{OPTFLAGS} coreutils)" V=1
>
> So I'm now guessing the i18n patch is affecting the speed even though LANG=C.
>
> p.p.s. Recompiling all of coreutils with the above rpm flags fails with
> warnings like:
> cp.c:358: error: not protecting local variables: variable length buffer [-Wstack-protector]
> due to the ASSIGN_STRDUPA macro.