sort: new feature: use environment variable to set buffer size
From: Assaf Gordon
Subject: sort: new feature: use environment variable to set buffer size
Date: Wed, 29 Aug 2012 16:50:00 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4
Hello,
I'd like to suggest a new feature for sort: the ability to set the buffer size
(-S/--buffer-size X) using an environment variable.
In summary:
$ export SORT_BUFFER_SIZE=20G
$ someprogram | sort -k1,1 > output.txt
# sort will use 20G of RAM, as if "--buffer-size 20G" was specified.
The rationale:
Recent commits improved the buffer size that sort guesses when it is given an
input file, but these improvements don't apply when sort is used in a pipeline
with a pipe as its input, e.g.
some | program | sort | other | programs > file
(Tested with v8.19 on Linux 2.6.32: sort consumes only a few MB of RAM, even
though many GB are available.)
This results in many small temporary files being created.
The script (which uses sort) is not under my direct control, but even if it
were, I wouldn't want to hard-code the amount of memory used, so that it stays
portable across different servers.
AFAIK, there are four aspects of sort that affect performance:
1. number of threads:
changeable with "--parallel=X" and with the environment variable OMP_NUM_THREADS.
2. temporary files location:
changeable with "--temporary-directory=DIR" and with the environment variable TMPDIR.
3. memory usage:
changeable with "--buffer-size=SIZE" but not with an environment variable.
4. compression program:
changeable with "--compress-program=PROG" but not with an environment variable
(at the moment, I do not address this aspect).
With the attached patch, sort will read an environment variable named
"SORT_BUFFER_SIZE", and will treat it as if "--buffer-size" was specified (but
only if "--buffer-size" wasn't used on the command line).
If this is conceptually acceptable, I'll prepare a proper patch (with NEWS,
help, docs, etc.).
Regards,
-gordon
0001-sort-accept-buffer-size-from-environment-variable.patch