coreutils

sort: new feature: use environment variable to set buffer size


From: Assaf Gordon
Subject: sort: new feature: use environment variable to set buffer size
Date: Wed, 29 Aug 2012 16:50:00 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4

Hello,

I'd like to suggest a new feature to sort: the ability to set the buffer size 
(-S/--buffer-size X) using an environment variable.

In summary:
 $ export SORT_BUFFER_SIZE=20G 
 $ someprogram | sort -k1,1 > output.txt
 # sort will use 20G of RAM, as if "--buffer-size 20G" was specified.


The rationale:
Recent commits improved the guessed buffer size when sort is given an input 
file, but these improvements don't apply if sort is used as part of a pipeline, 
with a pipe as input, e.g.
  some | program | sort | other | programs > file 

(Tested with v8.19 on Linux 2.6.32: sort consumes only a few MBs of RAM, even 
though many GBs are available.)
This results in many small temporary files being created.

The script (which uses sort) is not under my direct control, but even if it 
were, I wouldn't want to hard-code the amount of memory used, so that it stays 
portable across different servers.

AFAIK, there are four aspects of sort that affect performance:
1. number of threads:
changeable with "--parallel=X" and with the environment variable 
OMP_NUM_THREADS.

2. temporary files location:
changeable with "--temporary-directory=DIR" and with the environment variable 
TMPDIR.

3. memory usage:
changeable with "--buffer-size=SIZE" but not with an environment variable.

4. compression program:
changeable with "--compress-program=PROG" but not with an environment variable
(at the moment, I do not address this aspect).


With the attached patch, sort will read an environment variable named 
"SORT_BUFFER_SIZE", and will treat it as if "--buffer-size" was specified (but 
only if "--buffer-size" wasn't used on the command line).
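
For illustration, the same precedence rule can be approximated today with a 
shell wrapper around an unpatched sort. This is only a rough sketch and not 
the attached patch: the function name sort_with_env_buffer is made up, and the 
option detection is deliberately crude (it misses abbreviated long options 
such as "--buffer=..." and "-S" bundled after other short options).

```shell
# Hypothetical wrapper (name invented for illustration): pass
# --buffer-size from $SORT_BUFFER_SIZE unless the caller already
# gave -S/--buffer-size on the command line.
sort_with_env_buffer() {
    explicit=no
    for arg in "$@"; do
        case $arg in
            --) break ;;                       # stop at end of options
            -S*|--buffer-size|--buffer-size=*) explicit=yes ;;
        esac
    done
    if [ -n "${SORT_BUFFER_SIZE:-}" ] && [ "$explicit" = no ]; then
        # Environment value is used only as a fallback.
        command sort --buffer-size="$SORT_BUFFER_SIZE" "$@"
    else
        command sort "$@"
    fi
}
```

Usage would look like this:
 $ export SORT_BUFFER_SIZE=20G
 $ someprogram | sort_with_env_buffer -k1,1 > output.txt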

If this is conceptually acceptable, I'll prepare a proper patch (with NEWS, 
help, docs, etc.).

Regards,
 -gordon

Attachment: 0001-sort-accept-buffer-size-from-environment-variable.patch
Description: Text Data

