Re: feature request: gzip/bzip support for sort

From: Paul Eggert
Date: Sat, 13 Jan 2007 22:07:59 -0800
Thanks.  I like the idea of compression, but before we get into the
details of your patch, what do you mean when you say this patch gives
no performance improvement?  What's holding performance back?  It
seems to me that compression ought to be a real win.

If it's not a win, we shouldn't bother with LZO; instead, we should
use an algorithm that will typically be a clear win -- or, if there
isn't any such algorithm, we shouldn't hardwire any algorithm at all.

Some other thoughts.

1.  Have you looked into other compression algorithms?  QuickLZ
<http://www.quicklz.com/> should compress fairly quickly, if
compression speed is the bottleneck.  I mention QuickLZ because it's
currently the compression-speed champ at

2.  For the default, let's not bother with a user-visible option.
Let's just use compression, if a good algorithm is available.

3.  I can see where the user might be able to specify a better
algorithm, for a particular data set.  For that, how about if we have
a --compress-program=PROGRAM option, which lets the user plug in any
program that works as a pipeline?  E.g., --compress-program=gzip would
use gzip.  The default would be to use "PROGRAM -d" to decompress; we
could have another option if that doesn't suffice.

An advantage of (3) is that it should work well on two-processor
hosts, since compression can be done in one CPU while sorting is done
on another.  (Hmm, perhaps we should consider forking even if we use a
built-in default compressor, for the same reason.)
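To make the pipeline idea in (3) concrete, here is a minimal POSIX
sketch, not code from sort itself.  It assumes PROGRAM reads stdin,
writes stdout, and accepts "-d" to decompress, as gzip does.  The
function names spawn_filter, compress_to_file and decompress_from_file
are hypothetical.  Because the compressor runs in a forked child, the
parent can keep producing data while the child compresses, which is
where the two-CPU benefit would come from.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>     /* strlen/memcmp in the usage example */
#include <assert.h>     /* assert in the usage example */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child running PROG (plus ARG, e.g. "-d",
   if non-NULL) with stdin from IN_FD and stdout to OUT_FD.  PARENT_FD is
   the parent's end of the pipe; the child closes it so EOF propagates.
   Returns the child's pid, or -1 if fork fails. */
static pid_t spawn_filter(const char *prog, const char *arg,
                          int in_fd, int out_fd, int parent_fd)
{
  pid_t pid = fork();
  if (pid != 0)
    return pid;                       /* parent, or fork failure (-1) */
  close(parent_fd);
  if (dup2(in_fd, STDIN_FILENO) < 0 || dup2(out_fd, STDOUT_FILENO) < 0)
    _exit(127);
  if (in_fd > STDERR_FILENO) close(in_fd);
  if (out_fd > STDERR_FILENO) close(out_fd);
  if (arg)
    execlp(prog, prog, arg, (char *) NULL);
  else
    execlp(prog, prog, (char *) NULL);
  _exit(127);                         /* exec failed */
}

/* Wait for PID; true only if it exited normally with status 0. */
static int wait_ok(pid_t pid)
{
  int status;
  while (waitpid(pid, &status, 0) < 0)
    if (errno != EINTR)
      return 0;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Write LEN bytes of DATA to PATH through "PROG" (compression). */
static int compress_to_file(const char *prog, const char *path,
                            const char *data, size_t len)
{
  int fds[2];
  int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (out < 0)
    return -1;
  if (pipe(fds) < 0) { close(out); return -1; }
  pid_t pid = spawn_filter(prog, NULL, fds[0], out, fds[1]);
  close(fds[0]);
  close(out);
  if (pid < 0) { close(fds[1]); return -1; }
  size_t off = 0;                     /* child compresses concurrently */
  while (off < len) {
    ssize_t n = write(fds[1], data + off, len - off);
    if (n < 0) { if (errno == EINTR) continue; break; }
    off += (size_t) n;
  }
  close(fds[1]);                      /* EOF tells the compressor to finish */
  return (wait_ok(pid) && off == len) ? 0 : -1;
}

/* Read PATH back through "PROG -d" into BUF; returns bytes read or -1. */
static ssize_t decompress_from_file(const char *prog, const char *path,
                                    char *buf, size_t cap)
{
  int fds[2];
  int in = open(path, O_RDONLY);
  if (in < 0)
    return -1;
  if (pipe(fds) < 0) { close(in); return -1; }
  pid_t pid = spawn_filter(prog, "-d", in, fds[1], fds[0]);
  close(in);
  close(fds[1]);
  if (pid < 0) { close(fds[0]); return -1; }
  size_t got = 0;
  while (got < cap) {
    ssize_t n = read(fds[0], buf + got, cap - got);
    if (n < 0) { if (errno == EINTR) continue; break; }
    if (n == 0) break;                /* decompressor finished */
    got += (size_t) n;
  }
  close(fds[0]);
  return wait_ok(pid) ? (ssize_t) got : -1;
}
```

With --compress-program=gzip, a temporary run file would be written via
compress_to_file("gzip", ...) and merged back via
decompress_from_file("gzip", ...), which runs "gzip -d".  A real
implementation would stream rather than buffer, and would need to
handle SIGPIPE if the child dies early.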

