bug-coreutils

Re: feature request: gzip/bzip support for sort


From: Jim Meyering
Subject: Re: feature request: gzip/bzip support for sort
Date: Tue, 16 Jan 2007 13:20:16 +0100

Dan Hipschman <address@hidden> wrote:
> Here's the patch for comments.  Thanks,

I tried it and did some timings.
Bottom line: with a 4+GB file, dual-processor, I see a 19% speed-up,
but I think most of the savings is in reduced I/O.

--------------------------------------------
Virtually no difference for a file of size 324M, running on a
uniprocessor amd-64 3400 (gzip was slightly slower: 1:48.67 vs. 1:40.35
elapsed, about 8%).  The file was created like this:

  $ seq 99999 > k
  $ cat k k k k k k k k k k k k k k k k k k k k k k k k > j
  $ mv j k
  $ cat k k k k k k k k k k k k k k k k k k k k k k k k > j
  $ shuf < j > sort-in

  $ /usr/bin/time ./sort --compress=gzip < sort-in > out
  100.11user 4.69system 1:48.67elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (1major+761524minor)pagefaults 0swaps
  $ /usr/bin/time ./sort < sort-in > out
  93.16user 3.35system 1:40.35elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+137435minor)pagefaults 0swaps

-----------------------------------------
Trying something similar, but with a 4.6GB file created like this,
running on a dual-processor machine with 2GB of RAM:

  $ seq 9999999 > k                            10M lines / 78888888 bytes
  $ cat k k k k k k k k k > j                  90M lines
  $ cat j j j j j j j > k                     630M lines / 4.62 GB
  $ mv k sort-in
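
The sizes in the annotations above can be checked without creating the
big file.  A minimal sketch, assuming seq, wc, tr and awk are available;
seq_bytes is just a helper name made up for illustration:

```shell
# Bytes produced by `seq N`: the digits of each number plus one newline.
seq_bytes() {
  seq "$1" | wc -c | tr -d ' '
}

one=$(seq_bytes 9999999)    # size of k after the first step
total=$((one * 9 * 7))      # 9 copies of k into j, then 7 copies of j
echo "$one bytes per copy, $total bytes total"
# prints "78888888 bytes per copy, 4969999944 bytes total"

# The same total expressed in GiB (2^30 bytes).
awk -v t="$total" 'BEGIN { printf "%.2f GiB\n", t / (1024 * 1024 * 1024) }'
```

That comes to roughly 4.6 GiB, in line with the annotation on the last
cat command.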

What does /tmp look like after a few minutes of the --compress=gzip run?

  $ du -sh /tmp/sort*
  11M     /tmp/sort0Sdnkk
  11M     /tmp/sort6cTaAE
  11M     /tmp/sortABogDY
  ...
----------- contrast with sizes during the run w/no compression:
  216M    /tmp/sort5NTqQu
  216M    /tmp/sortAjx50R
  216M    /tmp/sortKvyGIT
  ...
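
That is roughly a 20x reduction in temp file size (216M vs. 11M).  For
other inputs, a rough way to guess what gzip will buy you is to compress
a sample of the input.  This is only a sketch: gzip_ratio is a
hypothetical helper, it looks at just the first N bytes (1000000 by
default), and it estimates the ratio rather than measuring sort's actual
temp files.  It assumes head -c, gzip, wc, tr and awk are available.

```shell
# Estimate gzip's compression ratio on the first N bytes of a file.
# Usage: gzip_ratio FILE [N]
gzip_ratio() {
  raw=$(head -c "${2:-1000000}" "$1" | wc -c | tr -d ' ')
  packed=$(head -c "${2:-1000000}" "$1" | gzip -c | wc -c | tr -d ' ')
  # Ratio of uncompressed to compressed size; larger means better compression.
  awk -v r="$raw" -v p="$packed" 'BEGIN { printf "%.1f\n", r / p }'
}
```

For example, "gzip_ratio sort-in" would print a single number; the
digits-only data used here should compress well, but the exact figure
depends on the sample.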

$ /usr/bin/time ./sort -T /tmp --compress=gzip < sort-in > out
1535.33user 71.76system 27:15.72elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+4985207minor)pagefaults 0swaps
$ /usr/bin/time ./sort -T /tmp < sort-in > out
./sort: write failed: /tmp/sortieA1nv: No space left on device
Command exited with non-zero status 2
588.79user 17.76system 17:20.70elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+225191minor)pagefaults 0swaps
[Exit 2]
$ df -hT /tmp
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda3 reiserfs     12G  6.4G  5.4G  55% /

$ /usr/bin/time ./sort -T . < sort-in > out
754.03user 38.99system 33:42.35elapsed 39%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+210437minor)pagefaults 0swaps

So, with just one trial each, I see a 19% speed-up: 27:15.72 elapsed
with --compress=gzip vs. 33:42.35 without (using -T .).
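
For reference, the percentages can be reproduced from the elapsed fields
printed by /usr/bin/time above.  A minimal sketch, assuming awk is
available; to_seconds and pct_faster are helper names made up for
illustration:

```shell
# Convert an elapsed field (M:SS.ss or H:MM:SS) to seconds.
to_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Percentage reduction in elapsed time: 100 * (base - new) / base.
# A negative result means the second run was slower than the first.
pct_faster() {
  awk -v b="$(to_seconds "$1")" -v n="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", 100 * (b - n) / b }'
}

pct_faster 33:42.35 27:15.72   # big file: no compression vs. gzip, prints 19.1
pct_faster 1:40.35 1:48.67     # 324M file: no compression vs. gzip, prints -8.3
```

Note that these compare wall-clock time only; the user and system
columns would need to be compared separately.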



