bug#23113: parallel gzip processes trash hard disks, need larger buffers

From: Jim Meyering
Subject: bug#23113: parallel gzip processes trash hard disks, need larger buffers
Date: Sat, 26 Mar 2016 21:17:11 -0700

On Fri, Mar 25, 2016 at 9:57 AM, Chevreux, Bastien
<address@hidden> wrote:
> Hi there,
> I am using gzip 1.6 to compress large files >10 GiB in parallel (Kubuntu 
> 14.04, 12 cores). The underlying disk system (RAID 10) is able to deliver 
> read speeds >1 GB/s (measured with flushed file caches, iostat -mx 1 100).
> Here are some numbers when running gzip in parallel:
> 1 gzip process: the CPU is the bottleneck and its utilisation is 100%.
> 2 gzips in parallel: disk throughput drops to a meagre 70 MB/s and CPU
> utilisation per process is ~60%.
> 6 gzips in parallel: disk throughput fluctuates between 50 and 60 MB/s
> and CPU utilisation per process is ~18-20%.
> 6 gzips in parallel on the same data residing on an SSD: 100% CPU
> utilisation per process.
> Googling a bit, I found this thread on SuperUser where someone saw the same
> behaviour even with a single disk: it normally delivers 125 MB/s, and running
> 4 gzips drops it to 25 MB/s:
> http://superuser.com/questions/599329/why-is-gzip-slow-despite-cpu-and-hard-drive-performance-not-being-maxed-out
> The posts there propose a workaround like this:
>   buffer -s 100000 -m 10000000 -p 100 < bigfile.dat | gzip > bigfile.dat.gz
> And indeed, using "buffer" resolves the thrashing problem when working on a
> disk system. However, "buffer" is rather arcane (it isn't even installed by
> default on most Unix/Linux installations) and counterintuitive to use.
> Would it be possible to have bigger buffers by default (1 MB? 10 MB?) or an
> automatic rule in gzip such as "if the file to compress is >10 MB and free
> RAM is >500 MB, set up the file buffer to use 1 (10?) MB"?
> Alternatively, could there be a command-line option to set the buffer size
> manually?
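The heuristic proposed in the report can be sketched in Python, with the standard `gzip` module standing in for the gzip program. This is an illustration under stated assumptions, not gzip's code: the 32 KiB default and 1 MiB enlarged buffer are hypothetical values, the report's "10 MB" and "500 MB" thresholds are read as MiB, and `free_ram_bytes()` relies on the Linux/glibc-specific `SC_AVPHYS_PAGES` sysconf name. All function names here are hypothetical.

```python
import gzip
import os
import shutil
from typing import Optional

MIB = 1024 * 1024

# Hypothetical sizes: DEFAULT_BUF stands in for a small built-in buffer and
# LARGE_BUF for the "1 MB" option suggested in the report. Neither value is
# taken from gzip's source.
DEFAULT_BUF = 32 * 1024
LARGE_BUF = 1 * MIB
MIN_FILE_SIZE = 10 * MIB    # "if file to compress > 10 MB"
MIN_FREE_RAM = 500 * MIB    # "and free RAM > 500 MB"


def pick_buffer_size(file_size: int, free_ram: int) -> int:
    """Return the read size the proposed heuristic would choose."""
    if file_size > MIN_FILE_SIZE and free_ram > MIN_FREE_RAM:
        return LARGE_BUF
    return DEFAULT_BUF


def free_ram_bytes() -> int:
    """Currently free physical memory (Linux/glibc sysconf names)."""
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def compress_file(src_path: str, dst_path: str,
                  free_ram: Optional[int] = None) -> int:
    """Gzip src_path into dst_path, reading the input in chunks sized by
    pick_buffer_size(); returns the chunk size that was used."""
    if free_ram is None:
        free_ram = free_ram_bytes()
    chunk = pick_buffer_size(os.path.getsize(src_path), free_ram)
    # compresslevel=6 matches the gzip command's default level.
    with open(src_path, "rb") as src, \
         gzip.open(dst_path, "wb", compresslevel=6) as dst:
        # Each src.read(chunk) asks for one large sequential block of input.
        shutil.copyfileobj(src, dst, length=chunk)
    return chunk
```

For example, `compress_file("bigfile.dat", "bigfile.dat.gz")` picks the 1 MiB read size for a 10 GiB input when more than 500 MiB is free. Unlike `buffer`, which reads ahead in a separate process, this sketch reads and compresses in turn rather than overlapping the two; it only shows the larger-request half of that workaround.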

Thanks for the report and suggestions.
However, I suggest that you consider using xz in place of gzip.
Not only can it compress better, it also works faster for comparable
compression ratios.

That said, if you find that setting gzip.h's INBUFSIZ or OUTBUFSIZ to
larger values makes a significant difference, we'd like to hear about
the results and how you measured.
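To put buffer sizes in perspective: the number of read() calls needed to consume a file is its size divided by the buffer size, rounded up. The sketch below assumes a 32 KiB small buffer (an illustrative value, not a claim about gzip.h's actual INBUFSIZ) and a 10 GiB file, in line with the file sizes in the report. With several processes contending for one disk array, fewer and larger requests plausibly give the disks longer sequential runs between seeks, though only a measurement like the one requested above can confirm that.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def reads_needed(file_size: int, buf_size: int) -> int:
    """Number of buf_size-byte read() calls needed to consume file_size bytes
    (ceiling division in integers, so it stays exact for large files)."""
    return -(-file_size // buf_size)


# Compare an assumed 32 KiB buffer with the 1 MB and 10 MB sizes from the
# report (treated here as MiB) for a 10 GiB input file.
for label, size in (("32 KiB", 32 * KIB), ("1 MiB", MIB), ("10 MiB", 10 * MIB)):
    print(f"{label:>6}: {reads_needed(10 * GIB, size):,} reads")
```

For a 10 GiB file this gives 327,680 reads at 32 KiB, 10,240 at 1 MiB and 1,024 at 10 MiB, assuming every read returns a full buffer.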
