gzip --force bug

From: Mark Adler
Subject: gzip --force bug
Date: Tue, 2 Feb 2010 08:21:54 -0800


I got a report of a behavior of gzip that is not replicated in pigz.  In the 
process of investigating that, I found a bug in gzip (all versions including 
1.4).  Here's the deal.

The behavior is that if you use --force and --stdout with --decompress, gzip 
will behave like cat if it doesn't recognize any compressed data magic headers. 
 This is so that zcat can act as a replacement for cat, automatically detecting 
and decompressing compressed data.  (pigz doesn't currently do that, which I 
need to fix.)  Another behavior of gzip is that it will decompress concatenated 
gzip streams.  Combining those two behaviors, gzip -cfd on a gzip stream 
followed by non-gzip data should give you the decompressed data from the stream 
followed by the non-gzip data copied.

gzip doesn't do that, at least not correctly.

What it does for a small example is write the decompressed data, write the 
initial gzip stream without decompressing it (!), and then write the non-gzip 
data.  The stuff in the middle is the result of this code in gzip.c:

    } else if (force && to_stdout && !list) { /* pass input unchanged */
        method = STORED;
        work = copy;
        inptr = 0;
        last_member = 1;

(By the way, the tabs should be removed from all of the gzip source code.)

The culprit is the "inptr = 0".  It resets the input back to the beginning of 
the current input buffer (wherever that happens to be) and copies from there.  
That works fine if you start the input with non-gzip data, but messes up in the 
case of non-gzip data after a gzip stream.
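To make the effect of the rewind concrete, here is a minimal sketch of it in a toy model. This is not gzip's code: struct toy_input and the function names are hypothetical, and only the roles of inbuf, insize and inptr are borrowed from gzip.c. It assumes the gzip member and the trailing data both sit in the same input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the input state: one buffer, its fill level, a read cursor.
   Hypothetical names; only the roles mirror inbuf/insize/inptr in gzip.c. */
struct toy_input {
    const unsigned char *inbuf; /* current input buffer */
    size_t insize;              /* number of valid bytes in inbuf */
    size_t inptr;               /* index of the next byte to read */
};

/* Copy inbuf[from..insize) to out and return the number of bytes copied. */
static size_t copy_rest(const struct toy_input *in, size_t from,
                        unsigned char *out)
{
    assert(from <= in->insize);
    size_t n = in->insize - from;
    memcpy(out, in->inbuf + from, n);
    return n;
}

/* Pass-through as in the quoted branch: rewind to the start of the buffer,
   which re-emits whatever was already consumed from it. */
size_t passthrough_buggy(struct toy_input *in, unsigned char *out)
{
    in->inptr = 0;
    return copy_rest(in, in->inptr, out);
}

/* Pass-through from the offset where the failed magic check started.
   Only valid while that offset is still inside the current buffer. */
size_t passthrough_from(struct toy_input *in, size_t magic_start,
                        unsigned char *out)
{
    in->inptr = magic_start;
    return copy_rest(in, in->inptr, out);
}
```

With a buffer holding a 6-byte stand-in for a gzip member followed by "hello", passthrough_buggy copies all 11 bytes, while passthrough_from with an offset of 6 copies only "hello".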

I have not developed a fix, since it is non-trivial.  You can't just restore a 
saved inptr, since it is possible for the two-byte magic header to be split on 
a buffer boundary.  That is, reading the first byte of the magic header empties 
the input buffer, so that reading the second byte of the magic header fills the 
input buffer, overwriting the first byte.
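The boundary case can be shown with a toy reader whose buffer is only four bytes long. Everything here (struct toy_reader, toy_fill, toy_get_byte, toy_read_magic, TOY_BUFSIZE) is hypothetical and is not taken from gzip. The sketch also shows one possible direction, which is to keep a private copy of the candidate magic bytes as they are read. That is an assumption about how a fix might look, not a tested patch.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUFSIZE 4 /* deliberately tiny so the boundary is easy to hit */

/* Hypothetical buffered reader; src stands in for the input file. */
struct toy_reader {
    const unsigned char *src;         /* the whole input */
    size_t srclen, srcpos;            /* total size, bytes loaded so far */
    unsigned char inbuf[TOY_BUFSIZE]; /* current input buffer */
    size_t insize, inptr;             /* valid bytes, read cursor */
};

/* Refill inbuf from src, overwriting its old contents.
   Returns the number of bytes loaded (0 at end of input). */
static size_t toy_fill(struct toy_reader *r)
{
    size_t n = r->srclen - r->srcpos;
    if (n > TOY_BUFSIZE)
        n = TOY_BUFSIZE;
    memcpy(r->inbuf, r->src + r->srcpos, n);
    r->srcpos += n;
    r->insize = n;
    r->inptr = 0;
    return n;
}

/* Return the next input byte, or -1 at end of input. */
int toy_get_byte(struct toy_reader *r)
{
    if (r->inptr == r->insize && toy_fill(r) == 0)
        return -1;
    return r->inbuf[r->inptr++];
}

/* Read up to two candidate magic bytes, saving each one in magic[] so it
   survives a refill. Returns how many bytes were read (0, 1 or 2). */
size_t toy_read_magic(struct toy_reader *r, unsigned char magic[2])
{
    size_t n = 0;
    int c;
    while (n < 2 && (c = toy_get_byte(r)) != -1)
        magic[n++] = (unsigned char)c;
    return n;
}
```

For the input "abcXYdata", after three bytes have been consumed, toy_read_magic takes 'X' from the last slot of the buffer and 'Y' from the first slot of the refilled one. At that point inptr is 1, so there is no saved position two bytes back to restore, and 'X' survives only in magic[].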

If you want, I can try to come up with a patch for that, or you could have that 

