bug-gzip

Re: RFC: fixing the 32-bit size and time limits in gzip file format


From: Mark Adler
Subject: Re: RFC: fixing the 32-bit size and time limits in gzip file format
Date: Mon, 16 Aug 2010 22:40:37 -0700

All,

The format of the extra field at the end could be similar to the one in the 
header, but with smaller sizes and fewer ids:

n (n == 0 permitted) occurrences of:

   1-byte sub-field id, 1-byte length, then that many bytes

followed by:

   1-byte end-of-extra-field id, 1-byte total length of the extra field 
   (including the crc that follows), then a 2-byte crc of the entire extra 
   field, excluding of course the crc itself.

Putting the total length of the extra field at the end permits finding the 
beginning of the extra field by starting at the end of the file.  The crc (as 
well as the structure of the extra field) permits verification that it really 
is an extra field at the end, as opposed to garbage at the end or a 
concatenated gzip stream without an extra field.
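
As a rough illustration of the layout above, here is a minimal Python sketch that builds such an extra field and then locates and checks it starting from the end of the data. Several details are assumptions, since the proposal does not fix them: the end-of-extra-field id value (END_ID = 0xFF here), the 2-byte crc (taken as the low 16 bits of CRC-32), and the crc byte order (little-endian). The names build_extra and find_extra are hypothetical.

```python
import struct
import zlib

END_ID = 0xFF  # hypothetical end-of-extra-field id; the proposal assigns no value


def crc16(data: bytes) -> int:
    # Assumption: low 16 bits of CRC-32; the proposal does not name a 2-byte crc.
    return zlib.crc32(data) & 0xFFFF


def build_extra(subfields):
    """Build the trailing extra field from a list of (id, payload) pairs."""
    body = bytearray()
    for sub_id, payload in subfields:
        if not 0 <= sub_id <= 0xFF or sub_id == END_ID:
            raise ValueError("bad sub-field id")
        if len(payload) > 0xFF:
            raise ValueError("sub-field too long for a 1-byte length")
        body += bytes([sub_id, len(payload)]) + payload
    total = len(body) + 4  # end id + total length + 2-byte crc
    if total > 0xFF:
        raise ValueError("extra field too long for a 1-byte total length")
    body += bytes([END_ID, total])
    # The crc covers everything in the extra field except the crc itself.
    return bytes(body) + struct.pack("<H", crc16(bytes(body)))


def find_extra(data: bytes):
    """Return the (id, payload) list if data ends in a valid extra field, else None."""
    if len(data) < 4:
        return None
    end_id, total = data[-4], data[-3]
    if end_id != END_ID or total < 4 or total > len(data):
        return None
    field = data[-total:]  # the total length lets us find the start from the end
    (stored,) = struct.unpack("<H", field[-2:])
    if crc16(field[:-2]) != stored:
        return None
    # Also verify the structure: sub-fields must exactly fill the space
    # before the end-of-extra-field id.
    subfields, pos, end = [], 0, total - 4
    while pos < end:
        if pos + 2 > end:
            return None
        sub_id, n = field[pos], field[pos + 1]
        pos += 2
        if pos + n > end:
            return None
        subfields.append((sub_id, field[pos:pos + n]))
        pos += n
    return subfields
```

With these assumptions, find_extra(stream + build_extra([(0, b"\x10\x27")])) gives back the one sub-field, and data that does not end in a valid extra field yields None.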

Sub-field id 00 would be the uncompressed length, with a variable number of 
bytes in little-endian format.
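
A small Python sketch of what a variable number of bytes in little-endian format could look like. It assumes the shortest encoding is used and that zero is stored as a single zero byte; neither is stated in the proposal. encode_length and decode_length are hypothetical names.

```python
def encode_length(n: int) -> bytes:
    """Encode n in as few little-endian bytes as possible (at least one)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "little")


def decode_length(payload: bytes) -> int:
    """Decode a little-endian length of any byte count."""
    return int.from_bytes(payload, "little")
```

A length of 2**32, which does not fit in the existing 4-byte trailer field, takes five bytes this way.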

There is another problem that could be solved with this, which is the inability 
to know about concatenated gzip streams in a file without decompressing.  
Another sub-field in the extra field at the end could be the number of bytes 
back to the start of the current gzip stream.  Then you could step back through 
the headers and trailers of all of the gzip streams and find out what the 
uncompressed length *really* would be.

So sub-field id 01 would be the compressed length, again in a variable number 
of bytes, where the length includes the header and trailer but does not include 
the extra field at the end.  I.e. the number of bytes back from the start of 
the entire extra field to the start of the header.
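
To make the stepping-back idea concrete, here is a hedged Python sketch that appends an extra field with sub-fields 00 and 01 to each gzip stream, then walks a file of concatenated streams from the end and sums the uncompressed lengths without decompressing anything. As before, END_ID = 0xFF, the crc choice (low 16 bits of CRC-32, little-endian), and the function names are assumptions for illustration, not part of the proposal.

```python
import gzip
import struct
import zlib

END_ID = 0xFF                   # hypothetical end-of-extra-field id
ID_ULEN, ID_CLEN = 0x00, 0x01   # sub-field ids from the proposal


def crc16(data: bytes) -> int:
    # Assumption: low 16 bits of CRC-32.
    return zlib.crc32(data) & 0xFFFF


def make_member(payload: bytes) -> bytes:
    """Compress payload as one gzip stream and append the proposed extra field."""
    stream = gzip.compress(payload)
    body = b""
    # Compressed length covers header and trailer, but not the extra field.
    for sub_id, value in ((ID_ULEN, len(payload)), (ID_CLEN, len(stream))):
        raw = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "little")
        body += bytes([sub_id, len(raw)]) + raw
    body += bytes([END_ID, len(body) + 4])
    return stream + body + struct.pack("<H", crc16(body))


def total_uncompressed_length(data: bytes) -> int:
    """Sum the uncompressed lengths of all streams, walking back from the end."""
    total, end = 0, len(data)
    while end > 0:
        if end < 4 or data[end - 4] != END_ID:
            raise ValueError("no extra field at end of stream")
        size = data[end - 3]
        start = end - size          # start of the entire extra field
        if size < 4 or start < 0:
            raise ValueError("bad extra field length")
        field = data[start:end]
        if crc16(field[:-2]) != struct.unpack("<H", field[-2:])[0]:
            raise ValueError("extra field crc mismatch")
        values, pos = {}, 0
        while pos < size - 4:
            sub_id, n = field[pos], field[pos + 1]
            values[sub_id] = int.from_bytes(field[pos + 2:pos + 2 + n], "little")
            pos += 2 + n
        if pos != size - 4 or ID_ULEN not in values or ID_CLEN not in values:
            raise ValueError("malformed extra field")
        total += values[ID_ULEN]
        end = start - values[ID_CLEN]   # jump back to the start of this header
        if end < 0:
            raise ValueError("compressed length points before start of file")
    return total
```

For example, total_uncompressed_length(make_member(b"a" * 1000) + make_member(b"hello")) returns 1005, reading only the two extra fields.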

Mark



