bug-gzip

Re: spec compliance: header CRC?


From: Greg Roelofs
Subject: Re: spec compliance: header CRC?
Date: Mon, 5 Jul 2010 09:42:54 -0700

Hi Paul,

>> Are there any plans to address this?

> Not until you mentioned it, but I just now installed a patch for this;
> please see the end of this message.

Awesome!  Many thanks.

> Can you please help out by supplying some test cases?

I can certainly provide one, currently part of a not-quite-final patch at

        https://issues.apache.org/jira/browse/HADOOP-6835
        http://issues.apache.org/jira/secure/attachment/12448469/HADOOP-6835.v5.trunk-hadoop-common.patch

I've copied it here:

        http://gregroelofs.com/test/testCompressThenConcat.txt.gz

This was hand-built, but I've verified that zlib > 1.2.1.2 reads it
correctly--that is, using the regular zlib inflateInit2() API, not
the gz* one, which ignores the CRC but otherwise also handles it OK.
(Versions prior to 1.2.1.2 forgot to compute the CRC on the trailing
NULs in the filename and comment fields.)  I don't recall if I've
verified it yet with Sun's JDK--I've made myself a note to do so
sometime this week.  (They're not exactly swift on gzip-related fixes
in any case. ;-)  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4691425)
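
For anyone who wants to check a file's header CRC independently of zlib,
here is a minimal sketch in Python, assuming the field layout in RFC 1952.
The function name header_crc_ok is made up for illustration; only struct
and zlib.crc32 from the standard library are used.  Note that the CRC
covers the terminating NUL bytes of the filename and comment fields.

```python
import struct
import zlib

# FLG bits from RFC 1952
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def header_crc_ok(data):
    """Check the optional CRC16 of the first gzip member header in `data`.

    Returns True or False when FHCRC is set, None when the header carries
    no CRC.  (Hypothetical helper, not part of gzip or zlib.)
    """
    if data[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip member")
    flags = data[3]
    pos = 10  # fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if flags & FNAME:
        pos = data.index(b"\x00", pos) + 1  # step past the terminating NUL
    if flags & FCOMMENT:
        pos = data.index(b"\x00", pos) + 1  # step past the terminating NUL
    if not flags & FHCRC:
        return None
    (stored,) = struct.unpack_from("<H", data, pos)
    # CRC16 is the low 16 bits of the CRC-32 over every header byte
    # before the CRC16 field itself, NUL terminators included.
    return stored == (zlib.crc32(data[:pos]) & 0xFFFF)
```

Only the header is inspected here; the deflate stream and the trailing
CRC32/ISIZE are left to the decompressor.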

> Are there
> examples on the net of gzip files that gzip 1.4 won't decompress, due
> to this problem?

Not to my knowledge; we've got a bit of a chicken-and-egg problem there,
insofar as most people avoid generating gzip'd files that can't be decoded
with standard gzip.  Neither the JDK nor zlib minigzip provides a mechanism
to generate arbitrary header fields, AFAIK.  Possibly something like 7-Zip
does, but I suspect not.

> If not, can you please generate some?  As things
> stand, I feel that I haven't tested it in any real-world way.  Thanks.

I'll try to do so, yes.  We're putting together a more extensive test plan
for the Hadoop patch, and the ideal suite would include all possible header
combos (with/without extra field, filename, comment, CRC).  I'm not sure
I'll have time--we're approaching an internal code freeze shortly--but I'll
do what I can.
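
A rough sketch of how such a suite could be generated, in Python and
assuming the RFC 1952 layout: it hand-assembles one gzip member for each
of the 16 combinations of extra field, filename, comment, and header CRC.
The helper names (build_gzip, all_header_combos), the "Zz" subfield ID,
and the sample filename and comment are all invented for illustration;
this is not the Hadoop test code.

```python
import itertools
import struct
import zlib

# FLG bits from RFC 1952
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def build_gzip(payload, extra=None, name=None, comment=None, header_crc=False):
    """Hand-assemble a single gzip member with the requested optional fields."""
    flags = 0
    if extra is not None:
        flags |= FEXTRA
    if name is not None:
        flags |= FNAME
    if comment is not None:
        flags |= FCOMMENT
    if header_crc:
        flags |= FHCRC

    header = bytearray(b"\x1f\x8b\x08")        # ID1, ID2, CM=8 (deflate)
    header.append(flags)
    header += struct.pack("<IBB", 0, 0, 255)   # MTIME=0, XFL=0, OS=255 (unknown)
    if extra is not None:
        header += struct.pack("<H", len(extra)) + extra
    if name is not None:
        header += name.encode("latin-1") + b"\x00"
    if comment is not None:
        header += comment.encode("latin-1") + b"\x00"
    if header_crc:
        # Low 16 bits of the CRC-32 over all preceding header bytes.
        header += struct.pack("<H", zlib.crc32(bytes(header)) & 0xFFFF)

    # Negative wbits asks zlib for a raw deflate stream (no zlib/gzip wrapper).
    deflater = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = deflater.compress(payload) + deflater.flush()
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    return bytes(header) + body + trailer


def all_header_combos(payload):
    """Yield ((extra, name, comment, crc), gzip_bytes) for all 16 combinations."""
    extra = b"Zz" + struct.pack("<H", 4) + b"test"   # made-up subfield ID "Zz"
    for combo in itertools.product((False, True), repeat=4):
        use_extra, use_name, use_comment, use_crc = combo
        yield combo, build_gzip(
            payload,
            extra=extra if use_extra else None,
            name="sample.txt" if use_name else None,
            comment="hand-built test member" if use_comment else None,
            header_crc=use_crc,
        )
```

Each result can be fed to zlib.decompress(blob, wbits=31), which goes
through zlib's gzip-aware inflate path, or written to disk and handed to
the gzip binary under test.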

> Here's the patch.  I'll add a NEWS entry shortly.

Thank you!  I'll also test this at work this week--it will make my own
testing easier.

Greg


