[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#16472: PD: gzip - tree overflow protection algorithm doubt/suggestio
From: |
Krzysztof Rybak |
Subject: |
bug#16472: PD: gzip - tree overflow protection algorithm doubt/suggestion |
Date: |
Thu, 16 Jan 2014 22:55:25 +0100 |
Hi,
I sent a message to the author of gzip but without response. Maybe You can help
me with my doubts.
best regards,
Krzysztof
Dnia Czwartek, 2 Stycznia 2014 22:41 Krzysztof Rybak <address@hidden>
napisaĆ(a)
> Hello Jean-loup,
> I'm preparing my own algorithm based on Huffman compression method and
> deflate format.
>
> I have doubts if method implemented in gzip program is 100% correct.
> I mean tree overflow protection:
> just to remind: in deflate (RFC1951) and therefore in gzip there is creating
> tree for lit/len, dist (maximum length of codes is 15 bits) and code table
> (maximum length is 7 bits). In case of too long code word, there is tree
> modification implemented in gen_bitlen(desc) function (trees.c):
>
> do {
> bits = max_length-1;
> while (bl_count[bits] == 0) bits--;
> bl_count[bits]--; /* move one leaf down the tree */
> bl_count[bits+1] += 2; /* move one overflow item as its brother */
> bl_count[max_length]--;
> overflow -= 2;
> } while (overflow > 0);
>
> I'm not sure if this method is correct for original code length greater than
> 8 ( for maximum 7 ) and than 16 ( for maximum 15): even not every time. I
> couldn't obtain such symbol distribution to generate so long codes on real
> files indeed as such situation is extremely rare but generated it by
> modifying software:
>
> just before mentioned above overflow protection I changed bl_count histogram
> to one as for Fibonacci numbers (which is the worst for Huffman overflow -
> the longest codes, please refer to attached image): in my case I modified
> tree when maximum length is 7 bits (max_length = 7 is for code book
> compression):
> bl_count[0] = 1;
> bl_count[1] = 1;
> bl_count[2] = 1;
> bl_count[3] = 1;
> bl_count[4] = 1;
> bl_count[5] = 1;
> bl_count[6] = 1;
> bl_count[7] = 1;
> bl_count[8] = 1;
> bl_count[9] = 1;
> bl_count[10] = 2;
> overflow = 4;
>
> in fact in gzip there is setting length = 7 when it's longer than 7:
> if (bits > max_length){
> bits = max_length;
> overflow++;
> }
>
> ,so my histogram looks like:
> bl_count[0] = 1;
> bl_count[1] = 1;
> bl_count[2] = 1;
> bl_count[3] = 1;
> bl_count[4] = 1;
> bl_count[5] = 1;
> bl_count[6] = 1;
> bl_count[7] = 5;
> bl_count[8] = 0;
> bl_count[9] = 0;
> overflow = 4;
> and after overflow protection method from gzip:
> bl_count[0] = 1
> bl_count[1] = 1
> bl_count[2] = 1
> bl_count[3] = 1
> bl_count[4] = 1
> bl_count[5] = 0
> bl_count[6] = 2
> bl_count[7] = 5
> , which is incorrect Huffman tree: too many elements.
>
> trees.c with my modification and printf() function is attached. I compress
> obj2 from Calgary corpus.
> I know my description may look unclear, but I hope You will help me/clarify
> my doubts.
>
> best regards,
> Krzysztof Rybak
Fibonacci.png
Description: PNG image
trees.c
Description: Text Data
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- bug#16472: PD: gzip - tree overflow protection algorithm doubt/suggestion,
Krzysztof Rybak <=