Re: [Qemu-block] [PATCH V4 08/10] block/qcow2: start using the compress format extension

From: Eric Blake
Subject: Re: [Qemu-block] [PATCH V4 08/10] block/qcow2: start using the compress format extension
Date: Thu, 20 Jul 2017 14:19:52 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 07/20/2017 11:30 AM, Peter Lieven wrote:

>> The new code is now unconditionally initializing with -15 instead of
>> -12.  Does that matter, or does decompression work regardless of window
>> size used at creation, as long as the initialized size at decompression
>> is at least as large?  On the other hand, I guess that means if someone
>> compresses with a large window, and then I initialize the decompressor
>> with a small window, my decompression will fail?  That's why knowing the
>> minimum window size should be part of the spec, whether or not we make
>> it a tunable.
> The decompression is supposed to fail if you compress with 15 and
> decompress with 12. In fact it doesn't.

Actually, I think (this is my guess here, not actual researched fact)
that the decompression error is possible ONLY if compression produced a
symbol that actually required more than 12 bits of memory - to get that
large of a symbol, you need to compress a lot of bits.  For our default
cluster of 64k, it might very well be that you rarely, if ever,
encounter a single cluster that compresses differently under window size
15 than it did under window size 12 (other than perhaps the speed at
which compression took place), because there simply wasn't enough
content to reach the point where you needed a symbol in the compression
stream using more than 12 bits.  So in that case, compressing under 15
and decompressing under 12 doesn't hit the error.  But as you get larger
cluster sizes (2M clusters), or perhaps if you pass particularly nasty
sequences of input to compression (I'm not sure what sequences would
have the right properties), then you do indeed end up with a compression
stream that starts to encounter symbols exceeding the window size.

But if my guess is right, then don't read the docs as "decompression
will fail", but rather as "decompression may fail" if you set the
decompress window smaller than the compression window.
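
One way to test that guess outside of qemu is a small round-trip
experiment.  The sketch below is an assumption-laden illustration, not
the qcow2 code: it uses Python's zlib binding, and the helper names
(raw_deflate, try_raw_inflate, make_cluster) and the synthetic cluster
contents are hypothetical.  The test data repeats a 1 KiB block every
8 KiB, so a 15-bit compressor can emit matches that lie further back
than a 12-bit window (4 KiB) reaches.  Whether the mismatched
decompression actually fails may depend on how the caller feeds output
buffers to inflate, so the script reports the outcome instead of
assuming one.

```python
import random
import zlib


def raw_deflate(data, window_bits):
    # Negative wbits selects a raw deflate stream (no zlib header),
    # so the window size is not recorded anywhere in the output.
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-window_bits)
    return comp.compress(data) + comp.flush()


def try_raw_inflate(stream, window_bits):
    # Returns the decompressed bytes, or None if zlib reports an error.
    try:
        return zlib.decompress(stream, wbits=-window_bits)
    except zlib.error:
        return None


def make_cluster(size=64 * 1024, seed=0):
    # Hypothetical test data: a 1 KiB block repeated at an 8 KiB stride,
    # padded with unrelated random bytes, so the only useful matches
    # sit more than 2**12 bytes back.
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(1024))
    out = bytearray()
    while len(out) < size:
        out += block
        out += bytes(rng.getrandbits(8) for _ in range(7 * 1024))
    return bytes(out[:size])


if __name__ == "__main__":
    cluster = make_cluster()
    for comp_bits in (12, 15):
        stream = raw_deflate(cluster, comp_bits)
        for decomp_bits in (12, 15):
            result = try_raw_inflate(stream, decomp_bits)
            status = "ok" if result == cluster else "failed"
            print(f"compress {comp_bits}, decompress {decomp_bits}: "
                  f"{len(stream)} bytes, {status}")
```

If the "compress 15, decompress 12" row reports ok for typical 64k
clusters but failed for data like this, that would support reading the
docs as "may fail" rather than "will fail".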

> I would like to avoid the windowBits in the qcow2 header as it makes
> the code to read and write it more complicated. If you don't like the change
> of the windowBits we can even stick with 12. If someone wants fast compression
> he will likely not use zlib at all and use lzo.

Also, note that historically, 'compress -b N' has allowed tuning the
window size; current POSIX states that compress only has to support
windows from 9 to 14, but permits implementations to use up to 16 (and
future POSIX is considering improving the compress utility to require
support for 16 as the window size, https://posix.rhansen.org/p/bug1041).
I don't know why gzip didn't expose a '-b N' window size parameter the
way compress did, but it sounds like the same thing.

> I just changed the windowBits to 15 as it increases speed and improves
> compression.

Does windowBits 16 make any difference?
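
A quick way to check is to ask zlib directly.  As far as I know,
zlib's deflateInit2() documents 8..15 as the valid window range, and
values above 15 select a gzip wrapper rather than a larger window, so
a raw stream with 16 bits is probably rejected outright.  The sketch
below uses Python's zlib binding as a stand-in for the C API;
raw_window_accepted is a hypothetical helper name.

```python
import zlib


def raw_window_accepted(window_bits):
    # Negative wbits requests a raw deflate stream with the given
    # window size; zlib refuses values outside its supported range.
    try:
        zlib.compressobj(wbits=-window_bits)
    except (ValueError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    for bits in (12, 15, 16):
        print(bits, raw_window_accepted(bits))
```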

> (likely at the cost of memory during compression/decompression)
> Peter

