qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v20 3/4] qcow2: add zstd cluster compression


From: Denis Plotnikov
Subject: Re: [PATCH v20 3/4] qcow2: add zstd cluster compression
Date: Tue, 28 Apr 2020 10:23:51 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1



On 28.04.2020 09:16, Max Reitz wrote:
On 27.04.20 21:26, Denis Plotnikov wrote:

On 27.04.2020 15:35, Max Reitz wrote:
On 21.04.20 10:11, Denis Plotnikov wrote:
zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with brd disk to reduce the influence
of disk subsystem to the test results.
The results is given in seconds.

compress cmd:
    time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
                    src.img [zlib|zstd]_compressed.img
decompress cmd
    time ./qemu-img convert -O qcow2
                    [zlib|zstd]_compressed.img uncompressed.img

             compression               decompression
           zlib       zstd           zlib         zstd
------------------------------------------------------------
real     65.5       16.3 (-75 %)    1.9          1.6 (-16 %)
user     65.0       15.8            5.3          2.5
sys       3.3        0.2            2.0          2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov <address@hidden>
Reviewed-by: Vladimir Sementsov-Ogievskiy <address@hidden>
Reviewed-by: Alberto Garcia <address@hidden>
QAPI part:
Acked-by: Markus Armbruster <address@hidden>
---
   docs/interop/qcow2.txt |   1 +
   configure              |   2 +-
   qapi/block-core.json   |   3 +-
   block/qcow2-threads.c  | 157 +++++++++++++++++++++++++++++++++++++++++
   block/qcow2.c          |   7 ++
   5 files changed, 168 insertions(+), 2 deletions(-)
[...]

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..0525718704 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
   #define ZLIB_CONST
   #include <zlib.h>
   +#ifdef CONFIG_ZSTD
+#include <zstd.h>
+#include <zstd_errors.h>
+#endif
+
   #include "qcow2.h"
   #include "block/thread-pool.h"
   #include "crypto.h"
@@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void
*dest, size_t dest_size,
       return ret;
   }
   +#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *          -ENOMEM destination buffer is not enough to store
compressed data
+ *          -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+                                   const void *src, size_t src_size)
+{
+    ssize_t ret;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };
Minor style note: I think it’d be nicer to use designated initializers
here.

+    ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+    if (!cctx) {
+        return -EIO;
+    }
+    /*
+     * Use the zstd streamed interface for symmetry with decompression,
+     * where streaming is essential since we don't record the exact
+     * compressed size.
+     *
+     * In the loop, we try to compress all the data into one zstd
frame.
+     * ZSTD_compressStream2 potentially can finish a frame earlier
+     * than the full input data is consumed. That's why we are looping
+     * until all the input data is consumed.
+     */
+    while (input.pos < input.size) {
+        size_t zstd_ret;
+        /*
+         * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+         * with ZSTD_e_end until it returns 0, at which point you are
+         * free to start a new frame". We assume that "start a new
frame"
+         * means call ZSTD_compressStream2 in the very beginning or
when
+         * ZSTD_compressStream2 has returned with 0.
+         */
+        do {
+            zstd_ret = ZSTD_compressStream2(cctx, &output, &input,
ZSTD_e_end);
The spec makes it sound to me like ZSTD_e_end will always complete in a
single call if there’s enough space in the output buffer.  So the only
team we have to loop would be when there isn’t enough space anyway:

It says this about ZSTD_e_end:
flush operation is the same, and follows same rules as calling
ZSTD_compressStream2() with ZSTD_e_flush.
Those rules being:
Note that, if `output->size` is too small, a single invocation with
ZSTD_e_flush might not be enough (return code > 0).
So it seems like it will only return a value > 0 if the output buffer is
definitely too small.

The spec also notes that the return value is greater than 0 if:
0 if some data still present within internal buffer (the value is
minimal estimation of remaining size),
So it’s a minimum estimate.  That’s another point that heavily implies
to me that if the return value were less than what’s left in the buffer,
the function wouldn’t return but still try to write it out, until it
realizes that there isn’t enough space in the output buffer, and then
return a value that exceeds the remaining output buffer size.

(Because if the function just played it safe, I would expect it to
return a maximum estimate.)


OTOH, if it were actually possible for ZSTD_e_end to finish a frame
earlier than the end of the input, I think it would make more sense to
use ZSTD_e_continue until the input is done and then finish with
ZSTD_e_end, like the spec seems to propose.  That way, we’d always end
up with a single frame to make decompression simpler (and I think it
would also make more sense overall).


But anyway.  From how I understand the spec, this code simply always
ends up creating a single frame or erroring out, without looping ever.
So it isn’t exactly wrong, it just seems overly complicated.  (Again,
assuming I understand the spec correctly.  Which seems like a tough
thing to assume, because the spec is not exactly obvious to read...)

(Running some quick tests by converting some images with zstd
compression seems to confirm that whenever ZSTD_compressStream2()
returns, either zstd_ret > output.size - output.pos, or zstd_ret == 0
and input.pos == input.size.  So none of the loops ever loop.)

Max
So, what should we do?

1. Rely on the test that there's no need for the loop:
    * make one ZSTD_compressStream2() call
    * make sure it returned with zstd_ret == 0 and
      input.pos == input.size.
      if so, return with the size
    * if not, check that zstd_ret > output.size - output.pos
      if so, return with -ENOMEM
    * if none above return with -EIO

    This should cover the majority of the compressing cases
According to how I interpret the spec, “none of the above” should never
happen except for ZSTD_isError(zstd_ret), so this should cover all
compressing cases, actually.

2. Leave the loop as is, because of the documentation:
    "You *must* continue calling ZSTD_compressStream2() with ZSTD_e_end
until it returns 0,
     at which point you are free to start a new frame."
As far as I can see, the return value is always 0 or greater than the
remaining buffer space, so this will always be satisfied even without a
loop.  (We will always have one of three cases: (1) Success and all
input has been consumed, (2) ZSTD_isError(zstd_ret), so we return -EIO,
(3) zstd_ret > output.size - output.pos, so we return -ENOMEM.

I interpret the “You *must* continue until it returns 0” as “If there is
no sufficient space in the output buffer, this function will return a
value greater than 0 indicating how much space is at least still
required.  The caller is free to supply a greater output buffer for the
next call (by supplying a different ZSTD_outBuffer structure), and then
we’ll try again.”
(I.e., retrying with the same ZSTD_outBuffer will make the function
return immediately because it knows that it’s insufficient.)

Max

ok, removing the loop sounds reasonable.
My only concern is that *must* in the doc.
Could ZSTD-lib change the logic in the future relying on the fact
that they make users use ZSTD_compressStream() in a loop.
Honestly, I can't imagine the case when they would want to do that,
but still.
Without the loop we're protected even in this case. The worst thing
could happen because of that is qcow2_zstd_compress() would return
with -EIO more frequently.

So, if I understand correctly, you are ok with removing the loop.

Denis





reply via email to

[Prev in Thread] Current Thread [Next in Thread]