
From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH v7 6/6] qcow2: Avoid memory over-allocation on compressed images
Date: Fri, 29 Jun 2018 10:16:51 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 06/29/2018 04:03 AM, Kevin Wolf wrote:
On 28.06.2018 at 21:07, Eric Blake wrote:
When reading a compressed image, we were allocating s->cluster_data
to 32*cluster_size + 512 (possibly over 64 megabytes, for an image
with 2M clusters).  Let's check out the history:

However, the qcow2 spec permits an all-ones sector count, plus
a full sector containing the initial offset, for a maximum read
of 2 full clusters.  Thanks to the way decompression works, even
though such a value is too large for the actual compressed data
(for all but 512-byte clusters), it really doesn't matter if we
read too much (gzip ignores slop, once it has decoded a full
cluster).  So it's easier to just allocate cluster_data to be as
large as we can ever possibly see; even if it still wastes up to
2M on any image created by qcow2, that's still an improvement of
60M less waste than pre-patch.

Signed-off-by: Eric Blake <address@hidden>
Reviewed-by: Alberto Garcia <address@hidden>


+++ b/block/qcow2-cluster.c
@@ -1599,20 +1599,29 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
         sector_offset = coffset & 511;
         csize = nb_csectors * 512 - sector_offset;

-        /* Allocate buffers on first decompress operation, most images are
-         * uncompressed and the memory overhead can be avoided.  The buffers
-         * are freed in .bdrv_close().
+        /* Allocate buffers on the first decompress operation; most
+         * images are uncompressed and the memory overhead can be
+         * avoided.  The buffers are freed in .bdrv_close().  qemu
+         * never writes an inflated cluster, and gzip itself never
+         * inflates a problematic cluster by more than 0.015%, but the
+         * qcow2 format allows up to 1 byte short of 2 full clusters
+         * when including the sector containing offset.  gzip ignores
+         * trailing slop, so just allocate that much up front rather
+         * than reject third-party images with overlarge csize.
          */
+        assert(!!s->cluster_data == !!s->cluster_cache);
+        assert(csize <= 2 * s->cluster_size);
         if (!s->cluster_data) {
-            /* one more sector for decompressed data alignment */
-            s->cluster_data = qemu_try_blockalign(bs->file->bs,
-                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
+            s->cluster_data = g_try_malloc(2 * s->cluster_size);
             if (!s->cluster_data) {
                 return -ENOMEM;
             }
-        }
-        if (!s->cluster_cache) {
-            s->cluster_cache = g_malloc(s->cluster_size);
+            s->cluster_cache = g_try_malloc(s->cluster_size);
+            if (!s->cluster_cache) {
+                g_free(s->cluster_data);
+                s->cluster_data = NULL;
+                return -ENOMEM;
+            }
         }

I wonder how much of a difference s->cluster_cache really makes. I
wouldn't expect guests to access the same cluster twice in a row.

I don't know if it makes a difference. But my patch is not even touching that - ALL I'm doing is changing a 64M allocation into a 4M allocation, with otherwise no change to the frequency of allocation (which is once per image).

Maybe the easiest solution would be switching to temporary buffers that
would have the exact size we need and would be freed at the end of the
request.

The exact size for a qemu-produced image would be at most 2M instead of 4M - but doing the change you propose WILL cause more frequent allocations (once per cluster, rather than once per image). We'd have to benchmark if it makes sense.

But that would be a separate patch from this one.

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
