Re: [Qemu-block] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images

From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images
Date: Wed, 21 Feb 2018 17:51:16 +0100
User-agent: Mutt/1.9.1 (2017-09-22)

On 20.02.2018 at 23:24, Eric Blake wrote:
> When reading a compressed image, we were allocating s->cluster_data
> to 32*cluster_size + 512 (possibly over 64 megabytes, for an image
> with 2M clusters).  Let's check out the history:
> Back when qcow2 was first written, we used s->cluster_data for
> everything, including copy_sectors() and encryption, where we want
> to operate on more than one cluster at once.  Obviously, at that
> point, the buffer had to be aligned for other users, even though
> compression itself doesn't require any alignment.
> But commit 1b9f1491 (v1.1!) changed things to allocate parallel
> buffers on demand rather than sharing a single buffer, for encryption
> and COW, leaving compression as the final client of s->cluster_data.
> That use was still preserved, because if a single compressed cluster
> is read more than once, we reuse the cache instead of decompressing
> it a second time (I'm not sure how often this optimization actually
> fires, or if it penalizes us from being able to decompress multiple
> clusters in parallel even though we can now decrypt clusters in
> parallel; see the XXX comment in qcow2_co_preadv for more details).
> Much later, in commit de82815d (v2.2), we noticed that a 64M
> allocation is prone to failure, so we switched over to a graceful
> memory allocation error message.  But note that elsewhere in the
> code, we do g_malloc(2 * cluster_size) without ever checking for
> failure.
> Then even later, in 3e4c7052 (2.11), we realized that allocating
> a large buffer up front for every qcow2 image is expensive, and
> switched to lazy allocation only for images that actually had
> compressed clusters.  But in the process, we never even bothered
> to check whether what we were allocating still made sense in its
> new context!
> So, it's time to cut back on the waste.  A compressed cluster
> will NEVER occupy more than an uncompressed cluster (okay, gzip
> DOES document that because the compression stream adds metadata,
> and because of the pigeonhole principle, there are worst case
> scenarios where attempts to compress will actually inflate an
> image - but in those cases, we would just write the cluster
> uncompressed instead of inflating it).  And as that is a smaller
> amount of memory, we can get by with the simpler g_malloc.
> Signed-off-by: Eric Blake <address@hidden>

My comments feel a bit trivial compared to Berto's, but anyway:

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 85be7d5e340..8c4b26ceaf2 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1603,15 +1603,9 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
>           * are freed in .bdrv_close().
>           */
>          if (!s->cluster_data) {
> -            /* one more sector for decompressed data alignment */
> -            s->cluster_data = qemu_try_blockalign(bs->file->bs,
> -                    QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> -            if (!s->cluster_data) {
> -                return -ENOMEM;
> -            }
> -        }
> -        if (!s->cluster_cache) {
> -            s->cluster_cache = g_malloc(s->cluster_size);
> +            assert(!s->cluster_cache);

Wouldn't it be better to assert (!!s->cluster_cache ==
!!s->cluster_data) unconditionally?

> +            s->cluster_data = g_try_malloc(s->cluster_size);

Why are you going from qemu_try_blockalign() to simple malloc here? This
buffer is used with bdrv_read() (or bdrv_pread() after patch 1), so we
should avoid unnecessary use of a bounce buffer.

> +            s->cluster_cache = g_try_malloc(s->cluster_size);

As you already said, either g_malloc() or check the result. I actually
think that g_try_malloc() and checking the result is nicer; we still
allocate up to 2 MB here.

