[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v5 5/5] qcow2: Avoid memory over-allocation on c
From: |
Max Reitz |
Subject: |
Re: [Qemu-devel] [PATCH v5 5/5] qcow2: Avoid memory over-allocation on compressed images |
Date: |
Wed, 25 Apr 2018 17:00:53 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 |
On 2018-04-24 00:33, Eric Blake wrote:
> When reading a compressed image, we were allocating s->cluster_data
> to 32*cluster_size + 512 (possibly over 64 megabytes, for an image
> with 2M clusters). Let's check out the history:
>
> Back when qcow2 was first written, we used s->cluster_data for
> everything, including copy_sectors() and encryption, where we want
> to operate on more than one cluster at once. Obviously, at that
> point, the buffer had to be aligned for other users, even though
> compression itself doesn't require any alignment (the fact that
> the compressed data generally starts mid-sector means that aligning
> our buffer buys us nothing - either the protocol already supports
> byte-based access into whatever offset we want, or we are already
> using a bounce buffer to read a full sector, and copying into
> our destination no longer requires alignment).
>
> But commit 1b9f1491 (v1.1!) changed things to allocate parallel
> buffers on demand rather than sharing a single buffer, for encryption
> and COW, leaving compression as the final client of s->cluster_data.
> That use was still preserved, because if a single compressed cluster
> is read more than once, we reuse the cache instead of decompressing
> it a second time (someday, we may come up with better caching to
> avoid wasting repeated decompressions while still being more parallel,
> but that is a task for another patch; the XXX comment in
> qcow2_co_preadv for QCOW2_CLUSTER_COMPRESSED is telling).
>
> Much later, in commit de82815d (v2.2), we noticed that a 64M
> allocation is prone to failure, so we switched over to a graceful
> memory allocation error message. Elsewhere in the code, we do
> g_malloc(2 * cluster_size) without ever checking for failure, but
> even 4M starts to be large enough that trying to be nice is worth
> the effort, so we want to keep that aspect.
>
> Then even later, in 3e4c7052 (2.11), we realized that allocating
> a large buffer up front for every qcow2 image is expensive, and
> switched to lazy allocation only for images that actually had
> compressed clusters. But in the process, we never even bothered
> to check whether what we were allocating still made sense in its
> new context!
>
> So, it's time to cut back on the waste. A compressed cluster
> written by qemu will NEVER occupy more than an uncompressed
> cluster, but based on mid-sector alignment, we may still need
> to read 1 cluster + 1 sector in order to recover enough bytes
> for the decompression. But third-party producers of qcow2 may
> not be as smart, and gzip DOES document that because the
> compression stream adds metadata, and because of the pigeonhole
> principle, there are worst case scenarios where attempts to
> compress will actually inflate an image, by up to 0.015% (or 62
> sectors larger for an unfortunate 2M compression).
Hm? 2M * 0.00015 < 315 (bytes!), so where does that number come from?
> In fact,
> the qcow2 spec permits an all-ones sector count, plus a full
> sector containing the initial offset, for a maximum read of
> 2 full clusters; and thanks to the way decompression works,
> even though such a value is probably too large for the actual
> compressed data, it really doesn't matter if we read too much
> (gzip ignores slop, once it has decoded a full cluster). So
> it's easier to just allocate cluster_data to be as large as we
> can ever possibly see; even if it still wastes up to 2M on any
> image created by qcow2, that's still an improvment of 60M less
> waste than pre-patch.
OK, so from the technical perspective it's irrelevant anyway, but I
suppose the number should still be fixed in the commit message.
Apart from that, the series looks good to me.
Max
> Signed-off-by: Eric Blake <address@hidden>
> Reviewed-by: Alberto Garcia <address@hidden>
>
> ---
> v4: fix off-by-one in assertion and commit message [Berto]
> v3: tighten allocation [Berto]
> v2: actually check allocation failure (previous version meant
> to use g_malloc, but ended up posted with g_try_malloc without
> checking); add assertions outside of conditional, improve
> commit message to better match reality now that qcow2 spec bug
> has been fixed
> ---
> block/qcow2-cluster.c | 27 ++++++++++++++++++---------
> block/qcow2.c | 2 +-
> 2 files changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 1dc00ff2110..6e3eb88a37a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1599,20 +1599,29 @@ int qcow2_decompress_cluster(BlockDriverState *bs,
> uint64_t cluster_offset)
> sector_offset = coffset & 511;
> csize = nb_csectors * 512 - sector_offset;
>
> - /* Allocate buffers on first decompress operation, most images are
> - * uncompressed and the memory overhead can be avoided. The buffers
> - * are freed in .bdrv_close().
> + /* Allocate buffers on the first decompress operation; most
> + * images are uncompressed and the memory overhead can be
> + * avoided. The buffers are freed in .bdrv_close(). qemu
> + * never writes an inflated cluster, and gzip itself never
> + * inflates a problematic cluster by more than 0.015%, but the
> + * qcow2 format allows up to 1 byte short of 2 full clusters
> + * when including the sector containing offset. gzip ignores
> + * trailing slop, so just allocate that much up front rather
> + * than reject third-party images with overlarge csize.
> */
> + assert(!!s->cluster_data == !!s->cluster_cache);
> + assert(csize <= 2 * s->cluster_size);
> if (!s->cluster_data) {
> - /* one more sector for decompressed data alignment */
> - s->cluster_data = qemu_try_blockalign(bs->file->bs,
> - QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size + 512);
> + s->cluster_data = g_try_malloc(2 * s->cluster_size);
> if (!s->cluster_data) {
> return -ENOMEM;
> }
> - }
> - if (!s->cluster_cache) {
> - s->cluster_cache = g_malloc(s->cluster_size);
> + s->cluster_cache = g_try_malloc(s->cluster_size);
> + if (!s->cluster_cache) {
> + g_free(s->cluster_data);
> + s->cluster_data = NULL;
> + return -ENOMEM;
> + }
> }
>
> BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
> diff --git a/block/qcow2.c b/block/qcow2.c
> index ef68772acad..a8301371ccc 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2151,7 +2151,7 @@ static void qcow2_close(BlockDriverState *bs)
> g_free(s->image_backing_format);
>
> g_free(s->cluster_cache);
> - qemu_vfree(s->cluster_data);
> + g_free(s->cluster_data);
> qcow2_refcount_close(bs);
> qcow2_free_snapshots(bs);
> }
>
signature.asc
Description: OpenPGP digital signature
- [Qemu-devel] [PATCH v5 0/5] minor qcow2 compression improvements, Eric Blake, 2018/04/23
- [Qemu-devel] [PATCH v5 2/5] qcow2: Document some maximum size constraints, Eric Blake, 2018/04/23
- [Qemu-devel] [PATCH v5 4/5] qcow2: Don't allow overflow during cluster allocation, Eric Blake, 2018/04/23
- [Qemu-devel] [PATCH v5 1/5] qcow2: Prefer byte-based calls into bs->file, Eric Blake, 2018/04/23
- [Qemu-devel] [PATCH v5 5/5] qcow2: Avoid memory over-allocation on compressed images, Eric Blake, 2018/04/23
- Re: [Qemu-devel] [PATCH v5 5/5] qcow2: Avoid memory over-allocation on compressed images,
Max Reitz <=
- [Qemu-devel] [PATCH v5 3/5] qcow2: Reduce REFT_OFFSET_MASK, Eric Blake, 2018/04/23