From: Eric Blake
Subject: Re: [Qemu-block] [PATCH v7 6/6] qcow2: Avoid memory over-allocation on compressed images
Date: Tue, 13 Nov 2018 16:38:53 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0
On 6/29/18 10:47 AM, Kevin Wolf wrote:
> Am 29.06.2018 um 17:16 hat Eric Blake geschrieben:
>> On 06/29/2018 04:03 AM, Kevin Wolf wrote:
>>> Am 28.06.2018 um 21:07 hat Eric Blake geschrieben:
>>>> When reading a compressed image, we were allocating s->cluster_data
>>>> to 32*cluster_size + 512 (possibly over 64 megabytes, for an image
>>>> with 2M clusters). Let's check out the history:
>>>>
>>>> However, the qcow2 spec permits an all-ones sector count, plus a
>>>> full sector containing the initial offset, for a maximum read of 2
>>>> full clusters. Thanks to the way decompression works, even though
>>>> such a value is too large for the actual compressed data (for all
>>>> but 512-byte clusters), it really doesn't matter if we read too much
>>>> (gzip ignores slop, once it has decoded a full cluster). So it's
>>>> easier to just allocate cluster_data to be as large as we can ever
>>>> possibly see; even if it still wastes up to 2M on any image created
>>>> by qcow2, that's still an improvment of
>>>
>>> s/improvment/improvement/
>>>
>>>> 60M less waste than pre-patch.
>>>
>>> I wonder how much of a difference s->cluster_cache really makes. I
>>> wouldn't expect guests to access the same cluster twice in a row.
>>
>> I don't know if it makes a difference. But my patch is not even
>> touching that - ALL I'm doing is changing a 64M allocation into a 4M
>> allocation, with otherwise no change to the frequency of allocation
>> (which is once per image).
>>
>>> Maybe the easiest solution would be switching to temporary buffers
>>> that would have the exact size we need and would be freed at the end
>>> of the request?
>>
>> The exact size for a qemu-produced image would be at most 2M instead
>> of 4M - but doing the change you propose WILL cause more frequent
>> allocations (once per cluster, rather than once per image). We'd have
>> to benchmark if it makes sense.
>
> I wouldn't expect that another allocation makes a big difference when
> you already have to decompress the whole cluster. In fact, it could
> speed things up because we could then parallelise this. Hmm... Wasn't
> there a series for using a worker thread for decompression recently?
> It might actually already make that change, I don't remember.
>
>> But that would be a separate patch from this one.
>
> Yes, just a thought I had while reviewing your patch.
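To make the sizing argument quoted above concrete, here is a minimal,
self-contained sketch (not the actual QEMU code; the descriptor layout
follows the qcow2 spec as described in the commit message, and the helper
name is made up) that computes the worst-case read for one compressed
cluster and compares it with the old 32 * cluster_size + 512 cap:

    #include <stdint.h>
    #include <stdio.h>

    /* Worst case that one compressed cluster read can ask for: the low
     * (62 - (cluster_bits - 8)) bits of the L2 entry hold the host
     * offset, the remaining bits count additional 512-byte sectors.
     * An all-ones sector count, plus the sector holding the initial
     * offset, never exceeds two full clusters. */
    static uint64_t worst_case_compressed_read(unsigned cluster_bits)
    {
        unsigned csize_shift = 62 - (cluster_bits - 8);
        uint64_t max_extra_sectors = (1ULL << (62 - csize_shift)) - 1;
        return (max_extra_sectors + 1) * 512;   /* <= 2 * cluster_size */
    }

    int main(void)
    {
        unsigned cluster_bits = 21;             /* 2M clusters */
        uint64_t cluster_size = 1ULL << cluster_bits;

        /* old cap: 32 * cluster_size + 512, just over 64M */
        printf("old cap: %llu bytes\n",
               (unsigned long long)(32 * cluster_size + 512));
        /* new cap: 2 * cluster_size, i.e. 4M for 2M clusters */
        printf("new cap: %llu bytes\n",
               (unsigned long long)worst_case_compressed_read(cluster_bits));
        return 0;
    }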
Well, such a patch has now landed in your block-next queue, so I'm going to rebase this patch on top of that (if there's still anything left to rebase, that is), and submit the remaining parts of this series that still make sense for 3.1 as a v8 posting.
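For anyone following along, a rough sketch of the two allocation strategies
debated above, using made-up names (image_state, read_compressed_*) rather
than the real qcow2 structures: a single worst-case buffer reused for the
whole image versus an exact-size temporary buffer per request, which trades
an extra allocation per cluster for requests that no longer share state and
so could be decompressed in parallel.

    #include <stdlib.h>
    #include <string.h>

    /* Illustrative only; not QEMU's structures. */
    struct image_state {
        size_t cluster_size;
        unsigned char *cluster_data;    /* one long-lived buffer per image */
    };

    /* Strategy A (what the patch keeps): allocate a worst-case buffer
     * (2 * cluster_size) once, on first use, and reuse it for every
     * compressed read on this image. */
    int read_compressed_reuse(struct image_state *s,
                              const unsigned char *src, size_t csize)
    {
        if (!s->cluster_data) {
            s->cluster_data = malloc(2 * s->cluster_size);
            if (!s->cluster_data) {
                return -1;
            }
        }
        memcpy(s->cluster_data, src, csize);    /* stand-in for the disk read */
        /* ... decompress from s->cluster_data ... */
        return 0;
    }

    /* Strategy B (the temporary-buffer idea): allocate exactly csize bytes
     * for this request and free them when the request completes.  One more
     * allocation per cluster, but no shared buffer between requests. */
    int read_compressed_temp(const unsigned char *src, size_t csize)
    {
        unsigned char *buf = malloc(csize);
        if (!buf) {
            return -1;
        }
        memcpy(buf, src, csize);                /* stand-in for the disk read */
        /* ... decompress from buf ... */
        free(buf);
        return 0;
    }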
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org