[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero cluste

From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
Date: Wed, 12 Apr 2017 08:32:11 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 04/12/2017 04:49 AM, Kevin Wolf wrote:
> Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
>> 'qemu-img map' already coalesces information about unallocated
>> clusters (those with status 0) and pure zero clusters (those
>> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
>> qcow2 images with no backing file already report all unallocated
>> clusters (in the preallocation sense of clusters with no offset)
>> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
>> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
>> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
>> to external users), thanks to generic block layer code in
>> bdrv_co_get_block_status().
>> So, for an image with no backing file, having bdrv_pwrite_zeroes
>> mark clusters as unallocated (defer to backing file) rather than
>> reads-as-zero (regardless of backing file) makes no difference
>> to normal behavior, but may potentially allow for fewer writes to
>> the L2 table of a freshly-created image where the L2 table is
>> initially written to all-zeroes (although I don't actually know
>> if we skip an L2 update and flush when re-writing the same
>> contents as already present).
> I don't get this. Allocating a cluster always involves an L2 update, no
> matter whether it was previously unallocated or a zero cluster.

But we're NOT allocating a cluster.  Either we are unmapping a cluster,
or are we are changing a pure unallocated cluster (bdrv_get_block_status
of 0) to a read-as-zero but unallocated cluster (status 1) [different
from the 'bdrv_is_allocated' sense of an allocated cluster being any
cluster determined by the current layer instead of by its backing file -
I really wish we wouldn't overload 'allocated' to two different
meanings, but am not sure which of the two meanings would be better to
rename, and to what].  My argument is that for a brand new image with no
backing file, all the clusters are pure unallocated, and that a write
zero with unmapping is doing more work by writing 0 => 1 than it would
by leaving 0 => 0, with identical results.

>> Furthermore, this matches the behavior of discard_single_l2(), in
>> favoring an unallocated cluster over a zero cluster when full
>> discard is requested.
> The only use for "full discard" is qcow2_make_empty(). It explicitly
> requests that the backing file becomes visible again. This is a
> completely different case.
> In other words, in order to stay consistent between discard and
> write_zeroes from a guest POV, we need to leave this code alone.

I already argued that from a guest POV, the code is identical.  The only
situation that I changed is the situation where there is no backing
file, so the guest is guaranteed to read zeros from the cluster whether
the cluster is marked pure unallocated (0) or as reads-as-zero
unallocated (1).  The guest already requested unmapping, so they don't
need the preallocated-reads-as-zero (1 | offset).

Note that I absolutely agree that we must NOT make this change when
there is a backing file - a guest request to write zeroes MUST use
reads-as-zero (1) or preallocated reads-as-zero (1 | offset), and not
pure unallocated, unless we can guarantee that falling back to the
backing file will also read as zeroes, but taking the time to learn the
backing file status is not worth the expense.

>> Meanwhile, version 2 qcow2 files (compat=0.10) lack support for an
>> explicit zero cluster.  This minor tweak therefore allows us to turn
>> write zeroes with unmap into an actual unallocation on those files,
>> where they used to return -ENOTSUP and cause an allocation due to
>> the fallback to explicitly written zeroes.
> Okay, this is true.
> But I doubt that making write_zeroes more efficient on v2 images without
> a backing file is really worth any extra complexity at this point...

Is it just my commit message that is wrong?  The code itself is fairly
short to review (I spent a lot more time on the commit message than on
the code - and it looks like I still need more time on the commit
message).  And later patches in the series DO depend on this, so I'm not
ready to just drop the patch yet.

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]