qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 19/31] qcow2: Add subcluster support to calculate_l2_meta(


From: Eric Blake
Subject: Re: [PATCH v5 19/31] qcow2: Add subcluster support to calculate_l2_meta()
Date: Wed, 6 May 2020 12:39:48 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 5/6/20 12:14 PM, Alberto Garcia wrote:

Note that you are only skipping until the first normal subcluster, even
if other zero/unallocated clusters occur between the first normal
cluster and the start of the action.

That's correct.

Or visually, suppose we have:

--0-NN-0_NNNNNNNN_NNNNNNNN_NNNNNNNN

as our 32 subclusters, with sc_index of 8.  You will end up skipping
subclusters 0-3, but NOT 6 and 7.

That's correct. This function works with the assumption that the initial
COW region is located immediately before the data region, which is in
turn contiguous to the tail COW region.

...
Still, even though we spend time copying the allocated contents of
those two subclusters, we also copy the subcluster status, and the
guest still ends up reading the same data as before.

No, the subcluster status is updated and those subclusters are marked
now as allocated. That's actually why we can use the _RANGE masks that
you proposed here:

    https://lists.gnu.org/archive/html/qemu-block/2020-04/msg01155.html

In other words, we have this bitmap:

    --0-NN-0_NNNNNNNN_NNNNNNNN_NNNNNNNN

If we write to subcluster 8 (which was already allocated), the resulting
bitmap is this one:

    --0-NNNN_NNNNNNNN_NNNNNNNN_NNNNNNNN

The last block in iotest 271 deals exactly with this kind of scenarios.

Ah, that's what I was missing. So we indeed do more I/O than strictly necessary (by reading from the backing file or writing explicit zeroes rather than leaving unallocated or zero subclusters), but we are careful that the COW range of the operation puts the correct data into the head and tail (either from the backing file or explicitly zero) so that even though the status changes to N, the guest still sees the same contents.

I also agree that it is not worth trying to optimize to those later subclusters - it adds a lot of complexity for something that does not occur as frequently. It is more important to optimize to the initial sequence of unalloc/zero clusters, but once we hit data, taking the penalty of COWing a few extra subclusters isn't going to be much worse.


       /* Get the L2 entry of the last cluster */
-    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_index += nb_clusters - 1;
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset + bytes - 1);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);

[1] but here, we are skipping any intermediate clusters, and worrying
only about the state of the final cluster.  Is that always going to do
the correct thing, or will there be cases where we need to update the
L2 entries of intermediate clusters?

        Cluster 1             Cluster 2             Cluster 3
|---------------------|---------------------|---------------------|
    <---cow_start--><-------write request--------><--cow_end--->


All L2 entries from the beginning of cow_start until the end of cow_end
are always updated. That's again what the loop that I optimized using
the _RANGE masks (and that I liked above) was doing.

I guess the other thing that helps is that even if there are intermediate invalid subclusters, ideally the caller of this function is setting nb_clusters according to its own scan of how many contiguous clusters are even available for us to attempt to write into. So patch 20/31 catches the case where we don't have contiguous clusters if any of the subclusters are invalid. And in _this_ patch, it shouldn't really matter if we don't detect that we are overwriting what used to be an invalid subcluster with something that is now valid data.


The code in calculate_l2_meta() is only concerned with determining the
actual start and end points. Everything between them will be written to
and marked as allocated. It's only the subclusters outside that range
that keep the previous values (unallocated, or zero).

I think you've convinced me that we are safe on what this function does. In fact, if we rely on 20/31 checking for invalid subclusters when computing nb_clusters, we could probably assert that the start and end cluster in this function are not invalid, instead of adding the fail: label. But even if you don't do that, I can live with:

Reviewed-by: Eric Blake <address@hidden>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]