From: Eric Blake
Subject: Re: [PATCH v7 28/32] qcow2: Add subcluster support to qcow2_co_pwrite_zeroes()
Date: Thu, 28 May 2020 14:11:07 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

On 5/28/20 10:04 AM, Alberto Garcia wrote:
> On Wed 27 May 2020 07:58:10 PM CEST, Eric Blake wrote:
>>> There is just one thing to take into account for a possible future
>>> improvement: compressed clusters cannot be partially zeroized, so
>>> zero_l2_subclusters() on the head or the tail can return -ENOTSUP.
>>> This makes the caller repeat the *complete* request and write actual
>>> zeroes to disk. This is sub-optimal because
>>>
>>>      1) if the head area was compressed we would still be able to use
>>>         the fast path for the body and possibly the tail.
>>>
>>>      2) if the tail area was compressed we are writing zeroes to the
>>>         head and the body areas, which are already zeroized.

>> Is this true?  The block layer tries hard to break zero requests up so
>> that any non-cluster-aligned requests do not cross cluster boundaries.
>> In practice, that means that when you have an unaligned request, the
>> head and tail cluster will be the same cluster, and there is no body in
>> play, so returning -ENOTSUP is correct because there really is no
>> other work to do and repeating the entire request (which is less than a
>> cluster in length) is the right approach.

> Let's use an example.
>
> The cluster size is 64KB, the subcluster size is 2KB, and we get this
> request:
>
>     write -z 31k 130k
>
> Since pwrite_zeroes_alignment equals the cluster size (64KB), this
> would result in 3 calls to qcow2_co_pwrite_zeroes():
>
>     offset=31k  size=33k    [-ENOTSUP, writes actual zeroes]
>     offset=64k  size=64k    [zeroized using the relevant metadata bits]
>     offset=128k size=33k    [-ENOTSUP, writes actual zeroes]
>
> However this patch changes the alignment:
>
>      bs->bl.pwrite_zeroes_alignment = s->subcluster_size;

Ah, I missed that trick.  But it is nice, and indeed...


> so we get these instead:
>
>     offset=31k  size=1k     [-ENOTSUP, writes actual zeroes]
>     offset=32k  size=128k   [zeroized using the relevant metadata bits]
>     offset=160k size=1k     [-ENOTSUP, writes actual zeroes]
>
> So far, so good. Reducing the alignment requirements allows us to
> maximize the number of subclusters to zeroize.

...we can now hit a request that is not cluster-aligned.
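The head/body/tail arithmetic behind the two tables above can be sketched as
follows (a standalone illustration; split_zero_request() is a made-up helper,
not the real block-layer code):

```c
#include <assert.h>
#include <stdint.h>

/* Split a zero request into an unaligned head, an aligned body eligible
 * for the metadata-based fast path, and an unaligned tail.  This mirrors
 * how the generic block layer shrinks requests to pwrite_zeroes_alignment;
 * it is an illustrative helper, not an actual QEMU function. */
static void split_zero_request(uint64_t offset, uint64_t bytes,
                               uint64_t align,
                               uint64_t *head, uint64_t *body,
                               uint64_t *tail)
{
    uint64_t head_end = (offset + align - 1) / align * align; /* align up */
    uint64_t tail_start = (offset + bytes) / align * align;   /* align down */

    if (head_end >= offset + bytes) {
        /* The request fits inside one alignment chunk: it is all head,
         * no body -- the same-cluster case described above. */
        *head = bytes;
        *body = 0;
        *tail = 0;
        return;
    }
    *head = head_end - offset;
    *tail = offset + bytes - tail_start;
    *body = bytes - *head - *tail;
}
```

With align = 64K the 31k/130k request splits into 33k/64k/33k, and with
align = 2K into 1k/128k/1k, matching the two tables.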


> Now let's suppose we have this request:
>
>     write -z 32k 128k
>
> This one is aligned so it goes directly to qcow2_co_pwrite_zeroes().
> However, if the third cluster is compressed then the function will
> return -ENOTSUP after having zeroized the first 96KB of the request,
> forcing the caller to repeat it completely using the slow path.
>
> I think the problem also exists in the current code (without my
> patches). If you zeroize 10 clusters and the last one is compressed,
> you have to repeat the request after having zeroized the first 9
> clusters.

Hmm. In the pre-patch code, qcow2_co_pwrite_zeroes() calls
qcow2_cluster_zeroize(), which can fail with -ENOTSUP up front, but not
after the fact. Once it starts the while loop over clusters, its use of
zero_in_l2_slice() handles compressed clusters just fine; as far as I
can tell, only your new subcluster handling lets it now fail with
-ENOTSUP after earlier clusters have been visited.

But isn't this something we could solve recursively? Instead of
returning -ENOTSUP, we could have zero_in_l2_slice() call
bdrv_pwrite_zeroes() on the (sub-)clusters associated with a compressed
cluster.
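To make the idea concrete, here is a toy model of that per-cluster
fallback (all names and types are illustrative, not the real qcow2
code): the fast path refuses compressed clusters, and instead of
bubbling -ENOTSUP out of the loop, the caller writes literal zeroes for
just that cluster.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum cluster_type { NORMAL, COMPRESSED };

struct cluster {
    enum cluster_type type;
    bool zeroed;        /* metadata "all zeroes" bit was set */
    bool wrote_zeroes;  /* had to fall back to writing literal zeroes */
};

/* Fast path: like zeroing via L2 metadata bits, it cannot handle a
 * compressed cluster. */
static int zero_metadata(struct cluster *c)
{
    if (c->type == COMPRESSED) {
        return -ENOTSUP;
    }
    c->zeroed = true;
    return 0;
}

/* Slow-path stand-in for bdrv_pwrite_zeroes() falling back to an
 * explicit write of zeroes. */
static void write_literal_zeroes(struct cluster *c)
{
    c->wrote_zeroes = true;
    c->zeroed = true;
}

/* Suggested behaviour: on -ENOTSUP, fall back for just the offending
 * cluster and keep the fast path for everything else, instead of
 * failing the whole request. */
static int zeroize_range(struct cluster *clusters, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (zero_metadata(&clusters[i]) == -ENOTSUP) {
            write_literal_zeroes(&clusters[i]);
        }
    }
    return 0;
}
```

In the 10-cluster example above, this keeps the metadata fast path for
the first 9 clusters and only writes literal zeroes for the compressed
one.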

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



