[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocati
From: |
Kevin Wolf |
Subject: |
Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation |
Date: |
Thu, 23 Apr 2020 18:04:10 +0200 |
Am 23.04.2020 um 17:36 hat Eric Blake geschrieben:
> On 4/23/20 10:01 AM, Kevin Wolf wrote:
> > The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> > image is possibly preallocated and then the zero flag is added to all
> > clusters. This means that a copy-on-write operation may be needed when
> > writing to these clusters, despite having used preallocation, negating
> > one of the major benefits of preallocation.
> >
> > Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> > and if the protocol driver can ensure that the new area reads as zeros,
> > we can skip setting the zero flag in the qcow2 layer.
>
> Hmm. When we get block status, it is very easy to tell that something reads
> as zero when the qcow2 file has the zero bit set, but when the qcow2 file
> does not have the zero bit set, we have to then query the format layer
> whether it reads as zeros (which is expensive enough for some format layers
> that we no longer report things as reading as zero). I'm worried that
> optimizing this case could penalize later block status.
That's just how preallocation works. If you don't want that, you need
preallocation=off.
> We already can tell the difference between a cluster that has the zero bit
> set but is not preallocated, vs. has the zero bit set and is preallocated.
> Are we really forcing a copy-on-write to a cluster that is marked zero but
> preallocated? Is the problem that we don't have a way to distinguish
> between 'reads as zeroes, allocated, but we don't know state of format
> layer' and 'reads as zeroes, allocated, and we know format layer reads as
> zeroes'?
Basically, yes. If we had this, we could have a type of cluster where
writing to it still involves a metadata update (to clear the zero flag),
but never copy-on-write, even for partial writes.
I'm not sure if this would cover a very relevant case, though.
> >
> > Unfortunately, the same approach doesn't work for metadata
> > preallocation, so we'll still set the zero flag there.
> >
> > Signed-off-by: Kevin Wolf <address@hidden>
> > ---
> > block/qcow2.c | 22 +++++++++++++++++++---
> > tests/qemu-iotests/274.out | 4 ++--
> > 2 files changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index ad621fe404..b28e588942 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -4170,9 +4170,25 @@ static int coroutine_fn
> > qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> > /* Allocate the data area */
> > new_file_size = allocation_start +
> > nb_new_data_clusters * s->cluster_size;
> > - /* Image file grows, so @exact does not matter */
> > - ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> > - errp);
> > + /*
> > + * Image file grows, so @exact does not matter.
> > + *
> > + * If we need to zero out the new area, try first whether the
> > protocol
> > + * driver can already take care of this.
> > + */
> > + if (flags & BDRV_REQ_ZERO_WRITE) {
> > + ret = bdrv_co_truncate(bs->file, new_file_size, false,
> > prealloc,
> > + BDRV_REQ_ZERO_WRITE, errp);
> > + if (ret >= 0) {
> > + flags &= ~BDRV_REQ_ZERO_WRITE;
> > + }
>
> Hmm - just noticing things: how does this series interplay with the existing
> bdrv_has_zero_init_truncate? Should this series automatically handle
> BDRV_REQ_ZERO_WRITE on a bdrv_co_truncate(PREALLOC_NONE) request for all
> drivers that report true, even if that driver does not advertise support for
> the BDRV_REQ_ZERO_WRITE flag?
Hmm... It feels risky to me.
> > + } else {
> > + ret = -1;
> > + }
>
> Here, ret == -1 does not imply whether errp is set - but annoyingly, errp
> CAN be set if bdrv_co_truncate() failed.
>
> > + if (ret < 0) {
> > + ret = bdrv_co_truncate(bs->file, new_file_size, false,
> > prealloc, 0,
> > + errp);
>
> And here, you are passing a possibly-set errp back to bdrv_co_truncate().
> That is a bug that can abort. You need to pass NULL to the first
> bdrv_co_truncate() call or else clear errp prior to trying a fallback to
> this second bdrv_co_truncate() call.
Yes, you're right. If nothing else comes up, I'll fix this while
applying.
Kevin
- [PATCH v6 03/10] block-backend: Add flags to blk_truncate(), (continued)
- [PATCH v6 03/10] block-backend: Add flags to blk_truncate(), Kevin Wolf, 2020/04/23
- [PATCH v6 06/10] file-posix: Support BDRV_REQ_ZERO_WRITE for truncate, Kevin Wolf, 2020/04/23
- [PATCH v6 07/10] block: truncate: Don't make backing file data visible, Kevin Wolf, 2020/04/23
- [PATCH v6 08/10] iotests: Filter testfiles out in filter_img_info(), Kevin Wolf, 2020/04/23
- [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation, Kevin Wolf, 2020/04/23
- Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation, Max Reitz, 2020/04/23
- [PATCH v6 09/10] iotests: Test committing to short backing file, Kevin Wolf, 2020/04/23