qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocati


From: Kevin Wolf
Subject: Re: [PATCH v6 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation
Date: Thu, 23 Apr 2020 18:04:10 +0200

Am 23.04.2020 um 17:36 hat Eric Blake geschrieben:
> On 4/23/20 10:01 AM, Kevin Wolf wrote:
> > The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> > image is possibly preallocated and then the zero flag is added to all
> > clusters. This means that a copy-on-write operation may be needed when
> > writing to these clusters, despite having used preallocation, negating
> > one of the major benefits of preallocation.
> > 
> > Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> > and if the protocol driver can ensure that the new area reads as zeros,
> > we can skip setting the zero flag in the qcow2 layer.
> 
> Hmm.  When we get block status, it is very easy to tell that something reads
> as zero when the qcow2 file has the zero bit set, but when the qcow2 file
> does not have the zero bit set, we have to then query the format layer
> whether it reads as zeros (which is expensive enough for some format layers
> that we no longer report things as reading as zero). I'm worried that
> optimizing this case could penalize later block status.

That's just how preallocation works. If you don't want that, you need
preallocation=off.

> We already can tell the difference between a cluster that has the zero bit
> set but is not preallocated, vs. has the zero bit set and is preallocated.
> Are we really forcing a copy-on-write to a cluster that is marked zero but
> preallocated?  Is the problem that we don't have a way to distinguish
> between 'reads as zeroes, allocated, but we don't know state of format
> layer' and 'reads as zeroes, allocated, and we know format layer reads as
> zeroes'?

Basically, yes. If we had this, we could have a type of cluster where
writing to it still involves a metadata update (to clear the zero flag),
but never copy-on-write, even for partial writes.

I'm not sure if this would cover a very relevant case, though.

> > 
> > Unfortunately, the same approach doesn't work for metadata
> > preallocation, so we'll still set the zero flag there.
> > 
> > Signed-off-by: Kevin Wolf <address@hidden>
> > ---
> >   block/qcow2.c              | 22 +++++++++++++++++++---
> >   tests/qemu-iotests/274.out |  4 ++--
> >   2 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index ad621fe404..b28e588942 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -4170,9 +4170,25 @@ static int coroutine_fn 
> > qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           /* Allocate the data area */
> >           new_file_size = allocation_start +
> >                           nb_new_data_clusters * s->cluster_size;
> > -        /* Image file grows, so @exact does not matter */
> > -        ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
> > -                               errp);
> > +        /*
> > +         * Image file grows, so @exact does not matter.
> > +         *
> > +         * If we need to zero out the new area, try first whether the 
> > protocol
> > +         * driver can already take care of this.
> > +         */
> > +        if (flags & BDRV_REQ_ZERO_WRITE) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, 
> > prealloc,
> > +                                   BDRV_REQ_ZERO_WRITE, errp);
> > +            if (ret >= 0) {
> > +                flags &= ~BDRV_REQ_ZERO_WRITE;
> > +            }
> 
> Hmm - just noticing things: how does this series interplay with the existing
> bdrv_has_zero_init_truncate?  Should this series automatically handle
> BDRV_REQ_ZERO_WRITE on a bdrv_co_truncate(PREALLOC_NONE) request for all
> drivers that report true, even if that driver does not advertise support for
> the BDRV_REQ_ZERO_WRITE flag?

Hmm... It feels risky to me.

> > +        } else {
> > +            ret = -1;
> > +        }
> 
> Here, ret == -1 does not imply whether errp is set - but annoyingly, errp
> CAN be set if bdrv_co_truncate() failed.
> 
> > +        if (ret < 0) {
> > +            ret = bdrv_co_truncate(bs->file, new_file_size, false, 
> > prealloc, 0,
> > +                                   errp);
> 
> And here, you are passing a possibly-set errp back to bdrv_co_truncate().
> That is a bug that can abort.  You need to pass NULL to the first
> bdrv_co_truncate() call or else clear errp prior to trying a fallback to
> this second bdrv_co_truncate() call.

Yes, you're right. If nothing else comes up, I'll fix this while
applying.

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]