qemu-devel
Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate


From: Kevin Wolf
Subject: Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
Date: Fri, 24 Apr 2020 14:17:12 +0200

On 24.04.2020 at 08:16, Vladimir Sementsov-Ogievskiy wrote:
> 23.04.2020 18:01, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf <address@hidden>
> > Reviewed-by: Max Reitz <address@hidden>
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c         | 33 +++++++++++++++++++++++++++++++++
> >   2 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index 17f1363279..4b5fc8c4a7 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
> >       /* Caller must pass aligned values, except at image end */
> >       assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> >       assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> > -           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> > +           end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
> >       /* The zero flag is only supported by version 3 and newer */
> >       if (s->qcow_version < 3) {
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 9cfbdfc939..ad621fe404 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
> >       bs->supported_zero_flags = header.version >= 3 ?
> >                                  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> > +    bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> >       /* Repair image if dirty */
> >       if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
> > @@ -4214,6 +4215,38 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> >           g_assert_not_reached();
> >       }
> > +    if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > +        uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > +
> > +        /*
> > +         * Use zero clusters as much as we can. qcow2_cluster_zeroize()
> > +         * requires a cluster-aligned start. The end may be unaligned if it is
> > +         * at the end of the image (which it is here).
> > +         */
> > +        ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
> > +        if (ret < 0) {
> > +            error_setg_errno(errp, -ret, "Failed to zero out new clusters");
> > +            goto fail;
> > +        }
> > +
> > +        /* Write explicit zeros for the unaligned head */
> > +        if (zero_start > old_length) {
> > +            uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
> 
> Why not s/s->cluster_size/zero_start - old_length/? We may save a lot
> of pages if cluster_size is large.

Ok.

> > +            QEMUIOVector qiov;
> > +            qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
> > +
> > +            qemu_co_mutex_unlock(&s->lock);
> 
> We are in intermediate state here. Is it safe to unlock? Anything may
> happen, up to another truncate..

I don't think it's worse than unlocking during normal writes, which we
have been doing for a long time. I don't see anything that would corrupt
any internal state.

Of course, doing truncate in parallel with other operations around EOF
has somewhat undefined semantics because the order could be anything.
But if you don't want to get undefined results, you just shouldn't make
such requests. It's similar to reading and writing the same area at the
same time.

> > +            ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
> 
> Better not do it if this cluster is already ZERO.. But it may be a
> later patch to improve that.

I doubt it's worth writing code to optimise a corner case inside a
corner case.

> > +            qemu_co_mutex_lock(&s->lock);
> > +
> > +            qemu_vfree(buf);
> > +            if (ret < 0) {
> > +                error_setg_errno(errp, -ret, "Failed to zero out the new area");
> > +                goto fail;
> > +            }
> > +        }
> > +    }
> > +
> >       if (prealloc != PREALLOC_MODE_OFF) {
> >           /* Flush metadata before actually changing the image size */
> >           ret = qcow2_write_caches(bs);

Kevin



