[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH] qcow2: avoid lseek on block_status if possible
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-block] [PATCH] qcow2: avoid lseek on block_status if possible |
Date: |
Mon, 25 Mar 2019 15:41:57 +0100 |
User-agent: |
Mutt/1.11.3 (2019-02-01) |
Am 22.03.2019 um 17:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 25.01.2019 17:21, Vladimir Sementsov-Ogievskiy wrote:
> > drv_co_block_status digs bs->file for additional, more accurate search
> > for hole inside region, reported as DATA by bs since 5daa74a6ebc.
> >
> > This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
> > knows, where are holes and where is data. But every block_status
> > request calls lseek additionally. Assume a big disk, full of
> > data, in any iterative copying block job (or img convert) we'll call
> > lseek(HOLE) on every iteration, and each of these lseeks will have to
> > iterate through all metadata up to the end of file. It's obviously
> > ineffective behavior. And for many scenarios we don't need this lseek
> > at all.
> >
> > However, lseek is needed when we have metadata-preallocated image.
> >
> > So, let's detect metadata-preallocation case and don't dig qcow2's
> > protocol file in other cases.
> >
> > The idea is to compare allocation size in POV of filesystem with
> > allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
> > significantly lower, consider it as metadata-preallocation case.
> >
> > Suggested-by: Denis V. Lunev <address@hidden>
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
> > ---
> >
> > Hi!
> >
> > So, to continue talk about lseek/no lseek when qcow2 block_status reports
> > DATA.
> >
> > Results on tmpfs:
> > cached is lseek cache by Kevin
> > detect is this patch
> > no lseek is just remove block_status query on bs->file->bs in
> > bdrv_co_block_status
> >
> > +---------------------+--------+--------+--------+----------+
> > | | master | cached | detect | no lseek |
> > +---------------------+--------+--------+--------+----------+
> > | test.qcow2 | 80 | 40 | 0.169 | 0.162 |
> > +---------------------+--------+--------+--------+----------+
> > | test_forward.qcow2 | 79 | 0.171 | 0.169 | 0.163 |
> > +---------------------+--------+--------+--------+----------+
> > | test_prealloc.qcow2 | 0.054 | 0.053 | 0.055 | 0.263 |
> > +---------------------+--------+--------+--------+----------+
> >
> > block/qcow2.h | 1 +
> > include/block/block_int.h | 7 +++++++
> > block/io.c | 3 ++-
> > block/qcow2-refcount.c | 36 ++++++++++++++++++++++++++++++++++++
> > block/qcow2.c | 7 +++++++
> > 5 files changed, 53 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/qcow2.h b/block/qcow2.h
> > index 438a1dee9e..d7113ed44c 100644
> > --- a/block/qcow2.h
> > +++ b/block/qcow2.h
> > @@ -610,6 +610,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs,
> > int refcount_order,
> > void *cb_opaque, Error **errp);
> > int qcow2_shrink_reftable(BlockDriverState *bs);
> > int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
> > +int qcow2_detect_metadata_preallocation(BlockDriverState *bs);
> >
> > /* qcow2-cluster.c functions */
> > int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
> > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > index f605622216..c895ca7169 100644
> > --- a/include/block/block_int.h
> > +++ b/include/block/block_int.h
> > @@ -59,6 +59,12 @@
> >
> > #define BLOCK_PROBE_BUF_SIZE 512
> >
> > +typedef enum BdrvYesNoUnknown {
> > + BDRV_UNKNOWN = 0,
> > + BDRV_NO,
> > + BDRV_YES,
> > +} BdrvYesNoUnknown;
> > +
> > enum BdrvTrackedRequestType {
> > BDRV_TRACKED_READ,
> > BDRV_TRACKED_WRITE,
> > @@ -682,6 +688,7 @@ struct BlockDriverState {
> > bool probed; /* if true, format was probed rather than specified */
> > bool force_share; /* if true, always allow all shared permissions */
> > bool implicit; /* if true, this filter node was automatically
> > inserted */
> > + BdrvYesNoUnknown metadata_preallocation;
> >
> > BlockDriver *drv; /* NULL means no media */
> > void *opaque;
I'm not sure if adding a new field to BlockDriverState is justified for
this. It's already a huge struct.
Wouldn't returning a new flag like BDRV_BLOCK_RECURSE from
.bdrv_co_block_status be nicer?
> > diff --git a/block/io.c b/block/io.c
> > index bd9d688f8b..815661750a 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2186,7 +2186,8 @@ static int coroutine_fn
> > bdrv_co_block_status(BlockDriverState *bs,
> > }
> > }
> >
> > - if (want_zero && local_file && local_file != bs &&
> > + if (want_zero && bs->metadata_preallocation != BDRV_NO &&
> > + local_file && local_file != bs &&
> > (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
> > (ret & BDRV_BLOCK_OFFSET_VALID)) {
> > int64_t file_pnum;
> > diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> > index 1c63ac244a..008196d849 100644
> > --- a/block/qcow2-refcount.c
> > +++ b/block/qcow2-refcount.c
> > @@ -3379,3 +3379,39 @@ int64_t qcow2_get_last_cluster(BlockDriverState *bs,
> > int64_t size)
> > "There are no references in the refcount
> > table.");
> > return -EIO;
> > }
> > +
> > +int qcow2_detect_metadata_preallocation(BlockDriverState *bs)
> > +{
> > + BDRVQcow2State *s = bs->opaque;
> > + int64_t i, end_cluster, cluster_count = 0;
> > + int64_t file_length, real_allocation, metadata_allocation, file_tail;
> > + uint64_t refcount;
> > +
> > + file_length = bdrv_getlength(bs->file->bs);
> > + if (file_length < 0) {
> > + return file_length;
> > + }
> > + file_tail = offset_into_cluster(s, file_length);
> > +
> > + real_allocation = bdrv_get_allocated_file_size(bs->file->bs);
> > + if (real_allocation < 0) {
> > + return real_allocation;
> > + }
> > +
> > + end_cluster = size_to_clusters(s, file_length);
> > + for (i = 0; i < end_cluster; i++) {
> > + int ret = qcow2_get_refcount(bs, i, &refcount);
> > + if (ret < 0) {
> > + return ret;
> > + }
> > + cluster_count += !!refcount;
> > + }
As an optimisation, we could stop counting as soon as cluster_count gets
higher than real_allocation / 0.9 (which can be calculated once before
the loop).
> > + metadata_allocation = cluster_count * s->cluster_size;
> > + if (!!refcount && file_tail) {
> > + metadata_allocation -= s->cluster_size - file_tail;
> > + }
> > +
> > + return real_allocation < 0.9 * metadata_allocation &&
> > + real_allocation + s->cluster_size < metadata_allocation;
> > +}
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 4897abae5e..adc9cdcb27 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -1800,6 +1800,13 @@ static int coroutine_fn
> > qcow2_co_block_status(BlockDriverState *bs,
> > unsigned int bytes;
> > int status = 0;
> >
> > + if (bs->metadata_preallocation == BDRV_UNKNOWN) {
>
> Without locks the following leads to image corruption. Assume that refcounts
> metadata
> read without lock may pollute refcounts cache. So:
>
> qemu_co_mutex_lock(&s->lock);
>
> > + ret = qcow2_detect_metadata_preallocation(bs);
>
> qemu_co_mutex_unlock(&s->lock);
Well, just take the lock earlier (before this if block) instead
> > + if (ret >= 0) {
> > + bs->metadata_preallocation = ret ? BDRV_YES : BDRV_NO;
> > + }
> > + }
At least I would make sure that for ret < 0, the function isn't called
again and again because it probably means that the protocol layer simply
doesn't support bdrv_get_allocated_file_size().
> > +
> > bytes = MIN(INT_MAX, count);
> > qemu_co_mutex_lock(&s->lock);
> > ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
Kevin