qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH] qcow2: Optimize the refcount-block overlap chec


From: Max Reitz
Subject: Re: [Qemu-block] [PATCH] qcow2: Optimize the refcount-block overlap check
Date: Wed, 1 Feb 2017 02:46:20 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0

On 30.01.2017 17:14, Alberto Garcia wrote:
> The metadata overlap checks introduced in a40f1c2add help detect
> corruption in the qcow2 image by verifying that data writes don't
> overlap with existing metadata sections.
> 
> The 'refcount-block' check in particular iterates over the refcount
> table in order to get the addresses of all refcount blocks and check
> that none of them overlap with the region where we want to write.
> 
> The problem with the refcount table is that since it always occupies
> complete clusters its size is usually very big.

Actually the main problem is that BDRVQcow2State.refcount_table_size is
updated very generously as opposed to BDRVQcow2State.l1_size. The latter
is usually exactly as large as it needs to be (because the L1 table size
usually doesn't change), whereas the refcount_table_size is just bumped
up every time the image gets bigger until it reaches or exceeds the
value it needs to be.

>                                                 With the default
> values of cluster_size=64KB and refcount_bits=16 this table holds 8192
> entries, each one of them enough to map 2GB worth of host clusters.
> 
> So unless we're using images with several TB of allocated data this
> table is going to be mostly empty, and iterating over it is a waste of
> CPU. If the storage backend is fast enough this can have an effect on
> I/O performance.
> 
> This patch keeps the index of the last used (i.e. non-zero) entry in
> the refcount table and updates it every time the table changes. The
> refcount-block overlap check then uses that index instead of reading
> the whole table.
> 
> In my tests with a 4GB qcow2 file stored in RAM this doubles the
> amount of write IOPS.
> 
> Signed-off-by: Alberto Garcia <address@hidden>
> ---
>  block/qcow2-refcount.c | 21 ++++++++++++++++++++-
>  block/qcow2.c          |  1 +
>  block/qcow2.h          |  1 +
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index cbfb3fe064..5e4d36587f 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -83,6 +83,16 @@ static Qcow2SetRefcountFunc *const set_refcount_funcs[] = {
>  /*********************************************************/
>  /* refcount handling */
>  
> +static void update_max_refcount_table_index(BDRVQcow2State *s)
> +{
> +    unsigned i = s->refcount_table_size - 1;
> +    while (i > 0 && (s->refcount_table[i] & REFT_OFFSET_MASK) == 0) {
> +        i--;
> +    }
> +    /* Set s->max_refcount_table_index to the index of the last used entry */
> +    s->max_refcount_table_index = i;
> +}
> +
>  int qcow2_refcount_init(BlockDriverState *bs)
>  {
>      BDRVQcow2State *s = bs->opaque;
> @@ -111,6 +121,7 @@ int qcow2_refcount_init(BlockDriverState *bs)
>          }
>          for(i = 0; i < s->refcount_table_size; i++)
>              be64_to_cpus(&s->refcount_table[i]);
> +        update_max_refcount_table_index(s);
>      }
>      return 0;
>   fail:
> @@ -439,6 +450,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
>          }
>  
>          s->refcount_table[refcount_table_index] = new_block;
> +        s->max_refcount_table_index = refcount_table_index;

Should be MAX(s->max_refcount_table_index, refcount_table_index) or this
won't support refcount tables with holes in them.

Apart from this, the patch looks good to me. (Just a nagging comment below.)

>  
>          /* The new refcount block may be where the caller intended to put its
>           * data, so let it restart the search. */
> @@ -580,6 +592,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
>      s->refcount_table = new_table;
>      s->refcount_table_size = table_size;
>      s->refcount_table_offset = table_offset;
> +    update_max_refcount_table_index(s);
>  
>      /* Free old table. */
>      qcow2_free_clusters(bs, old_table_offset, old_table_size * 
> sizeof(uint64_t),
> @@ -2171,6 +2184,7 @@ write_refblocks:
>      s->refcount_table = on_disk_reftable;
>      s->refcount_table_offset = reftable_offset;
>      s->refcount_table_size = reftable_size;
> +    update_max_refcount_table_index(s);
>  
>      return 0;
>  
> @@ -2383,7 +2397,11 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
> int ign, int64_t offset,
>      }
>  
>      if ((chk & QCOW2_OL_REFCOUNT_BLOCK) && s->refcount_table) {
> -        for (i = 0; i < s->refcount_table_size; i++) {
> +        unsigned last_entry = s->max_refcount_table_index;
> +        assert(last_entry < s->refcount_table_size);
> +        assert(last_entry + 1 == s->refcount_table_size ||
> +               (s->refcount_table[last_entry + 1] & REFT_OFFSET_MASK) == 0);

I'm not sure we need this second assertion, but I don't mind it too much
either.

I'd actually find an assertion that last_entry is less than UNSIGNED_MAX
more important because otherwise the loop below would be an endless one. :-)

(Yes, it's pretty obvious that it's less than UNSIGNED_MAX because it's
less than s->refcount_table_size, which is an unsigned int. I'm just
trying to say something while I'm not sure exactly what I'm trying to
say. Sorry.)

Max

> +        for (i = 0; i <= last_entry; i++) {
>              if ((s->refcount_table[i] & REFT_OFFSET_MASK) &&
>                  overlaps_with(s->refcount_table[i] & REFT_OFFSET_MASK,
>                  s->cluster_size)) {
> @@ -2871,6 +2889,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, 
> int refcount_order,
>      /* Now update the rest of the in-memory information */
>      old_reftable = s->refcount_table;
>      s->refcount_table = new_reftable;
> +    update_max_refcount_table_index(s);
>  
>      s->refcount_bits = 1 << refcount_order;
>      s->refcount_max = UINT64_C(1) << (s->refcount_bits - 1);

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]