qemu-devel



From: Kevin Wolf
Subject: [Qemu-devel] Re: [PATCH 2/2] qcow2: Implement bdrv_truncate() for growing images
Date: Mon, 26 Apr 2010 13:46:57 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100330 Fedora/3.0.4-1.fc12 Thunderbird/3.0.4

On 24.04.2010 10:12, Stefan Hajnoczi wrote:
> This patch adds the ability to grow qcow2 images in-place using
> bdrv_truncate().  This enables qemu-img resize command support for
> qcow2.
> 
> Snapshots are not supported and bdrv_truncate() will return -ENOTSUP.
> The notion of resizing an image with snapshots could lead to confusion:
> users may expect snapshots to remain unchanged, but this is not possible
> with the current qcow2 on-disk format where the header.size field is
> global instead of per-snapshot.  Others may expect snapshots to change
> size along with the current image data.  I think it is safest to not
> support snapshots and perhaps add behavior later if there is a
> consensus.

I think it would make the most sense to keep the disk size per snapshot
(your first option). The snapshot header is extensible, so it should be
possible.

Just disabling it is okay with me, too, but in that case this patch is
much more complicated than it needs to be: If there are no snapshots,
there is no vmstate area.

> Backing images continue to work.  If the image is now larger than its
> backing image, zeroes are read when accessing beyond the end of the
> backing image.
> 
> Signed-off-by: Stefan Hajnoczi <address@hidden>
> ---
> This applies to kevin/block.
> 
>  block/qcow2-cluster.c  |   64 ++++++++++++++++++++++++++++++++++++++---------
>  block/qcow2-snapshot.c |    2 +-
>  block/qcow2.c          |   43 +++++++++++++++++++++++++++++---
>  block/qcow2.h          |    9 ++++++-
>  4 files changed, 99 insertions(+), 19 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index c11680d..20c8426 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,30 +28,39 @@
>  #include "block_int.h"
>  #include "block/qcow2.h"
>  
> -int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
> +/*
> + * qcow2_grow_l1_table_common
> + *
> + * Grows the L1 table and updates the header on disk.
> + *
> + * Setting new_l1_vm_state_index to s->l1_vm_state_index grows the vm state
> + * area.
> + *
> + * Setting new_l1_vm_state_index to the new end of image grows the image data
> + * area.

I don't understand this distinction. I'm sure you're describing something
trivial here, but it sounds like magic.

There's really not much difference between data and vmstate clusters, and
callers don't know which of the two they are working on.

> + *
> + * Returns 0 on success, -errno in failure case.
> + */
> +static int qcow2_grow_l1_table_common(BlockDriverState *bs,
> +                                      int new_l1_vm_state_index,
> +                                      int new_l1_size)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int new_l1_size, new_l1_size2, ret, i;
> +    int new_l1_size2, ret, i;
>      uint64_t *new_l1_table;
>      int64_t new_l1_table_offset;
>      uint8_t data[12];
>  
> -    new_l1_size = s->l1_size;
> -    if (min_size <= new_l1_size)
> -        return 0;
> -    if (new_l1_size == 0) {
> -        new_l1_size = 1;
> -    }
> -    while (min_size > new_l1_size) {
> -        new_l1_size = (new_l1_size * 3 + 1) / 2;
> -    }
>  #ifdef DEBUG_ALLOC2
>      printf("grow l1_table from %d to %d\n", s->l1_size, new_l1_size);
>  #endif
>  
>      new_l1_size2 = sizeof(uint64_t) * new_l1_size;
>      new_l1_table = qemu_mallocz(align_offset(new_l1_size2, 512));
> -    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
> +    memcpy(new_l1_table, s->l1_table, s->l1_vm_state_index * sizeof(uint64_t));

This change smells like a buffer overflow. It can both read beyond the
end of the old L1 table and overflow the new one if the L1 table hasn't
yet reached its full size.

I think this is the main problem of this patch: you seem to assume that
the L1 table is created with the full size needed to cover all of the
data area. That seems to be true at least for images created by
qemu-img, but I don't think we can rely on it (nor do I think we should,
because it introduces a special case where there is none).

I'd rather consider this preallocation a performance optimization. If we
stopped treating it as such, we would need to refuse to open any image
that doesn't fulfill this requirement.

> +    memcpy(&new_l1_table[new_l1_vm_state_index],
> +           &s->l1_table[s->l1_vm_state_index],
> +           (s->l1_size - s->l1_vm_state_index) * sizeof(uint64_t));
>  
>      /* write new table (align to cluster) */
>      BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
> @@ -83,6 +92,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
>      s->l1_table_offset = new_l1_table_offset;
>      s->l1_table = new_l1_table;
>      s->l1_size = new_l1_size;
> +    s->l1_vm_state_index = new_l1_vm_state_index;
>      return 0;
>   fail:
>      qemu_free(new_l1_table);
> @@ -90,6 +100,34 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
>      return ret < 0 ? ret : -EIO;
>  }
>  
> +int qcow2_grow_l1_vm_state(BlockDriverState *bs, int min_size)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int new_l1_size;
> +
> +    new_l1_size = s->l1_size;
> +    if (min_size <= new_l1_size)
> +        return 0;
> +    if (new_l1_size == 0) {
> +        new_l1_size = 1;
> +    }
> +    while (min_size > new_l1_size) {
> +        new_l1_size = (new_l1_size * 3 + 1) / 2;
> +    }
> +
> +    return qcow2_grow_l1_table_common(bs, s->l1_vm_state_index, new_l1_size);
> +}
> +
> +int qcow2_grow_l1_image_data(BlockDriverState *bs, int new_l1_size)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int new_l1_vm_state_index;
> +
> +    new_l1_vm_state_index = new_l1_size;
> +    new_l1_size += s->l1_size - s->l1_vm_state_index;

Same assumption as above. This can result in a negative number being
added to new_l1_size.

> +    return qcow2_grow_l1_table_common(bs, new_l1_vm_state_index, new_l1_size);
> +}
> +
>  void qcow2_l2_cache_reset(BlockDriverState *bs)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -524,7 +562,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
>  
>      l1_index = offset >> (s->l2_bits + s->cluster_bits);
>      if (l1_index >= s->l1_size) {
> -        ret = qcow2_grow_l1_table(bs, l1_index + 1);
> +        ret = qcow2_grow_l1_vm_state(bs, l1_index + 1);

Once you agree that get_cluster_table can't know if it's working on data
or vmstate, qcow2_grow_l1_vm_state is a misnomer at least.

>          if (ret < 0) {
>              return ret;
>          }
> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
> index 2a21c17..7f0d810 100644
> --- a/block/qcow2-snapshot.c
> +++ b/block/qcow2-snapshot.c
> @@ -326,7 +326,7 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>      if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1) < 0)
>          goto fail;
>  
> -    if (qcow2_grow_l1_table(bs, sn->l1_size) < 0)
> +    if (qcow2_grow_l1_vm_state(bs, sn->l1_size) < 0)
>          goto fail;
>  
>      s->l1_size = sn->l1_size;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 4a7ab66..ab622a2 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -140,7 +140,7 @@ static int qcow_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>  static int qcow_open(BlockDriverState *bs, int flags)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int len, i, shift;
> +    int len, i;
>      QCowHeader header;
>      uint64_t ext_end;
>  
> @@ -188,8 +188,7 @@ static int qcow_open(BlockDriverState *bs, int flags)
>  
>      /* read the level 1 table */
>      s->l1_size = header.l1_size;
> -    shift = s->cluster_bits + s->l2_bits;
> -    s->l1_vm_state_index = (header.size + (1LL << shift) - 1) >> shift;
> +    s->l1_vm_state_index = size_to_l1(s, header.size);
>      /* the L1 table must contain at least enough entries to put
>         header.size bytes */
>      if (s->l1_size < s->l1_vm_state_index)
> @@ -1095,6 +1094,40 @@ static int qcow_make_empty(BlockDriverState *bs)
>      return 0;
>  }
>  
> +static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int ret, new_l1_size;
> +
> +    if (offset & 511) {
> +        return -EINVAL;
> +    }
> +
> +    /* cannot proceed if image has snapshots */
> +    if (s->nb_snapshots) {
> +        return -ENOTSUP;
> +    }
> +
> +    /* shrinking is currently not supported */
> +    if (offset < bs->total_sectors << BDRV_SECTOR_BITS) {
> +        return -ENOTSUP;
> +    }
> +
> +    new_l1_size = size_to_l1(s, offset);
> +    ret = qcow2_grow_l1_image_data(bs, new_l1_size);
> +    if (ret < 0) {
> +        return ret;
> +    }

Instead of the confusing changes above, you could just increase the L1
table size using the old function and keep the data/vmstate handling
local to qcow2_truncate (or, maybe better, a qcow2_move_vmstate(bs,
offset) which internally grows the L1 table as needed).

Actually, I think this is not that different from your patch (you called
an almost identical function qcow2_grow_l1_image_data and avoided the
normal calculation of the next L1 table size for some reason), but it
would probably be a lot less confusing.

> +
> +    /* write updated header.size */
> +    offset = cpu_to_be64(offset);
> +    if (bdrv_pwrite(bs->file, offsetof(QCowHeader, size), &offset,
> +                    sizeof(uint64_t)) != sizeof(uint64_t)) {
> +        return -EIO;
> +    }

The vmstate_offset field needs to be updated somewhere, too. In my
suggestion this would be in qcow2_move_vmstate.

Kevin



