Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3

qemu-devel



From:	Stefan Hajnoczi
Subject:	Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
Date:	Wed, 12 Oct 2011 13:51:54 +0100

On Tue, Jun 28, 2011 at 10:38 AM, Frediano Ziglio <address@hidden> wrote:
> 2011/6/27 Kevin Wolf <address@hidden>:
>> This is the second draft for what I think could be added when we increase
>> qcow2's version number to 3. This includes points that have been made by
>> several people over the past few months. We're probably not going to
>> implement this next week, but I think it's important to get discussions
>> started early, so here it is.
>>
>> Changes implemented in this RFC:
>>
>> - Added compatible/incompatible/auto-clear feature bits plus an optional
>>  feature name table to allow useful error messages even if an older version
>>  doesn't know some feature at all.
>>
>> - Added a dirty flag which indicates that the refcounts may not be accurate
>>  ("QED mode"). This means that we can save writes to the refcount table with
>>  cache=writethrough, but it isn't really useful otherwise now that we have
>>  Qcow2Cache.
>>
>> - Configurable refcount width. If you don't want to use internal snapshots,
>>  make refcounts one bit and save cache space and I/O.
>>
>> - Added subclusters. This separates the COW size (one subcluster, I'm thinking
>>  of 64k default size here) from the allocation size (one cluster, 2M). Less
>>  fragmentation, less metadata, but still reasonable COW granularity.
>>
>>  This also allows preallocating clusters but none of their subclusters. You
>>  can have an image that is like raw + COW metadata, and you can also
>>  preallocate metadata for images with backing files.
>>
>> - Zero cluster flags. This allows discard even with a backing file that doesn't
>>  contain zeros. It is also useful for copy-on-read/image streaming, as you'll
>>  want to keep sparseness without accessing the remote image for an unallocated
>>  cluster all the time.
>>
>> - Fixed internal snapshot metadata to use a 64-bit VM state size. You can't
>>  save a snapshot of a VM with >= 4 GB RAM today.
>>
>> Possible future additions:
>>
>> - Add per-L2-table dirty flag to L1?
>> - Add per-refcount-block full flag to refcount table?
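A minimal Python sketch of how a reader might act on the three proposed feature-bit fields, assuming these semantics: unknown incompatible bits refuse the open (named via the feature name table when possible), unknown compatible bits are ignored, and unknown auto-clear bits are cleared before the image is modified. The masks, the bit-0 "dirty" assignment, the dict standing in for the feature name table, and the function names are all hypothetical, not the draft's on-disk layout.

```python
# Sketch only: bit assignments and names are HYPOTHETICAL illustrations.
KNOWN_INCOMPATIBLE = 0b1  # e.g. bit 0: "dirty" (refcounts may be inaccurate)
KNOWN_AUTOCLEAR = 0b0     # this reader knows no auto-clear features


def unknown_bits(mask: int, known: int) -> list[int]:
    """Return the bit numbers set in `mask` that are not set in `known`."""
    extra = mask & ~known
    return [bit for bit in range(64) if (extra >> bit) & 1]


def check_features(incompatible: int, autoclear: int,
                   name_table: dict[int, str]) -> int:
    """Decide whether an image may be opened.

    Raises ValueError for unknown incompatible features, naming them from
    the (optional) feature name table. Returns the auto-clear mask that
    should be written back before modifying the image. Compatible feature
    bits need no check: unknown ones can simply be ignored.
    """
    bad = unknown_bits(incompatible, KNOWN_INCOMPATIBLE)
    if bad:
        names = ", ".join(name_table.get(bit, f"bit {bit}") for bit in bad)
        raise ValueError(f"unsupported incompatible feature(s): {names}")
    # Clear auto-clear bits we don't understand, since we cannot keep
    # whatever data they describe consistent.
    return autoclear & KNOWN_AUTOCLEAR
```

With a (hypothetical) table entry `{3: "subclusters"}`, an image with incompatible bit 3 set fails with "unsupported incompatible feature(s): subclusters" rather than an anonymous bit number, which is what shipping the names inside the image buys an older version.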
>
> Hi,
>  thinking about image improvements, I would add
>
> - GUID for image and backing file
> - relative path for backing file
>
> This would help find images in a distributed environment or when files
> are moved, e.g. gfs/nfs/ocfs mounted at different mount points, or a
> backing file used as a template in a different image directory that is
> later moved somewhere else. A GUID would also let a higher-level tool
> manage a GUID <-> image file database.
>
> I was also thinking about a "backing file length" field to support
> resizing, but it could probably be implemented with zero clusters.
> Assume you have a 5 GB image and create a new image with the first one
> as its backing file. Now resize the second image from 5 GB to 3 GB and
> then (after some work) to 10 GB: the part from 3 GB to 5 GB should not
> be read from the backing file.
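A minimal Python sketch of the proposed "backing file length" field, modelling the behaviour asked for here (growing again must not re-expose the old backing data). This is only one of the two plausible behaviours, and the class and attribute names are hypothetical; nothing like this field exists in the draft.

```python
GiB = 1 << 30


class Overlay:
    """Toy overlay image with a HYPOTHETICAL 'backing file length' field."""

    def __init__(self, size: int, backing_size: int):
        self.size = size
        # How many bytes of the backing file are still visible through us.
        self.backing_length = min(size, backing_size)

    def resize(self, new_size: int) -> None:
        # Shrinking lowers the limit for good; growing never raises it.
        self.backing_length = min(self.backing_length, new_size)
        self.size = new_size

    def reads_from_backing(self, offset: int) -> bool:
        """Would an unallocated byte at `offset` be read from the backing file?"""
        return offset < self.backing_length


# The example from the mail: 5 GB -> 3 GB -> 10 GB.
img = Overlay(size=5 * GiB, backing_size=5 * GiB)
img.resize(3 * GiB)
img.resize(10 * GiB)
print(img.reads_from_backing(2 * GiB))  # True
print(img.reads_from_backing(4 * GiB))  # False: 3-5 GB now reads as zeros
```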

Interesting idea.  One could argue either way.  When image file size
!= backing file size you need to know what you are doing :).  I think
the case where the image is smaller than the backing file is rare and
zeroing vs exposing the backing file on resize isn't an obvious
choice.

> Also a bit in the L2 offset to say "there is no L2 table" because all
> clusters covered by it are contiguous, so we avoid the L2 table entirely.
> Obviously this requires an optimization step to detect or create such a
> condition.

There are several reserved L1 entry bits which could be used to mark
this mode.  This mode severely restricts qcow2 features though: how
would snapshots and COW work?  Perhaps by breaking the huge cluster
back into an L2 table with individual clusters?  Backing files also
cannot be used - unless we extend the sub-clusters approach and also
keep a large bitmap with allocated/unallocated/zero information.
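A minimal Python sketch of such a per-cluster bitmap, assuming the example sizes from the draft (2 MiB clusters, 64 KiB subclusters, hence 32 subclusters) and a hypothetical 2-bit state per subcluster, so one cluster's state fits in 64 bits. The encoding and helper names are illustrative only.

```python
CLUSTER_SIZE = 2 << 20                           # 2 MiB allocation unit
SUBCLUSTER_SIZE = 64 << 10                       # 64 KiB COW unit
SUBCLUSTERS = CLUSTER_SIZE // SUBCLUSTER_SIZE    # 32 per cluster

# HYPOTHETICAL 2-bit encoding; 32 subclusters * 2 bits = 64 bits per cluster.
UNALLOCATED, ALLOCATED, ZERO = 0, 1, 2


def get_state(bitmap: int, index: int) -> int:
    """Read the 2-bit state of subcluster `index`."""
    return (bitmap >> (2 * index)) & 0b11


def set_state(bitmap: int, index: int, state: int) -> int:
    """Return a copy of `bitmap` with subcluster `index` set to `state`."""
    shift = 2 * index
    return (bitmap & ~(0b11 << shift)) | (state << shift)


def read_source(bitmap: int, guest_offset: int) -> str:
    """Where a read at `guest_offset` (within this cluster) would be served from."""
    index = (guest_offset % CLUSTER_SIZE) // SUBCLUSTER_SIZE
    state = get_state(bitmap, index)
    if state == ALLOCATED:
        return "image"
    if state == ZERO:
        return "zeros"          # no backing file access needed
    return "backing file"


# Subcluster 1 was written, subcluster 2 was discarded (zero), rest untouched.
bitmap = set_state(set_state(0, 1, ALLOCATED), 2, ZERO)
print(read_source(bitmap, 0))           # backing file
print(read_source(bitmap, 64 << 10))    # image
print(read_source(bitmap, 130 << 10))   # zeros
```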

A mode like this could be used for best performance on local storage,
where efficient image transport (e.g. scp or http) is not required.
Actually I think this is reasonable, we could use qemu-img convert to
produce a compact qcow2 for export and use the L2-less qcow2 for
running the actual VM.
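A minimal Python sketch of the lookup path with such an L2-less mode, assuming 64 KiB clusters and the qcow2 entry layout with the host offset in bits 9-55. Using L1 bit 0 as the "no L2 table" marker is purely hypothetical, as are the function and variable names; compressed clusters and the snapshot/COW issues raised above are ignored.

```python
from typing import Optional

CLUSTER_SIZE = 64 << 10                 # 64 KiB clusters
L2_ENTRIES = CLUSTER_SIZE // 8          # 8-byte entries per L2 table: 8192
L1_SPAN = L2_ENTRIES * CLUSTER_SIZE     # guest bytes per L1 entry: 512 MiB

OFFSET_MASK = ((1 << 56) - 1) & ~0x1FF  # bits 9-55 hold the host offset
NO_L2 = 1 << 0                          # HYPOTHETICAL reserved-bit use:
                                        # "contiguous range, no L2 table"


def translate(l1_table: list[int], l2_tables: dict[int, list[int]],
              guest_offset: int) -> Optional[int]:
    """Map a guest offset to a host offset, or None if unallocated."""
    entry = l1_table[guest_offset // L1_SPAN]
    base = entry & OFFSET_MASK
    if entry & NO_L2:
        # L2-less mode: the whole 512 MiB range is one contiguous host extent.
        return base + guest_offset % L1_SPAN
    if base == 0:
        return None                     # no L2 table allocated yet
    l2_entry = l2_tables[base][(guest_offset // CLUSTER_SIZE) % L2_ENTRIES]
    cluster = l2_entry & OFFSET_MASK
    if cluster == 0:
        return None                     # unallocated: consult backing file
    return cluster + guest_offset % CLUSTER_SIZE


# L1 entry 0 uses a normal L2 table at host 0x30000; entry 1 is L2-less,
# mapped contiguously at host 0x40000000.
l2 = [0] * L2_ENTRIES
l2[1] = 0x50000
l1 = [0x30000, 0x40000000 | NO_L2]
tables = {0x30000: l2}
print(hex(translate(l1, tables, CLUSTER_SIZE + 5)))    # 0x50005
print(hex(translate(l1, tables, L1_SPAN + 0x1234)))    # 0x40001234
```

The L2-less branch skips one metadata lookup per request, which is where the local-storage performance would come from; the price is that any COW inside that 512 MiB range first needs the range broken back into a real L2 table.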

Kevin: what do you think about fleshing out this mode instead of sub-clusters?

> This mail sounds quite strange to me; I thought QED would be the future
> of qcow2, but I must be really wrong.

What it's called doesn't matter, but we need better metadata, and by
making qcow2v3 extensible we can now make improvements without losing
support for existing image files.

Stefan


Prev by Date: Re: [Qemu-devel] [PATCH 3/6] block: switch bdrv_read()/bdrv_write() to coroutines
Next by Date: Re: [Qemu-devel] [PATCH 4/6] block: switch bdrv_aio_readv() to coroutines
Previous by thread: [Qemu-devel] [PATCH] usb-hid: activate usb tablet / mouse after migration.
Next by thread: Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
Index(es):
- Date
- Thread