[Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

From: Avi Kivity
Subject: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
Date: Mon, 11 Oct 2010 17:47:37 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100921 Fedora/3.1.4-1.fc13 Lightning/1.0b3pre Thunderbird/3.1.4

 On 10/11/2010 05:41 PM, Anthony Liguori wrote:
On 10/11/2010 10:24 AM, Avi Kivity wrote:
 On 10/11/2010 05:02 PM, Anthony Liguori wrote:
On 10/11/2010 08:44 AM, Avi Kivity wrote:
 On 10/11/2010 03:42 PM, Stefan Hajnoczi wrote:
> A leak is acceptable (it won't grow; it's just an unused, incorrect
>  freelist), but data corruption is not.

The alternative is for the freelist to be a non-compat feature bit.
That means older QEMU binaries cannot use a QED image that has enabled
the freelist.

For this one feature.  What about others?

A compat feature is one where the feature can be completely ignored (meaning that QEMU does not have to understand the data format).

An example of a compat feature is copy-on-read. It's merely a suggestion and there is no additional metadata. If a QEMU doesn't understand it, that doesn't affect its ability to read the image.

An example of a non-compat feature would be zero cluster entries. Zero cluster entries are a special L2 table entry that indicates that a cluster's on-disk data is all zeros. As long as there is at least 1 ZCE in the L2 tables, this feature bit must be set. As soon as all of the ZCE bits are cleared, the feature bit can be unset.

An older QEMU will gracefully fail when presented with an image using ZCE bits. An image with no ZCEs will work on older QEMUs.
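
As a rough illustration of the compat/non-compat distinction described above, here is a minimal Python sketch. The bit values, the names `COMPAT_COPY_ON_READ` and `INCOMPAT_ZCE`, and both helper functions are hypothetical; none of them are taken from the QED specification or from QEMU code.

```python
# Hypothetical feature bits; names and values are illustrative only.
COMPAT_COPY_ON_READ = 1 << 0   # compat: a reader may ignore it
INCOMPAT_ZCE = 1 << 0          # non-compat: a reader must understand it


def can_open(compat_bits, incompat_bits, supported_incompat):
    """Decide whether a reader may use the image.

    Unknown compat bits are ignored on purpose. Any non-compat bit the
    reader does not support makes the open fail gracefully.
    """
    unknown = incompat_bits & ~supported_incompat
    return unknown == 0


def required_incompat_bits(l2_entries, zce_marker):
    """The ZCE bit stays set while at least one L2 entry is a ZCE."""
    if any(entry == zce_marker for entry in l2_entries):
        return INCOMPAT_ZCE
    return 0
```

With these assumptions, an older reader (`supported_incompat == 0`) opens an image that only sets copy-on-read, and refuses one that still contains a zero cluster entry.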

What's the motivation behind ZCE?

It's very useful for Copy-on-Read. If the cluster in the backing file is unallocated, then when you do a copy-on-read, you don't want to write out a zero cluster since you'd expand the image to its maximum size.

It's also useful for operations like compaction in the absence of TRIM. The common implementation on platforms like VMware is to open a file, write zeros to it until it fills up the filesystem, and then delete the file. The result is that all unallocated space on the disk is written as zeros. Combined with zero-detection in the image format, you can then compact the image by marking those unallocated (now zeroed) blocks as ZCE.
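
The zero-detection pass described above could look roughly like the following Python sketch. The L2 entry encoding (`UNALLOCATED`, `ZERO_CLUSTER`), the `read_cluster` callback, and the `compact` function are all assumptions made for illustration; the thread does not define an on-disk encoding for ZCE.

```python
# Hypothetical L2 entry values; real cluster offsets are assumed to be
# cluster-aligned and therefore never collide with these markers.
UNALLOCATED = 0
ZERO_CLUSTER = 1


def compact(l2_table, read_cluster, cluster_size):
    """Mark allocated clusters that contain only zeros as ZCE.

    l2_table     -- mutable list of L2 entries (markers or offsets)
    read_cluster -- callable mapping an offset to that cluster's bytes
    Returns the offsets that no longer back any L2 entry.
    """
    zeros = bytes(cluster_size)
    freed = []
    for index, entry in enumerate(l2_table):
        if entry in (UNALLOCATED, ZERO_CLUSTER):
            continue  # nothing on disk to inspect
        if read_cluster(entry) == zeros:
            l2_table[index] = ZERO_CLUSTER
            freed.append(entry)
    return freed
```

The returned offsets are only candidates for reuse; how the space is actually reclaimed (truncation, a freelist, or a later copy of the image) is outside this sketch.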

Both make sense. The latter is also useful with TRIM: if you have a backing image it's better to implement TRIM with ZCE rather than exposing the cluster from the backing file; it saves you a COW when you later reallocate the cluster.

There is yet a third type of feature, one which is not strictly needed in order to use the image, but if used, must be kept synchronized. An example is the freelist. Another example is a directory index for a filesystem. I can't think of another example which would be relevant to QED -- metadata checksums perhaps? -- we can always declare it a non-compatible feature, but of course, it reduces compatibility.

You're suggesting a feature that is not strictly needed, but that needs to be kept up to date. If it can't be kept up to date, something needs to happen to remove it. Let's call this a transient feature.

Most of the transient features can be removed given some bit of code. For instance, ZCE can be removed by writing out zero clusters or writing an unallocated L2 entry if there is no backing file.
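
A minimal sketch of removing ZCE along the lines just described, in Python. The entry markers and the `allocate_zero_cluster` callback are hypothetical, and the sketch assumes that an unallocated cluster reads as zeros when there is no backing file.

```python
# Hypothetical L2 entry markers, for illustration only.
UNALLOCATED = 0
ZERO_CLUSTER = 1


def demote_zce(l2_table, has_backing_file, allocate_zero_cluster):
    """Remove every zero cluster entry from an L2 table.

    With a backing file, each ZCE becomes a real cluster filled with
    zeros, so the backing data stays hidden. Without one, the entry is
    simply marked unallocated. Returns the number of entries changed;
    afterwards the ZCE feature bit can be cleared.
    """
    changed = 0
    for index, entry in enumerate(l2_table):
        if entry != ZERO_CLUSTER:
            continue
        if has_backing_file:
            # Callback writes a zero-filled cluster and returns its offset.
            l2_table[index] = allocate_zero_cluster()
        else:
            l2_table[index] = UNALLOCATED
        changed += 1
    return changed
```

Note the cost in the backing-file case: every ZCE turns into an allocated cluster, so the image grows by one cluster per entry removed.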

I think we could add a qemu-img demote command, or something similar, that attempts to remove features when possible. That doesn't give you instant compatibility, but I'm doubtful that you can come up with a generic way to remove a feature from an image without knowing anything about the image.

That should work, and in the worst case there is qemu-img convert (which should be taught about format options).

error compiling committee.c: too many arguments to function
