Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support

From: Jeff Cody
Subject: Re: [Qemu-devel] CoW image commit+shrink(= make_empty) support
Date: Thu, 07 Jun 2012 10:14:49 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

On 06/07/2012 02:19 AM, Taisuke Yamada wrote:
> I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
> Data and Trying to go Unnoticed") at LinuxCon Japan, and he advised me
> to post what I have regarding my question about qemu's support for
> shrinking CoW images.
> Here's my problem description.
> I recently designed an experimental system which holds VM master images
> on an HDD and CoW snapshots on an SSD. VMs run on CoW snapshots only.
> This split-image configuration is used to keep VM I/O on the SSD.
> As SSD capacity is rather limited, I need to do a writeback commit from
> SSD to HDD from time to time, and that is done on weekends or at midnight.
> The problem is that although a commit is made, that alone won't shrink the
> CoW image: all unused blocks are still kept in the snapshot and use up
> space.
> The attached patch is a workaround I added to cope with the problem,
> but the basic problem I faced was that the QCOW2 and QED formats both
> still do not support the "bdrv_make_empty" API.
> Implementing the API (say, by hole punching) seemed like a lot of effort, so
> I ended up creating a new CoW image and then replacing the current CoW
> snapshot with the new (empty) one. But I find the code ugly.
> In his talk, Paolo suggested the possibility of using a new "live op" API
> for this task, but I'm not aware of the actual API. Is there any
> documentation or source code I can look at to re-implement the above feature?
> Best Regards,

Hello Taisuke-san,

I am working on a document now for a live commit proposal, with the API
being similar to the block-stream command, but for a live commit.  Here
is what I am thinking about proposing for the command:

{ 'command': 'block-commit', 'data': { 'device': 'str', '*base': 'str',
                                       '*top': 'str', '*speed': 'int' } }

I think something similar to the above would be good for a 'live
commit', and it would be somewhat analogous to block streaming, but in
the other direction.
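
For illustration only, here is a minimal Python sketch of how a management
client might build the QMP message for such a command. The command is still
only a proposal, so this is not a definitive interface: the helper name
build_block_commit, the device name, and the image paths are assumptions made
up for the example. Only the usual QMP "execute"/"arguments" envelope is
taken as given.

```python
import json

def build_block_commit(device, base=None, top=None, speed=None):
    """Build a QMP message for the proposed block-commit command.

    'device' is mandatory; 'base', 'top' and 'speed' are optional
    (marked with '*' in the schema) and are omitted when not given.
    """
    args = {"device": device}
    for name, value in (("base", base), ("top", top), ("speed", speed)):
        if value is not None:
            args[name] = value
    return json.dumps({"execute": "block-commit", "arguments": args})

# Hypothetical device and image names, chosen only for this example.
print(build_block_commit("virtio0",
                         base="/hdd/master.img",
                         top="/ssd/snap.qcow2"))
```

Leaving out the optional members entirely, rather than sending null values,
matches how optional ('*') members are normally treated in QMP commands.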

One issue I see with the attached patch is its reliance on bdrv_close()
and a subsequent bdrv_open(): once you perform a bdrv_close(), you no
longer have the ability to safely recover from an error, because it is
possible for the recovery bdrv_open() to fail for some reason.

The live block commit command I am working on operates like the block
streaming code and like the transactional commands, in that it avoids
using bdrv_close() / bdrv_open() to change an image, so that error
recovery can be done safely by just abandoning the operation.  A key
step that needs to be done 'transactionally' is opening the base or
intermediate target image with file access mode r/w, as the backing
files are opened r/o by default.
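
To make the difference concrete, here is a toy Python model of the two
orderings. It is only a sketch of the idea, not QEMU code: ToyImage,
toy_open, close_then_reopen and open_then_swap are hypothetical names, and
the 'fail' flag merely simulates an open error.

```python
class OpenFailed(Exception):
    """Simulated failure to open an image."""

class ToyImage:
    """Toy stand-in for an open image; not QEMU's BlockDriverState."""
    def __init__(self, path, writable=False):
        self.path = path
        self.writable = writable

def toy_open(path, writable, fail=False):
    # 'fail' simulates an open error (permissions, missing file, ...).
    if fail:
        raise OpenFailed(path)
    return ToyImage(path, writable)

def close_then_reopen(chain, idx, fail=False):
    """Risky ordering: the old handle is gone before the new one works."""
    path = chain[idx].path
    chain[idx] = None                         # analogous to closing first
    chain[idx] = toy_open(path, True, fail)   # if this raises, slot stays None

def open_then_swap(chain, idx, fail=False):
    """Safer ordering: open the r/w handle first, swap only on success."""
    try:
        new = toy_open(chain[idx].path, True, fail)
    except OpenFailed:
        return False                          # abandon; chain is untouched
    chain[idx] = new
    return True
```

In the second ordering a failed open leaves the original r/o handle in place,
so the operation can simply be abandoned; in the first, a failed reopen
leaves nothing to fall back on.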

I am going to be putting all my documentation into the qemu wiki today /
tomorrow, and I will follow up with a link to that if you like.


