Re: [Qemu-devel] [RFC] Generic image streaming

From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC] Generic image streaming
Date: Mon, 26 Sep 2011 15:21:01 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 26, 2011 at 09:35:01AM -0300, Marcelo Tosatti wrote:
> On Fri, Sep 23, 2011 at 04:57:26PM +0100, Stefan Hajnoczi wrote:
> > Here is my generic image streaming branch, which aims to provide a way
> > to copy the contents of a backing file into an image file of a running
> > guest without requiring specific support in the various block drivers
> > (e.g.  qcow2, qed, vmdk):
> > 
> > http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
> > 
> > The tree does not provide full image streaming yet but I'd like to
> > discuss the approach taken in the code.  Here are the main points:
> > 
> > The image streaming API is available through HMP and QMP commands.  When
> > streaming is started on a block device a coroutine is created to do the
> > background I/O work.  The coroutine can be cancelled.
> > 
> > While the coroutine copies data from the backing file into the image
> > file, the guest may be performing I/O to the image file.  Guest reads do
> > not conflict with streaming but guest writes require special handling.
> > If the guest writes to a region of the image file that we are currently
> > copying, then there is the potential to clobber the guest write with old
> > data from the backing file.
> > 
> > Previously I solved this in a QED-specific way by taking advantage of
> > the serialization of allocating write requests.  In order to do this
> > generically we need to track in-flight requests and have the ability to
> > queue I/O.  Guest writes that affect an in-flight streaming copy
> > operation must wait for that operation to complete before being issued.
> > Streaming copy operations must skip overlapping regions of guest writes.
> > 
> > One big difference to the QED image streaming implementation is that
> > this generic implementation is not based on copy-on-read operations.
> > Instead we do a sequence of bdrv_is_allocated() to find regions for
> > streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
> > populate the image file.
> > 
> > It turns out that generic copy-on-read is not an attractive operation
> > because it requires using bounce buffers for every request. 
> Isn't COR essential for decent read performance in the
> image-stream-from-slow-remote-origin case?

It is essential for re-read performance from a slow backing file.  With
images fetched over HTTP across the internet it is most definitely worth
doing.

In the case of an NFS server the performance depends on the network and
server.  It might be similar speed or faster to read from NFS.

I will think some more about how to implement generic copy-on-read.

> > Kevin pointed out the case where a guest performs a read and pokes the
> > data buffer before the read completes: copy-on-read would write out
> > the modified memory into the image file unless we use a bounce buffer.
> Either wait for the write originating from a COR to finish before
> exposing the read to the guest, or have a bounce buffer.

When the guest issues a read we try to read directly into its data
buffer.  The problem is that it is not okay to write out that buffer
from guest memory because it may have been scribbled on by the guest.

The guest can do this even before we notify it of read completion.  So
waiting for the write to complete does not solve the problem.

Although sane guests will not scribble over data buffers we cannot allow
QEMU to turn a memory corruption inside the guest into a data corruption
of the disk image.
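
To make the failure mode concrete, here is a minimal toy sketch in C.  It
is not QEMU code: struct toy_disk, cor_read_direct(), cor_read_bounce()
and the scribble() hook are all hypothetical names made up for
illustration, and a single 512-byte sector stands in for the backing file
and the image file.  It only shows why the copy-on-read write has to come
from a private bounce buffer rather than from guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical stand-in for a backing file and an image file (one sector). */
struct toy_disk {
    unsigned char backing[SECTOR_SIZE];
    unsigned char image[SECTOR_SIZE];
};

/* Hypothetical hook: models the guest touching its buffer mid-request. */
typedef void (*guest_race_fn)(unsigned char *guest_buf);

/* Unsafe copy-on-read: the image write is sourced from guest memory. */
void cor_read_direct(struct toy_disk *d, unsigned char *guest_buf,
                     guest_race_fn race)
{
    assert(d != NULL && guest_buf != NULL);
    memcpy(guest_buf, d->backing, SECTOR_SIZE); /* read into guest buffer */
    if (race) {
        race(guest_buf);        /* guest scribbles before completion */
    }
    memcpy(d->image, guest_buf, SECTOR_SIZE);   /* may write scribbled data */
}

/* Safe copy-on-read: the image write is sourced from a bounce buffer. */
void cor_read_bounce(struct toy_disk *d, unsigned char *guest_buf,
                     guest_race_fn race)
{
    unsigned char bounce[SECTOR_SIZE];

    assert(d != NULL && guest_buf != NULL);
    memcpy(bounce, d->backing, SECTOR_SIZE);    /* read into private buffer */
    memcpy(guest_buf, bounce, SECTOR_SIZE);     /* hand data to the guest */
    if (race) {
        race(guest_buf);        /* guest can only corrupt its own copy */
    }
    memcpy(d->image, bounce, SECTOR_SIZE);      /* image gets pristine data */
}

/* A misbehaving guest that overwrites its read buffer. */
void scribble(unsigned char *guest_buf)
{
    memset(guest_buf, 0xff, SECTOR_SIZE);
}
```

Calling cor_read_direct() with scribble as the race hook leaves the image
sector different from the backing sector, while cor_read_bounce() leaves
them identical.  The price is the extra memcpy per request, which is the
cost mentioned above.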

