Re: [Qemu-devel] [RFC] Generic image streaming

From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC] Generic image streaming
Date: Mon, 26 Sep 2011 08:55:56 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 26, 2011 at 01:32:34PM +0800, Zhi Yong Wu wrote:
> On Fri, Sep 23, 2011 at 11:57 PM, Stefan Hajnoczi
> <address@hidden> wrote:
> > Here is my generic image streaming branch, which aims to provide a way
> > to copy the contents of a backing file into an image file of a running
> > guest without requiring specific support in the various block drivers
> > (e.g.  qcow2, qed, vmdk):
> >
> > http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api
> >
> > The tree does not provide full image streaming yet but I'd like to
> > discuss the approach taken in the code.  Here are the main points:
> >
> > The image streaming API is available through HMP and QMP commands.  When
> > streaming is started on a block device a coroutine is created to do the
> > background I/O work.  The coroutine can be cancelled.
> >
> > While the coroutine copies data from the backing file into the image
> > file, the guest may be performing I/O to the image file.  Guest reads do
> > not conflict with streaming but guest writes require special handling.
> > If the guest writes to a region of the image file that we are currently
> > copying, then there is the potential to clobber the guest write with old
> > data from the backing file.
> >
> > Previously I solved this in a QED-specific way by taking advantage of
> > the serialization of allocating write requests.  In order to do this
> > generically we need to track in-flight requests and have the ability to
> > queue I/O.  Guest writes that affect an in-flight streaming copy
> > operation must wait for that operation to complete before being issued.
> > Streaming copy operations must skip overlapping regions of guest writes.
> >
> > One big difference to the QED image streaming implementation is that
> > this generic implementation is not based on copy-on-read operations.
> > Instead we do a sequence of bdrv_is_allocated() to find regions for
> > streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
> > populate the image file.
> >
> > It turns out that generic copy-on-read is not an attractive operation
> > because it requires using bounce buffers for every request.  Kevin
> bounce buffers == buffer ring?

A bounce buffer is a temporary buffer that is used because the actual
data buffer is not addressable or cannot be directly accessed for some
other reason.  In this case it's because the guest should see read
semantics and not find that writes to its read data buffer result in
writes to disk.

> > pointed out the case where a guest performs a read and pokes the data
> > buffer before the read completes, copy-on-read would write out the
> > modified memory into the image file unless we use a bounce buffer.
> Can you elaborate this?

1. Guest issues a read request.
2. QEMU issues a host read request as the first step in copy-on-read.
3. Host read request completes...
4. Guest overwrites its data buffer before QEMU acknowledges the request.
5. ...QEMU issues a host write request.
6. Host completes the write request and QEMU acknowledges the guest read.

What happened is that we populated the image file with data from guest
memory that does not match what is in the backing file.  The guest
issued a read request; this should never result in writing to the image
file.
Although legitimate guests do not do this, a buggy guest could corrupt
its disk in this way!
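
The six steps above can be modelled with a small toy simulation.  This is
only a sketch and not QEMU code: the function copy_on_read(), its
parameters, and the 8-byte "disk" are all hypothetical names chosen for
illustration.  It shows the host write picking up whatever is in the I/O
buffer at step 5, and how a private bounce buffer keeps the guest's poke
out of the image file.

```python
# Toy model of the copy-on-read race. Not QEMU code; all names are hypothetical.

def copy_on_read(backing, image, guest_buf, use_bounce, guest_pokes):
    """Simulate one copy-on-read request covering a whole (tiny) disk.

    backing     -- bytes held by the backing file
    image       -- bytearray standing in for the image file
    guest_buf   -- bytearray standing in for the guest's read buffer
    use_bounce  -- if True, do host I/O through a private buffer
    guest_pokes -- callable run between the host read and the host write
    """
    # Steps 2-3: the host read lands in a bounce buffer or directly in guest memory.
    io_buf = bytearray(len(backing)) if use_bounce else guest_buf
    io_buf[:] = backing

    # Step 4: the guest scribbles on its buffer before the read is acknowledged.
    guest_pokes(guest_buf)

    # Step 5: the host write populates the image file from the I/O buffer.
    image[:] = io_buf

    # Step 6: acknowledge the read; with a bounce buffer the data is copied
    # into guest memory only now.
    if use_bounce:
        guest_buf[:] = io_buf


def poke(buf):
    # A buggy guest overwriting the first four bytes of its read buffer.
    buf[0:4] = b"JUNK"


backing = b"BACKING!"

# Without a bounce buffer the guest's modification reaches the image file.
unsafe_image = bytearray(len(backing))
copy_on_read(backing, unsafe_image, bytearray(len(backing)), False, poke)

# With a bounce buffer the image file matches the backing file.
safe_image = bytearray(len(backing))
safe_guest = bytearray(len(backing))
copy_on_read(backing, safe_image, safe_guest, True, poke)
```

In this model unsafe_image ends up differing from the backing data, while
safe_image is an exact copy.  The cost of the safe variant is the extra
buffer and memory copy on every request, which is the overhead mentioned
above.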

