
[Qemu-devel] [RFC] Generic image streaming


From: Stefan Hajnoczi
Subject: [Qemu-devel] [RFC] Generic image streaming
Date: Fri, 23 Sep 2011 16:57:26 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Here is my generic image streaming branch, which aims to provide a way
to copy the contents of a backing file into an image file of a running
guest without requiring specific support in the various block drivers
(e.g. qcow2, qed, vmdk):

http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/image-streaming-api

The tree does not provide full image streaming yet but I'd like to
discuss the approach taken in the code.  Here are the main points:

The image streaming API is available through HMP and QMP commands.  When
streaming is started on a block device, a coroutine is created to do the
background I/O work.  The coroutine can be cancelled.

While the coroutine copies data from the backing file into the image
file, the guest may be performing I/O to the image file.  Guest reads do
not conflict with streaming but guest writes require special handling.
If the guest writes to a region of the image file that we are currently
copying, then there is the potential to clobber the guest write with old
data from the backing file.

Previously I solved this in a QED-specific way by taking advantage of
the serialization of allocating write requests.  In order to do this
generically we need to track in-flight requests and have the ability to
queue I/O.  Guest writes that affect an in-flight streaming copy
operation must wait for that operation to complete before being issued.
Streaming copy operations must skip overlapping regions of guest writes.
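
To make the two rules above concrete, here is a minimal sketch of
in-flight request tracking.  It is not code from the branch: the
TrackedRequest and RequestTracker types and all function names are
hypothetical, the table is a fixed-size array, and "wait" and "skip"
are reduced to boolean answers that the caller would have to act on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TRACKED 16

/* Hypothetical record of one in-flight request, in sectors. */
typedef struct {
    int64_t sector_num;
    int nb_sectors;
    bool is_streaming;   /* true: streaming copy, false: guest write */
    bool active;
} TrackedRequest;

typedef struct {
    TrackedRequest reqs[MAX_TRACKED];
} RequestTracker;

/* Half-open ranges [a, a + a_n) and [b, b + b_n) intersect. */
static bool ranges_overlap(int64_t a, int a_n, int64_t b, int b_n)
{
    return a < b + b_n && b < a + a_n;
}

/* Register a request; returns its slot, or -1 if the table is full. */
static int tracker_begin(RequestTracker *t, int64_t sector_num,
                         int nb_sectors, bool is_streaming)
{
    assert(nb_sectors > 0);
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!t->reqs[i].active) {
            t->reqs[i] = (TrackedRequest){ sector_num, nb_sectors,
                                           is_streaming, true };
            return i;
        }
    }
    return -1;
}

/* Mark a request as completed. */
static void tracker_end(RequestTracker *t, int slot)
{
    t->reqs[slot].active = false;
}

/* Does any active request of the given kind overlap the range? */
static bool overlaps_kind(const RequestTracker *t, int64_t sector_num,
                          int nb_sectors, bool is_streaming)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        const TrackedRequest *r = &t->reqs[i];
        if (r->active && r->is_streaming == is_streaming &&
            ranges_overlap(r->sector_num, r->nb_sectors,
                           sector_num, nb_sectors)) {
            return true;
        }
    }
    return false;
}

/* A guest write waits for an overlapping in-flight streaming copy. */
static bool guest_write_must_wait(const RequestTracker *t,
                                  int64_t sector_num, int nb_sectors)
{
    return overlaps_kind(t, sector_num, nb_sectors, true);
}

/* A streaming copy skips a region with an in-flight guest write. */
static bool streaming_must_skip(const RequestTracker *t,
                                int64_t sector_num, int nb_sectors)
{
    return overlaps_kind(t, sector_num, nb_sectors, false);
}
```

A real implementation would presumably queue the waiting write rather
than poll, and might skip only the overlapping part of a streaming
copy instead of the whole request.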

One big difference from the QED image streaming implementation is that
this generic implementation is not based on copy-on-read operations.
Instead we do a sequence of bdrv_is_allocated() to find regions for
streaming, followed by bdrv_co_read() and bdrv_co_write() in order to
populate the image file.
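
The loop described above can be sketched against a toy in-memory disk.
Everything here is an assumption made for illustration: ToyDisk,
toy_is_allocated() and toy_stream() are stand-ins, not the QEMU
functions or their signatures, and the read and write steps are plain
memcpy() calls where the real code would issue coroutine I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SECTOR_SIZE 4    /* tiny on purpose, for illustration */
#define NUM_SECTORS 8

/* Toy model: a backing file, an image file and an allocation map. */
typedef struct {
    unsigned char backing[NUM_SECTORS * SECTOR_SIZE];
    unsigned char image[NUM_SECTORS * SECTOR_SIZE];
    bool allocated[NUM_SECTORS];
} ToyDisk;

/*
 * Stand-in for an is-allocated query: returns the allocation status
 * of 'sector' and stores in *pnum how many consecutive sectors (at
 * most nb_sectors) share that status.
 */
static bool toy_is_allocated(const ToyDisk *d, int sector,
                             int nb_sectors, int *pnum)
{
    assert(nb_sectors > 0 && sector + nb_sectors <= NUM_SECTORS);
    bool status = d->allocated[sector];
    int n = 1;
    while (n < nb_sectors && d->allocated[sector + n] == status) {
        n++;
    }
    *pnum = n;
    return status;
}

/* Copy every unallocated run from backing to image; returns the
 * number of sectors copied. */
static int toy_stream(ToyDisk *d)
{
    int copied = 0;
    int sector = 0;

    while (sector < NUM_SECTORS) {
        int n;
        if (!toy_is_allocated(d, sector, NUM_SECTORS - sector, &n)) {
            unsigned char buf[NUM_SECTORS * SECTOR_SIZE];
            size_t off = (size_t)sector * SECTOR_SIZE;
            size_t len = (size_t)n * SECTOR_SIZE;

            memcpy(buf, d->backing + off, len);   /* "read" step  */
            memcpy(d->image + off, buf, len);     /* "write" step */
            for (int i = 0; i < n; i++) {
                d->allocated[sector + i] = true;
            }
            copied += n;
        }
        sector += n;   /* allocated runs are simply skipped */
    }
    return copied;
}
```

Note that the sketch ignores concurrent guest I/O entirely; that is
the arbitration problem discussed earlier.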

It turns out that generic copy-on-read is not an attractive operation
because it requires using bounce buffers for every request.  Kevin
pointed out the case where a guest performs a read and pokes the data
buffer before the read completes.  Without a bounce buffer,
copy-on-read would write the modified memory out to the image file.
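
The hazard can be illustrated with a toy model that splits
copy-on-read into its read phase and its write-back phase, so that a
caller can modify the guest buffer in between.  ToyCorRequest and both
functions are hypothetical names for this sketch only, and memcpy()
stands in for real I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 8

/* State of one toy copy-on-read request between its two phases. */
typedef struct {
    unsigned char bounce[BUF_LEN];   /* private copy of the data */
    unsigned char *guest_buf;        /* memory the guest can touch */
    int use_bounce;
} ToyCorRequest;

/* Phase 1: read from the backing file into the guest buffer,
 * optionally going through a private bounce buffer. */
static void toy_cor_read(ToyCorRequest *r, const unsigned char *backing,
                         unsigned char *guest_buf, int use_bounce)
{
    assert(guest_buf != NULL);
    r->guest_buf = guest_buf;
    r->use_bounce = use_bounce;
    if (use_bounce) {
        memcpy(r->bounce, backing, BUF_LEN);
        memcpy(guest_buf, r->bounce, BUF_LEN);
    } else {
        memcpy(guest_buf, backing, BUF_LEN);
    }
}

/* Phase 2: write the data that was read into the image file.
 * Without a bounce buffer the only source is guest memory, which
 * the guest may have modified since phase 1. */
static void toy_cor_write(const ToyCorRequest *r, unsigned char *image)
{
    memcpy(image, r->use_bounce ? r->bounce : r->guest_buf, BUF_LEN);
}
```

If the guest changes its buffer between the two calls, the image file
receives the changed bytes when use_bounce is 0 and the original
backing file bytes when use_bounce is 1.  The price is an extra copy
on every request.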

There are a few pieces missing in my tree, which have mostly been solved
in other places and just need to be reused:
1. Arbitration between guest and streaming requests (this is the only
   real new thing).
2. Efficient zero handling (skip writing those regions or mark them as
   zero clusters).
3. Queuing/dependencies when arbitration decides a request must wait.
   I'm taking a look at reusing Zhi Yong's block queue.
4. Rate-limiting to ensure streaming I/O does not impact the guest.
   This already exists in the QED-specific patches; it may make sense
   to extract common code that both migration and the block layer can
   use.
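
For item 4, one possible shape for such common code is a slice-based
limiter.  This is only a sketch under assumed names (ToyRateLimit and
its functions are hypothetical) and is not taken from the QED patches:
time is split into fixed slices, each slice has a sector quota, and
the caller is told how long to sleep once the quota is used up.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t slice_start_ns;   /* start of the current time slice */
    int64_t slice_ns;         /* length of a slice */
    uint64_t slice_quota;     /* sectors allowed per slice */
    uint64_t dispatched;      /* sectors issued in this slice */
} ToyRateLimit;

static void toy_ratelimit_init(ToyRateLimit *l, uint64_t sectors_per_sec,
                               int64_t slice_ns)
{
    assert(slice_ns > 0);
    l->slice_start_ns = 0;
    l->slice_ns = slice_ns;
    l->slice_quota = sectors_per_sec * (uint64_t)slice_ns / 1000000000ULL;
    l->dispatched = 0;
}

/*
 * Account for a request of n sectors at time now_ns.  Returns 0 if
 * the request may be issued now, otherwise the number of nanoseconds
 * to wait before asking again.
 */
static int64_t toy_ratelimit_delay(ToyRateLimit *l, int64_t now_ns,
                                   uint64_t n)
{
    if (now_ns - l->slice_start_ns >= l->slice_ns) {
        /* A new slice begins: reset the accounting. */
        l->slice_start_ns = now_ns;
        l->dispatched = 0;
    }
    /* Always admit the first request of a slice so that a request
     * larger than the quota can still make progress. */
    if (l->dispatched == 0 || l->dispatched + n <= l->slice_quota) {
        l->dispatched += n;
        return 0;
    }
    return l->slice_start_ns + l->slice_ns - now_ns;
}
```

The streaming coroutine would call toy_ratelimit_delay() before each
copy and sleep for the returned time when it is non-zero.  Nothing in
the limiter is specific to the block layer, which is the reason a
shared helper seems plausible.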

Ideas or questions?

Stefan


