qemu-devel
Re: [Qemu-devel] live block copy/stream/snapshot discussion


From: Kevin Wolf
Subject: Re: [Qemu-devel] live block copy/stream/snapshot discussion
Date: Tue, 12 Jul 2011 18:10:26 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc15 Thunderbird/3.1.10

On 12.07.2011 17:45, Stefan Hajnoczi wrote:
>>>> Image streaming API
>>>> ===================
>>>>
>>>> For leaf images with copy-on-read semantics, the stream commands allow
>>>> the user to populate local blocks by manually streaming them from the
>>>> backing image.
>>>> Once all blocks have been streamed, the dependency on the original backing
>>>> image can be removed.  Therefore, stream commands can be used to implement
>>>> post-copy live block migration and rapid deployment.
>>>>
>>>> The block_stream command can be used to stream a single cluster, to
>>>> start streaming the entire device, and to cancel an active stream.  It
>>>> is easiest to allow the block_stream command to manage streaming for the
>>>> entire device but a management tool could use single cluster mode to
>>>> throttle the I/O rate.
>>
>> As discussed earlier, having the management send requests for each
>> single cluster doesn't make any sense at all. It wouldn't only throttle
>> the I/O rate but bring it down to a level that makes it unusable. What
>> you really want is to allow the management to give us a range (offset +
>> length) that qemu should stream.
> 
> I feel that an iteration interface is problematic whether the
> management tool or QEMU decides what to stream.  Let's have just the
> background streaming operation.
> 
> The problem with byte ranges is two-fold.  The management tool doesn't
> know which regions of the image are allocated so it may do a lot of
> nop calls to already-allocated regions with no intelligence as to
> where the next sensible offset for streaming is.  Secondly, because
> the progress and performance of image streaming depend largely on
> whether or not clusters are allocated (it is very fast when a cluster
> is already allocated and we have no work to do), offsets are bad
> indicators of progress to the user.  I think it's best not to expose
> these details to the management tool at all.
> 
> The only reason for the iteration interface was to punt I/O throttling
> to the management tool.  I think it would be easier to just throttle
> inside the streaming function.
> 
> Kevin: Are you happy with dropping the iteration interface?
> Adam: Is there a libvirt requirement for iteration or could we support
> background copy only?

Okay, works for me.
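
To make the "throttle inside the streaming function" idea concrete, here is a minimal sketch in Python rather than QEMU's C. Everything in it is an assumption for illustration: the allocation map as a list of booleans, the copy_cluster callback, and the max_bytes_per_sec parameter are hypothetical and not part of any proposed interface. It only shows that already-allocated clusters cost no I/O and that the rate limit can live entirely inside the loop.

```python
import time


def stream_device(allocated, copy_cluster, cluster_size=65536,
                  max_bytes_per_sec=None,
                  sleep=time.sleep, clock=time.monotonic):
    """Copy every unallocated cluster, throttling inside the loop.

    allocated    -- list of bools, one per cluster (hypothetical map)
    copy_cluster -- callback that copies one cluster from the backing file
    Returns the number of bytes that were actually copied.
    """
    start = clock()
    copied = 0
    for index, is_allocated in enumerate(allocated):
        if is_allocated:
            continue  # already local: no I/O, nothing to throttle
        copy_cluster(index)
        allocated[index] = True
        copied += cluster_size
        if max_bytes_per_sec:
            # Wait until the average rate is back under the limit.
            earliest = start + copied / max_bytes_per_sec
            delay = earliest - clock()
            if delay > 0:
                sleep(delay)
    return copied
```

With this shape the management tool never sees offsets or allocation details; it only picks a rate.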

>>>> The command synopses are as follows:
>>>>
>>>> block_stream
>>>> ------------
>>>>
>>>> Copy data from a backing file into a block device.
>>>>
>>>> If the optional 'all' argument is true, this operation is performed in the
>>>> background until the entire backing file has been copied.  The status of
>>>> ongoing block_stream operations can be checked with query-block-stream.
>>
>> Not sure if it's a good idea to use a bool argument to turn a command
>> into its opposite. I think having a separate command for stopping would
>> be cleaner. Something for the QMP folks to decide, though.
> 
> git branch new_branch
> git branch -D new_branch
> 
> Makes sense to me :)

I don't think you should compare a command line option to a programming
interface. Having a git_create_branch(const char *name, bool delete)
would really look strange. Anyway, probably a matter of taste.

A hint that separate commands would make sense is that the stop command
won't need the other arguments that the start command gets ('all' and
'base').
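
A rough sketch of what I mean, written in Python for brevity. The {"execute": ..., "arguments": ...} envelope is the usual QMP shape, but the command name block_stream_stop and both helper functions are hypothetical, made up here only to show how the argument lists diverge.

```python
def make_stream_start(device, stream_all=False, base=None):
    """Build a start request; 'all' and 'base' only make sense here."""
    arguments = {"device": device}
    if stream_all:
        arguments["all"] = True
    if base is not None:
        arguments["base"] = base
    return {"execute": "block_stream", "arguments": arguments}


def make_stream_stop(device):
    """Build a stop request; hypothetical separate command name."""
    return {"execute": "block_stream_stop",
            "arguments": {"device": device}}
```

The stop request has nothing to validate or ignore beyond the device name, which is the point of splitting the two.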

>>>> Arguments:
>>>>
>>>> - all:    copy entire device (json-bool, optional)
>>>> - stop:   stop copying to device (json-bool, optional)
>>>> - device: device name (json-string)
>>>
>>> It must be possible to specify the backing file that will be
>>> active after streaming finishes (data from that file will not
>>> be streamed into active file, of course).
>>
>> Yes, I think the common base image belongs here.
> 
> Right.  We need to specify it by filename:
> 
>   - base: filename of base file (json-string, optional)
> 
>   Sectors are not copied from the base file and its backing file
>   chain.  The following describes this feature:
>     Before: base <- sn1 <- sn2 <- sn3 <- vm.img
>     After:  base <- vm.img

Does this imply that a rebase -u happens always after completion?
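
To spell out the Before/After picture, here is a small Python sketch of how I read it. It is an interpretation, not a description of the implementation: stream_plan and its list-based chain model are hypothetical, and the sketch assumes the backing file pointer is switched to the base once streaming completes, which is exactly the open question.

```python
def stream_plan(chain, base=None):
    """Split a backing chain into streamed images and the resulting chain.

    chain -- image names from the oldest backing file to the active image
    base  -- image to keep as backing file, or None to stream everything
    Returns (images whose data is streamed, chain after completion).
    """
    active = chain[-1]
    backing = chain[:-1]
    if base is None:
        kept, streamed = [], backing
    else:
        cut = backing.index(base)  # raises ValueError if base is unknown
        # The base and everything below it stay in place untouched.
        kept, streamed = backing[:cut + 1], backing[cut + 1:]
    return streamed, kept + [active]
```

For the chain quoted above with base given, this yields sn1, sn2 and sn3 as the streamed images and base <- vm.img as the result.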

>> With all = false, where does the streaming begin?
> 
> Streaming begins at the start of the image.
> 
>> Do you have something like the "current streaming offset" in the state of 
>> each BlockDriverState?
> 
> Yes, there is a StreamState for each block device that has an
> in-progress operation.  The progress is saved between block_stream
> (without -a) invocations so the caller does not need to specify the
> streaming offset as an argument.
> 
> Thanks for pointing out these weaknesses in the documentation.  It
> should really be explained fully.

I think we also need to describe error cases. For example, what happens
if you try to start streaming while it's already in progress?
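
One possible answer, sketched in Python purely as an assumption: keep per-device state and reject a second start with an error. StreamRegistry and StreamError are hypothetical names, and the thread does not settle whether rejecting is the intended behaviour.

```python
class StreamError(Exception):
    """Raised for invalid start/stop requests in this sketch."""


class StreamRegistry:
    """Tracks at most one in-progress stream per device."""

    def __init__(self):
        self._active = {}

    def start(self, device, base=None):
        if device in self._active:
            raise StreamError("streaming already in progress on " + device)
        self._active[device] = {"base": base, "offset": 0}

    def stop(self, device):
        if device not in self._active:
            raise StreamError("no streaming in progress on " + device)
        # Return the final state so the caller can report progress.
        return self._active.pop(device)
```

Whatever behaviour is chosen, the documentation should state it for both the duplicate start and the stop without an active stream.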

>>>> Return:
>>>>
>>>> - device: device name (json-string)
>>>> - len:    size of the device, in bytes (json-int)
>>>> - offset: ending offset of the completed I/O, in bytes (json-int)
>>
>> So you only get the reply when the request has completed? With the
>> current monitor, this means that QMP is blocked while we stream, doesn't
>> it? How are you supposed to send the stop command then?
> 
> Incomplete documentation again, sorry.  The block_stream command
> behaves as follows:
> 
> 1. block_stream all returns immediately and the BLOCK_STREAM_COMPLETED
> event is raised when streaming completes either successfully or with
> an error.
> 
> 2. block_stream stop returns when the in-progress streaming operation
> has been safely stopped.
> 
> 3. block_stream returns when one iteration of streaming has completed.
> 
>> Two of three examples below have an empty return value instead, so they
>> are not compliant with this specification.
> 
> I will update the documentation, the non-all invocations do not return 
> anything.

Okay, then I don't understand what the 'offset' return value means. The
text says "offset of the completed I/O". If all=true immediately
returns, shouldn't it always be 0?

>> I find it rather disturbing that a command like 'change' has made it
>> into QMP... Anyway, I don't think this is really what we need.
>>
>> We have two switches to do. The first one happens before starting the
>> copy: Creating the copy, with the source as its backing file, and
>> switching to that. The monitor command to achieve this is snapshot_blkdev.
> 
> I don't think that creating image files in QEMU is going to work when
> running KVM with libvirt (SELinux).  The QEMU process does not have
> the ability to create new image files.  It needs at least a file
> descriptor to an empty file or maybe a file that has been created
> using qemu-img like I showed above.

Independent problem. We're really creating an external snapshot here, so
we should use the function for external snapshots. libvirt can
pre-create an empty image file, so that qemu will write the image format
data into it, but we have discussed this before.

Kevin


