From: Jeff Cody
Subject: Re: [Qemu-devel] [PATCH 3/3] qapi: Introduce blockdev-query-group-snapshot-failure
Date: Tue, 21 Feb 2012 09:11:28 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20120131 Thunderbird/10.0
On 02/20/2012 12:48 PM, Eric Blake wrote:
> On 02/20/2012 10:31 AM, Jeff Cody wrote:
>> In the case of a failure in a group snapshot, it is possible for
>> multiple file image failures to occur - for instance, failure of
>> an original snapshot, and then failure of one or more of the
>> attempted reopens of the original.
>>
>> Knowing all of the file images which failed could be useful or
>> critical information, so this command returns a list of strings
>> containing the filenames of all failures from the last
>> invocation of blockdev-group-snapshot-sync.
>
> Meta-question:
>
> Suppose that the guest is running when we issue
> blockdev-group-snapshot-sync - in that case, qemu is responsible for
> pausing and then resuming the guest. On success, this makes sense. But
> what happens on failure?
The guest is not paused in blockdev-group-snapshot-sync; I don't think
that qemu should enforce pause/resume in the live snapshot commands.
>
> If we only fail at creating one snapshot, but successfully roll back the
> rest of the set, should the guest be resumed (as if the command had
> never been attempted), or should the guest be left paused?
>
> On the other hand, if we fail at creating one snapshot, as well as fail
> at rolling back, then that argues that we _cannot_ resume the guest,
> because we no longer have a block device open.
Is that really true, though? Depending on which drive failed, the guest
may still be runnable. To the guest, it would look roughly like a drive
failure: a bad event, but not always fatal.
But I think v2 of the patch may make this moot - I was talking with
Kevin, and he had some good ideas on how to do this without requiring a
close & reopen in the case of a snapshot failure, which means we
shouldn't have to worry about the second scenario. I will incorporate
those changes into v2.
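To make the failure mode in the commit message concrete, here is a hedged Python sketch (not the actual block.c logic; all function names are hypothetical) of how a group snapshot attempt can accumulate multiple failed filenames - the snapshot that failed plus any failed reopens of originals during rollback:

```python
def group_snapshot(drives, create_snapshot, reopen_original):
    """Attempt a snapshot on every drive; on any failure, roll back the
    drives already snapshotted.  Returns the list of drives involved in
    failures (the failed snapshot plus any failed reopens)."""
    failed = []
    done = []
    for drive in drives:
        if create_snapshot(drive):
            done.append(drive)
        else:
            failed.append(drive)          # the snapshot that failed
            # Roll back: reopen the original image of each completed drive.
            for d in done:
                if not reopen_original(d):
                    failed.append(d)      # a reopen that also failed
            break
    return failed

# Toy run: sn2 fails to snapshot, and reopening sn1's original also fails,
# so the failure list contains two entries.
snap_ok = {"sn1": True, "sn2": False}
reopen_ok = {"sn1": False}
result = group_snapshot(["sn1", "sn2"],
                        lambda d: snap_ok[d],
                        lambda d: reopen_ok[d])
# result == ["sn2", "sn1"]
```

This is the list a command like blockdev-query-group-snapshot-failure would report after the fact.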
>
> This policy needs to be documented in one (or both) of the two new
> monitor commands, and we probably ought to make sure that if the guest
> is left paused where it had originally started as running, then an
> appropriate event is also emitted.
I agree - the documentation should make it clear what is going on. I
will add that to v2.
>
> For blockdev-snapshot-sync, libvirt was always pausing qemu before
> issuing the snapshot, then resuming afterwards; but now that we have the
> ability to make the set atomic, I'm debating about whether libvirt still
> needs to pause qemu, or whether it can now rely on qemu doing the right
> things about pausing and resuming as part of the snapshot command.
>
Again, it doesn't pause automatically, so that is up to libvirt. The
guest agent is also available to freeze the filesystem, if libvirt wants
to trust it (and it is running); if not, then libvirt can still issue a
pause/resume around the snapshot command (and libvirt may be in a better
position to decide what to do in case of failure, if it has some
knowledge of the drives that failed and how they are used).
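The management-side choice described above - prefer a guest-agent filesystem freeze when the agent is trusted and running, otherwise pause/resume around the snapshot - can be sketched like this (a hedged illustration; the callables stand in for the real libvirt/QMP operations and are assumptions, not an actual API):

```python
def snapshot_with_quiesce(do_snapshot, freeze_fs, thaw_fs, pause, resume,
                          agent_available):
    """Hypothetical management-side flow: quiesce via the guest agent if
    available, else pause the whole guest; always undo the quiesce, even
    if the snapshot fails."""
    if agent_available:
        freeze_fs()
        try:
            return do_snapshot()
        finally:
            thaw_fs()
    else:
        pause()
        try:
            return do_snapshot()
        finally:
            resume()

# Toy run recording the order of operations when no agent is present.
events = []
snapshot_with_quiesce(lambda: events.append("snapshot") or True,
                      lambda: events.append("freeze"),
                      lambda: events.append("thaw"),
                      lambda: events.append("pause"),
                      lambda: events.append("resume"),
                      agent_available=False)
# events == ["pause", "snapshot", "resume"]
```

The try/finally mirrors the point about failure handling: the guest is resumed (or thawed) regardless, and the decision about what else to do on failure stays with the management layer.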