
Re: [Qemu-block] [PATCH 3/4] savevm: fix savevm after migration


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-block] [PATCH 3/4] savevm: fix savevm after migration
Date: Wed, 29 Mar 2017 16:29:42 +0100
User-agent: Mutt/1.8.0 (2017-02-23)

* Paolo Bonzini (address@hidden) wrote:
> 
> 
> On 28/03/2017 15:16, Vladimir Sementsov-Ogievskiy wrote:
> > 28.03.2017 15:09, Kevin Wolf wrote:
> >> Am 28.03.2017 um 13:13 hat Dr. David Alan Gilbert geschrieben:
> >>> * Kevin Wolf (address@hidden) wrote:
> >>>> Am 28.03.2017 um 12:55 hat Dr. David Alan Gilbert geschrieben:
> >>>>> * Kevin Wolf (address@hidden) wrote:
> >>>>>> Am 25.02.2017 um 20:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>>>>> After migration all drives are inactive and savevm will fail with
> >>>>>>>
> >>>>>>> qemu-kvm: block/io.c:1406: bdrv_co_do_pwritev:
> >>>>>>>     Assertion `!(bs->open_flags & 0x0800)' failed.
> >>>>>>>
> >>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
> >>>>>>> <address@hidden>
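For reference, 0x0800 in the quoted assertion is BDRV_O_INACTIVE in this QEMU version: the flag set on the source's images once migration has handed them off to the destination. A minimal, self-contained sketch of the check that fires (the struct and function below are simplified stand-ins, not QEMU's real types):

    #include <assert.h>
    #include <stdio.h>

    #define BDRV_O_INACTIVE 0x0800  /* image handed off to the destination; writes forbidden */

    /* simplified stand-in for BlockDriverState */
    struct bds { int open_flags; };

    static void co_do_pwritev(struct bds *bs)
    {
        /* bdrv_co_do_pwritev() asserts the image is still active before writing */
        assert(!(bs->open_flags & BDRV_O_INACTIVE));
        printf("write allowed\n");
    }

    int main(void)
    {
        struct bds bs = { .open_flags = BDRV_O_INACTIVE };  /* state of the source after migration */
        co_do_pwritev(&bs);  /* aborts here, mirroring the savevm failure quoted above */
        return 0;
    }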
> >>>>>> What's the exact state you're in? I tried to reproduce this, but just
> >>>>>> doing a live migration and then savevm on the destination works
> >>>>>> fine for
> >>>>>> me.
> >>>>>>
> >>>>>> Hm... Or do you mean on the source? In that case, I think the
> >>>>>> operation
> >>>>>> must fail, but of course more gracefully than now.
> >>>>>>
> >>>>>> Actually, the question that you're asking implicitly here is how the
> >>>>>> source qemu process should be "reactivated" after a failed migration.
> >>>>>> Currently, as far as I know, this is only possible by issuing a "cont"
> >>>>>> command.
> >>>>>> It might make sense to provide a way to get control without
> >>>>>> resuming the
> >>>>>> VM, but I doubt that adding automatic resume to every QMP command
> >>>>>> is the
> >>>>>> right way to achieve it.
> >>>>>>
> >>>>>> Dave, Juan, what do you think?
> >>>>> I'd only ever really thought of 'cont' or retrying the migration.
> >>>>> However, it does make sense to me that you might want to do a savevm
> >>>>> instead; if you can't migrate then perhaps a savevm is the best you
> >>>>> can do before your machine dies.  Are there any other things that
> >>>>> should be allowed?
> >>>> I think we need to ask the other way round: Any reason _not_ to allow
> >>>> certain operations that you can normally perform on a stopped VM?
> >>>>
> >>>>> We would want to be careful not to accidentally reactivate the disks
> >>>>> on the source after what was actually a successful migration.
> >>>> Yes, that's exactly my concern, even with savevm. That's why I
> >>>> suggested
> >>>> we could have a 'cont'-like thing that just gets back control of the
> >>>> images and moves into the normal paused state, but doesn't immediately
> >>>> resume the actual VM.
> >>> OK, let's say we had that block-reactivate (for want of a better name),
> >>> how would we stop everything asserting if the user tried to do it
> >>> before they'd run block-reactivate?
> >> We would have to add checks to the monitor commands that assume that the
> >> image is activated and error out if it isn't.
> >>
> >> Maybe just adding the check to blk_is_available() would be enough, but
> >> we'd have to check carefully whether it covers all cases and causes no
> >> false positives.
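For illustration, a rough QEMU-style sketch of the idea (blk_is_available(), blk_bs(), blk_is_inserted(), blk_dev_is_tray_open() and BDRV_O_INACTIVE are existing identifiers; the extra condition is the assumption here, and whether testing the root node's open flags is sufficient is exactly the audit mentioned above):

    /* sketch only: blk_is_available() with an extra "image not inactivated" condition */
    bool blk_is_available(BlockBackend *blk)
    {
        BlockDriverState *bs = blk_bs(blk);

        /* existing conditions: a medium is inserted and the tray is closed */
        if (!blk_is_inserted(blk) || blk_dev_is_tray_open(blk)) {
            return false;
        }
        /* proposed condition: refuse images that were inactivated for migration */
        if (bs && (bs->open_flags & BDRV_O_INACTIVE)) {
            return false;
        }
        return true;
    }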
> >>
> >> By the way, I wouldn't call this 'block-reactivate' because I don't
> >> think this should be a block-specific command. It's a VM lifecycle
> >> command that switches from a postmigrate state (that assumes we have no
> >> control over the VM's resources any more) to a paused state (where we do
> >> have this control). Maybe something like 'migration-abort'.
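A rough sketch of what such a lifecycle command could look like (the command name and handler are hypothetical; bdrv_invalidate_cache_all() and runstate_set() are the existing helpers qmp_cont() already uses to take back control of the images and change the run state):

    /* hypothetical handler, not an existing QEMU command */
    void qmp_migration_abort(Error **errp)
    {
        Error *local_err = NULL;

        if (!runstate_check(RUN_STATE_POSTMIGRATE)) {
            error_setg(errp, "VM is not in the postmigrate state");
            return;
        }
        /* take back ownership of the images, as 'cont' does ... */
        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }
        /* ... but end up merely paused instead of running */
        runstate_set(RUN_STATE_PAUSED);
    }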
> > 
> > 'abort' isn't a great name either, I think. The migration has completed, so
> > there is nothing to abort. (It may have been a successful migration to file
> > for suspend, some kind of VM cloning, etc.)
> 
> There is already migrate_cancel.  Does it make sense to make it
> reactivate fds if migration is completed?

It's potentially racy to do that.
Imagine if your migration is almost finished and you issue a migrate_cancel,
what happens?
Maybe it cancelled the migration in time.
Maybe the migration completed anyway - and then you'd better not be accessing
the disks on the source unless you're sure the destination isn't running.
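In other words, any reactivation would have to key off the final migration status rather than the fact that a cancel was requested. A hedged sketch of that distinction (the helper is hypothetical; the MIGRATION_STATUS_* values are the existing QAPI enum):

    /* hypothetical helper; illustrates the ordering described above */
    static bool source_may_reactivate_images(MigrationState *s)
    {
        switch (s->state) {
        case MIGRATION_STATUS_CANCELLED:
        case MIGRATION_STATUS_FAILED:
            return true;   /* source still owns the images */
        case MIGRATION_STATUS_COMPLETED:
            return false;  /* destination may be running; do not touch the disks */
        default:
            return false;  /* still in flight; wait for a final state */
        }
    }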

Dave

> Paolo
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


