Re: [Qemu-devel] 3.1: second invocation of migrate crashes qemu
From: Kevin Wolf
Subject: Re: [Qemu-devel] 3.1: second invocation of migrate crashes qemu
Date: Mon, 21 Jan 2019 16:55:53 +0100
User-agent: Mutt/1.10.1 (2018-07-13)
On 18.01.2019 at 16:57, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (address@hidden) wrote:
> > On 14.01.2019 at 11:51, Dr. David Alan Gilbert wrote:
> > > * Michael Tokarev (address@hidden) wrote:
> > > > $ qemu-system-x86_64 -monitor stdio -hda foo.img
> > > > QEMU 3.1.0 monitor - type 'help' for more information
> > > > (qemu) stop
> > > > (qemu) migrate "exec:cat >/dev/null"
> > > > (qemu) migrate "exec:cat >/dev/null"
> > > > qemu-system-x86_64: /build/qemu/qemu-3.1/block.c:4647:
> > > > bdrv_inactivate_recurse: Assertion `!(bs->open_flags &
> > > > BDRV_O_INACTIVE)' failed.
> > > > Aborted
> > >
> > > And on head as well; it only happens if the 1st migrate is successful;
> > > if it got cancelled the 2nd one works, so it's not too bad.
> > >
> > > I suspect the problem here is all around locking/ownership - the block
> > > devices get shut down at the end of migration since the assumption is
> > > that the other end has them open now and we had better release them.
> >
> > Yes, only "cont" gets control back to the source VM.
> >
> > I think we really should limit the possible monitor commands in the
> > postmigrate status, and possibly provide a way to get back to the
> > regular paused state (which means getting back control of the resources)
> > without resuming the VM first.
>
> This error is a little interesting if you'd done something like:
>
> src:
> stop
> migrate
>
> dst:
> <kill qemu for some reason>
> start a new qemu
>
> src:
> migrate
>
> Now that used to work (safely) - note we've not started
> a VM successfully anywhere else.
>
> Now the source refuses to let that happen - with a rather
> nasty abort.
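For illustration, here is a toy Python model of the failure. It is a simplified sketch, not QEMU code: the `BlockNode` class and the `migrate()` helper are hypothetical, and only the `BDRV_O_INACTIVE` flag name and the asserted condition come from the abort message. A successful migration marks every node inactive, so a second one trips the same assertion whether or not a destination ever started:

```python
BDRV_O_INACTIVE = 0x0800  # illustrative value; only its use as a bit flag matters here


class BlockNode:
    """Hypothetical stand-in for a block driver state (bs)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.open_flags = 0

    def inactivate(self) -> None:
        # Mirrors the check that fails in the report:
        #   assert(!(bs->open_flags & BDRV_O_INACTIVE))
        # (Python asserts are skipped under `python -O`.)
        assert not (self.open_flags & BDRV_O_INACTIVE), f"{self.name}: already inactive"
        self.open_flags |= BDRV_O_INACTIVE


def migrate(nodes: list[BlockNode]) -> None:
    """Hypothetical migration: on success the source hands its images over."""
    # ... RAM and device state would be streamed here ...
    for bs in nodes:
        bs.inactivate()


disks = [BlockNode("foo.img")]
migrate(disks)      # first migrate succeeds; foo.img is now inactive
try:
    migrate(disks)  # second migrate hits the assertion
except AssertionError as err:
    print("Assertion failed:", err)  # prints: Assertion failed: foo.img: already inactive
```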
Essentially it's another effect of the problem that migration has always
lacked a proper model of ownership transfer. And it's still being treated
as a block layer problem rather than being made a core concept of
migration, as it should be.
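One way to read the ownership argument, sketched as a hypothetical Python state machine. QEMU's real run-state enum and transition table are much larger; this keeps only the states discussed in this thread, and the `reclaim` command is an invented stand-in for the "get back to the regular paused state" idea. Ownership of the images is derived from the run state, and a command that is not valid in the current state is rejected with an error instead of reaching an assertion:

```python
from enum import Enum, auto


class RunState(Enum):
    RUNNING = auto()
    PAUSED = auto()
    POSTMIGRATE = auto()  # migration completed; the destination may own the images


class InvalidCommand(Exception):
    """Raised instead of aborting when a command is not valid in the current state."""


class VM:
    """Hypothetical model in which block-device ownership is part of the run state."""

    # current state -> {command: next state}
    TRANSITIONS = {
        RunState.RUNNING: {"stop": RunState.PAUSED, "migrate": RunState.POSTMIGRATE},
        RunState.PAUSED: {"cont": RunState.RUNNING, "migrate": RunState.POSTMIGRATE},
        # "cont" resumes and takes the images back; "reclaim" (hypothetical)
        # takes them back while staying paused.
        RunState.POSTMIGRATE: {"cont": RunState.RUNNING, "reclaim": RunState.PAUSED},
    }

    def __init__(self) -> None:
        self.state = RunState.RUNNING

    @property
    def owns_images(self) -> bool:
        # Only POSTMIGRATE has given up ownership of the block devices.
        return self.state is not RunState.POSTMIGRATE

    def command(self, name: str) -> RunState:
        allowed = self.TRANSITIONS[self.state]
        if name not in allowed:
            raise InvalidCommand(f"'{name}' not allowed in state {self.state.name}")
        self.state = allowed[name]
        return self.state


vm = VM()
vm.command("stop")
vm.command("migrate")
try:
    vm.command("migrate")
except InvalidCommand as err:
    print(err)  # prints: 'migrate' not allowed in state POSTMIGRATE
vm.command("reclaim")  # back to PAUSED with the images owned again
vm.command("migrate")  # now a legitimate second migration
```

The point of the table is that "who owns the images" is never tracked separately from the run state, so the two cannot drift apart.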
We can stack another one-off fix on top and regain control of the block
devices automatically on a second 'migrate'. But that feels like a hack
rather than a sign that VMs have a properly designed and respected state
machine.
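The one-off fix mentioned here could look roughly like this on the same kind of toy model. The `BlockNode`, `activate()` and `migrate()` names are hypothetical; this is not the actual QEMU code path. Note what it silently assumes: that nothing on a destination has started using the images, which this code never checks. That unverified assumption is what makes it a hack rather than part of a state machine:

```python
BDRV_O_INACTIVE = 0x0800  # illustrative value; only its use as a bit flag matters here


class BlockNode:
    """Hypothetical stand-in for a block driver state (bs)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.open_flags = 0

    def activate(self) -> None:
        # Take the image back by clearing the "handed over" flag.
        self.open_flags &= ~BDRV_O_INACTIVE

    def inactivate(self) -> None:
        assert not (self.open_flags & BDRV_O_INACTIVE), f"{self.name}: already inactive"
        self.open_flags |= BDRV_O_INACTIVE


def migrate(nodes: list[BlockNode]) -> None:
    # One-off fix: if an earlier migration already inactivated the images,
    # reactivate them first. Assumes (without checking!) that no destination
    # VM has started using them in the meantime.
    for bs in nodes:
        if bs.open_flags & BDRV_O_INACTIVE:
            bs.activate()
    # ... RAM and device state would be streamed here ...
    for bs in nodes:
        bs.inactivate()


disks = [BlockNode("foo.img")]
migrate(disks)
migrate(disks)  # no longer aborts
```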
Kevin