
Re: [Qemu-devel] 3.1: second invocation of migrate crashes qemu


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] 3.1: second invocation of migrate crashes qemu
Date: Mon, 21 Jan 2019 16:05:24 +0000
User-agent: Mutt/1.10.1 (2018-07-13)

* Kevin Wolf (address@hidden) wrote:
> Am 18.01.2019 um 16:57 hat Dr. David Alan Gilbert geschrieben:
> > * Kevin Wolf (address@hidden) wrote:
> > > Am 14.01.2019 um 11:51 hat Dr. David Alan Gilbert geschrieben:
> > > > * Michael Tokarev (address@hidden) wrote:
> > > > > $ qemu-system-x86_64 -monitor stdio -hda foo.img
> > > > > QEMU 3.1.0 monitor - type 'help' for more information
> > > > > (qemu) stop
> > > > > (qemu) migrate "exec:cat >/dev/null"
> > > > > (qemu) migrate "exec:cat >/dev/null"
> > > > > qemu-system-x86_64: /build/qemu/qemu-3.1/block.c:4647: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
> > > > > Aborted
> > > > 
> > > > And on head as well; it only happens if the 1st migrate is successful;
> > > > if it got cancelled the 2nd one works, so it's not too bad.
> > > > 
> > > > I suspect the problem here is all around locking/ownership - the block
> > > > devices get shutdown at the end of migration since the assumption is
> > > > that the other end has them open now and we had better release them.
> > > 
> > > Yes, only "cont" gets control back to the source VM.
> > > 
> > > I think we really should limit the possible monitor commands in the
> > > postmigrate status, and possibly provide a way to get back to the
> > > regular paused state (which means getting back control of the resources)
> > > without resuming the VM first.
> > 
> > This error is a little interesting if you'd done something like:
> > 
> > 
> >      src:
> >          stop
> >          migrate
> > 
> >      dst:
> >          <kill qemu for some reason>
> >          start a new qemu
> > 
> >      src:
> >          migrate
> > 
> > Now that used to work (safely) - note we've not started
> > a VM successfully anywhere else.
> > 
> > Now the source refuses to let that happen - with a rather
> > nasty abort.
> 
> Essentially it's another effect of the problem that migration has always
> lacked a proper model of ownership transfer. And it's still treating
> this as a block layer problem rather than making it a core concept of
> migration as it should.
> 
> We can stack another one-off fix on top, and get back control of the
> block devices automatically on a second 'migrate'. But it feels like a
> hack and not like VMs had a properly designed and respected state
> machine.

Hmm; I don't like to get back to this argument because I think
we've got a perfectly serviceable model that's implemented at higher
levels outside qemu, and the real problem is the block layer added
new assumptions about the semantics without checking they were really
true.
qemu only has the view from a single host; it takes the higher level
view from something like libvirt to have the view across multiple hosts
to understand who has the ownership when.
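
For anyone following along, the single-host behaviour discussed above can
be sketched as a toy model. This is not QEMU code: ToySource, RunState and
the reclaim_on_migrate flag are hypothetical names made up for
illustration, and the RuntimeError merely stands in for the assertion in
bdrv_inactivate_recurse. It assumes that a completed migrate releases the
block devices, that a cancelled one leaves them alone, and that "cont"
takes them back.

```python
from enum import Enum, auto


class RunState(Enum):
    RUNNING = auto()
    PAUSED = auto()
    POSTMIGRATE = auto()


class ToySource:
    """Hypothetical single-host model; none of these names are QEMU APIs."""

    def __init__(self, reclaim_on_migrate=False):
        self.state = RunState.RUNNING
        self.disks_active = True  # False plays the role of BDRV_O_INACTIVE
        self.reclaim_on_migrate = reclaim_on_migrate

    def stop(self):
        if self.state is RunState.RUNNING:
            self.state = RunState.PAUSED

    def cont(self):
        # Per the thread, only "cont" hands control back to the source VM.
        self.disks_active = True
        self.state = RunState.RUNNING

    def migrate(self, completes=True):
        if not self.disks_active:
            if not self.reclaim_on_migrate:
                # Stands in for the failed assertion on the second migrate.
                raise RuntimeError("block devices already inactive")
            # The "one-off fix": quietly take the devices back first.
            self.disks_active = True
        if completes:
            # Assume the destination owns the disks now and release them.
            self.disks_active = False
            self.state = RunState.POSTMIGRATE
```

With the default flag, stop / migrate / migrate raises on the second
migrate, while a cancelled first migrate (completes=False) lets the second
one through. With reclaim_on_migrate=True the second migrate succeeds. The
model only sees one host, so it cannot tell whether anything on the
destination actually started using the disks.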

Dave

> Kevin
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
