qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if usin


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Date: Tue, 10 Apr 2018 10:18:48 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

Am 10.04.2018 um 09:36 hat Jiri Denemark geschrieben:
> On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben:
> > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > >   a) Start migration
> > >   b) Migration gets to completion point
> > >   c) Destination is still paused
> > >   d) Libvirt is restarted on the source
> > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > >      the destination won't be started)
> > >   f) It now tries to resume the qemu on the source
> > > 
> > > (f) fails because (b) caused the locks to be taken on the destination;
> > > hence this patch stops doing that.  It's a case we don't really think
> > > about - i.e. that the migration has actually completed and all the data
> > > is on the destination, but libvirt decides for some other reason to
> > > abandon migration.
> > 
> > If you do remember correctly, that scenario doesn't feel tricky at all.
> > libvirt needs to quit the destination qemu, which will inactivate the
> > images on the destination and release the lock, and then it can continue
> > the source.
> > 
> > In fact, this is so straightforward that I wonder what else libvirt is
> > doing. Is the destination qemu only shut down after trying to continue
> > the source? That would be libvirt using the wrong order of steps.
> 
> There's no connection between the two libvirt daemons in the case we're
> talking about so they can't really synchronize the actions. The
> destination daemon will kill the new QEMU process and the source will
> resume the old one, but the order is completely random.

Hm, okay...

> > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > is that if this now under the control of the management layer then we
> > > should stop asserting when the block devices aren't in the expected
> > > state and just cleanly fail the command instead.
> > 
> > Requiring an explicit 'block-activate' on the destination would be an
> > incompatible change, so you would have to introduce a new option for
> > that. 'block-inactivate' on the source feels a bit simpler.
> 
> As I said in another email, the explicit block-activate command could
> depend on a migration capability similarly to how pre-switchover state
> works.

Yeah, that's exactly the thing that we wouldn't need if we could use
'block-inactivate' on the source instead. It feels a bit wrong to
design a more involved QEMU interface around the libvirt internals, but
as long as we implement both sides for symmetry and libvirt just happens
to pick the destination side for now, I think it's okay.

By the way, are block devices the only thing that need to be explicitly
activated? For example, what about qemu_announce_self() for network
cards, do we need to delay that, too?

In any case, I think this patch needs to be reverted for 2.12 because
it's wrong, and then we can create the proper solution in the 2.13
timefrage.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]