Re: [Qemu-devel] [RFC] More robust migration

qemu-devel

From:	Jamie Lokier
Subject:	Re: [Qemu-devel] [RFC] More robust migration
Date:	Fri, 20 Feb 2009 16:37:05 +0000
User-agent:	Mutt/1.5.13 (2006-08-11)

Anthony Liguori wrote:
> >2. Introduce a length field to the header of each device.
> 
> IMHO, this would reduce robustness.  It's also difficult because of the 
> way savevm registration works.  You don't know how large a section is 
> until it's written and migration streams are not seekable.

The way HTTP deals with not knowing the size in advance is to split
the data into chunks, each the size of a small write buffer, and to
write a chunk size in front of each one.  This allows storing sections
of binary data whose size isn't known in advance, while still letting a
reader skip them safely.
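HTTP/1.1's chunked transfer coding works this way: each chunk is preceded by its size, and a zero-size chunk ends the body. The minimal Python sketch below applies the same idea to a savevm section. It uses a 4-byte big-endian binary length instead of HTTP's hexadecimal size line. The framing, the chunk size, and the names ChunkedSectionWriter, read_section and skip_section are assumptions for illustration, not QEMU's actual savevm format. The sketch shows two things: the writer never needs the total length up front, and a reader can skip a section it doesn't understand using only sequential reads.

```python
import io
import struct

# Hypothetical section framing modelled on HTTP/1.1 chunked transfer coding:
# each chunk is a 4-byte big-endian length followed by that many bytes, and
# a zero-length chunk terminates the section. Not QEMU's real format.
CHUNK_SIZE = 4096  # assumed size of the writer's buffer


class ChunkedSectionWriter:
    """Buffers one device's state and flushes it as length-prefixed chunks."""

    def __init__(self, stream, chunk_size=CHUNK_SIZE):
        self.stream = stream
        self.chunk_size = chunk_size
        self.buf = bytearray()

    def write(self, data):
        self.buf += data
        # Emit full chunks as soon as the buffer fills; the total section
        # length never has to be known in advance.
        while len(self.buf) >= self.chunk_size:
            self._emit(bytes(self.buf[:self.chunk_size]))
            del self.buf[:self.chunk_size]

    def close(self):
        if self.buf:
            self._emit(bytes(self.buf))
            self.buf.clear()
        self._emit(b"")  # zero-length terminator marks the end of the section

    def _emit(self, chunk):
        self.stream.write(struct.pack(">I", len(chunk)))
        self.stream.write(chunk)


def _read_exact(stream, n):
    # Assumes a buffered stream, where a short read means end of stream.
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated migration stream")
    return data


def read_section(stream):
    """Reassembles a section's payload from its chunks."""
    parts = []
    while True:
        (length,) = struct.unpack(">I", _read_exact(stream, 4))
        if length == 0:
            return b"".join(parts)
        parts.append(_read_exact(stream, length))


def skip_section(stream):
    """Skips a section without interpreting its contents.

    Only sequential reads are needed, so the stream need not be seekable.
    """
    while True:
        (length,) = struct.unpack(">I", _read_exact(stream, 4))
        if length == 0:
            return
        _read_exact(stream, length)


if __name__ == "__main__":
    # Two sections back to back; the reader skips the first without parsing it.
    stream = io.BytesIO()
    for payload in (b"unknown device state", b"known device state"):
        writer = ChunkedSectionWriter(stream, chunk_size=8)
        writer.write(payload)
        writer.close()
    stream.seek(0)
    skip_section(stream)
    print(read_section(stream))
```

The framing only makes skipping possible. Whether a loader should ever skip a section is a separate policy decision.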

> >This would allow to skip unknown (or unwanted) devices.
> 
> No good can come from this.  If you have an unknown section, you must 
> throw an error and stop the migration.  What if this is for a device
> that the guest is interacting with?  The device just disappears after 
> migration?   All savevm state is state that affects the functionality of 
> a guest.  Throwing away this state will change the functionality of the 
> VM and migration should not affect guest functionality.

What if you're migrating from a snapshot made on a host with some
pass-through USB device to another host which cannot provide the same
device?  In that case I'd like the option for the guest to see that the
device has disappeared.  Maybe it has stopped working (HPET), or maybe
it has been unplugged (anything hot-unpluggable).

That's preferable to not being able to use the snapshot at all,
effectively having to trash it.
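One way to reconcile the two positions is to keep strict checking as the default and make dropping a device an explicit opt-in. The minimal Python sketch below assumes each saved section records whether its device is hot-unpluggable; that flag is hypothetical, not something the savevm format is known to carry. Section, load_snapshot, allow_unplug and MigrationError are likewise illustrative names. The sketch is more conservative than the paragraph above in one respect: a missing device that is not hot-unpluggable still aborts the load.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Section:
    """One device's saved state (hypothetical in-memory representation)."""
    name: str
    hot_unpluggable: bool  # assumed to be recorded by the source host
    payload: bytes


class MigrationError(Exception):
    pass


def load_snapshot(sections: List[Section],
                  handlers: Dict[str, Callable[[bytes], None]],
                  allow_unplug: bool = False) -> List[str]:
    """Loads every section, or none of them.

    Strict by default: a section with no handler aborts the load. With
    allow_unplug=True (an explicit opt-in, e.g. when restoring an old
    snapshot), a missing hot-unpluggable device is dropped and its name is
    returned so the guest can be told it was unplugged.
    """
    # Validate everything first so a failure leaves no device half-restored.
    unplugged = []
    for section in sections:
        if section.name in handlers:
            continue
        if allow_unplug and section.hot_unpluggable:
            unplugged.append(section.name)
        else:
            raise MigrationError(f"cannot restore device {section.name!r}")

    # Only now apply the state of the devices this host can provide.
    for section in sections:
        if section.name in handlers:
            handlers[section.name](section.payload)
    return unplugged
```

The caller decides what "disappeared" means to the guest, for example by raising a hot-unplug event for each returned name.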

For live migration this isn't much of an issue because you can unplug
the device before migrating, and you probably would like to be warned
of this before migrating anyway.

> >I know this imposes a bit of a challenge, because the length is not 
> >always known in advance, but one could overcome this (by using the 
> >buffer to patch in the length later for instance).
> 
> What are the use cases where you think this would be beneficial?  I 
> really see the change in semantics from the old way (throwing away 
> unknown sections) to the new way (requiring strict versioning and 
> validating all sections) as being a huge step toward robustness.

I've been frustrated by a "savevm" snapshot which I wrote with some past
version of QEMU and then couldn't load in a later version.  It wasn't
obvious why; it just refused.  I didn't have the old version, or even
know which version it was.  And even if I could have reconstructed the
old QEMU, I wanted to migrate to a newer version.  It's no fun having to
rebuild a carefully primed guest test state from a fresh reboot, if that
can be avoided.

I do think what you've done to make migration type-checking more
strict is very good.  Much better than running the wrong thing :-)

But I think there's a case for well-defined kinds of migration
flexibility too: migrating from KVM to TCG and back (if you're moving
between hosts with different CPU features), migrating to newer versions
of KVM and QEMU especially, and changes in host device availability.

> >Also one could create some kind of (limited) upward compatibility, so 
> >older QEMU versions ignore additional, but optional fields in a device 
> >state (similar to the ext2 compatibility scheme). Maybe this could be 
> >done by an external converter program.
> 
> To me, ignoring is always a bad thing.  It's almost always going to be 
> unsafe.  Doesn't this decrease robustness by being less conservative?

I wonder if there's a use for something like ext2/3/4's capability
bits.  There are "required" capabilities and "backward compatible"
capabilities.  A capability is marked backward compatible only when a
later version explicitly designs it to remain compatible with older
versions.

I don't see a use case immediately, but it wouldn't surprise me if one
turned up.
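ext2's on-disk superblock actually carries three feature masks (compat, incompat and ro_compat). The sketch below keeps only the two classes described above. It is a minimal Python illustration under stated assumptions: the bit values, the SUPPORTED_* masks, IncompatibleStream and check_capabilities are hypothetical and not taken from QEMU. Unknown required bits make the loader refuse the stream, while unknown backward-compatible bits are ignored.

```python
# Hypothetical capability bits for a migration stream header, modelled on
# ext2's split between "incompat" (required) and "compat" feature flags.
SUPPORTED_REQUIRED = 0b0011  # required capabilities this loader implements
SUPPORTED_COMPAT = 0b0001    # backward-compatible capabilities it understands


class IncompatibleStream(Exception):
    pass


def check_capabilities(required: int, compat: int) -> int:
    """Decides whether a stream with these capability bits can be loaded.

    An unknown *required* bit means the writer used a feature that changes
    how the stream must be interpreted, so loading is refused. Unknown
    *backward-compatible* bits were designed to be safely ignored by older
    readers, so they are dropped. Returns the compat bits actually in effect.
    """
    unknown_required = required & ~SUPPORTED_REQUIRED
    if unknown_required:
        raise IncompatibleStream(
            f"unsupported required capabilities: {unknown_required:#x}")
    return compat & SUPPORTED_COMPAT
```

The safety of the scheme rests on the writer, not the reader: a feature may only be put in the backward-compatible mask if older readers really do behave correctly when they ignore it.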

> My primary goal for migration is robustness.  I do not think it's a good 
> idea to support any circumstances that could introduce changes in guest 
> visible state during a live migration.

What about safe hotpluggable devices?

> Live migration is a critical feature for many production environments.  
> To be useful IMHO, it has to be bullet-proof.

I agree, and I think all that I've said is primarily about snapshots
rather than live migrations.

-- Jamie
