Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migrati

qemu-devel


From:	Anthony Liguori
Subject:	Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
Date:	Mon, 25 Jul 2011 18:23:17 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110516 Lightning/1.0b2 Thunderbird/3.1.10

On 07/25/2011 04:10 PM, Paolo Bonzini wrote:

On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini <address@hidden> wrote:

With the current migration format, VMS_STRUCTs with subsections
are ambiguous.  The protocol cannot tell whether a 0x5 byte after
the VMS_STRUCT is a subsection or part of the parent data stream.
In the past QEMU assumed it was always a part of a subsection; after
commit eb60260 (savevm: fix corruption in vmstate_subsection_load(),
2011-02-03) the choice depends on whether the VMS_STRUCT has subsections
defined.

Unfortunately, this means that if a destination has no subsections
defined for the struct, it will happily read subsection data into
its own fields.  And if you are "lucky" enough to stumble on a
zero byte at the right time, it will be interpreted as QEMU_VM_EOF
and migration will be interrupted with half-loaded state.

There is no way out of this except defining an incompatible
migration protocol.  Not-so-long-term we should really try to define
one that is not a joke, but the bug is serious so we need a solution
for 0.15.  A sentinel at the end of embedded structs does remove the
ambiguity.

Of course, this can be restricted to new machine models, and this
is what the patch series does.  (And note that only patch 3 is specific
to the short-term solution, everything else is entirely generic).

Untested beyond compilation.
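To make the ambiguity concrete, here is a toy model of a loader with and without an end-of-struct sentinel. It is a sketch under simplifying assumptions: each subsection is encoded as the 0x05 marker plus one payload byte (real subsections carry a name, a version and fields), and `TOY_STRUCT_END` is a hypothetical sentinel value, not necessarily the one the patch series uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    TOY_SUBSECTION = 0x05,  /* subsection marker, as in the current format */
    TOY_STRUCT_END = 0xff   /* hypothetical end-of-struct sentinel */
};

/* Current format: returns the byte the parent reads after the embedded
 * struct. A destination with no subsections defined treats 0x05 as
 * ordinary parent data. */
static int legacy_next_parent_byte(const uint8_t *p, size_t len,
                                   bool dst_has_subsections)
{
    size_t i = 0;
    while (dst_has_subsections && i + 1 < len && p[i] == TOY_SUBSECTION) {
        i += 2;  /* skip marker + one-byte toy payload */
    }
    return i < len ? p[i] : -1;
}

/* Sentinel format: subsections may only appear before TOY_STRUCT_END,
 * so an unknown subsection is detected instead of being misread.
 * Returns -1 on a detected error. */
static int sentinel_next_parent_byte(const uint8_t *p, size_t len,
                                     bool dst_has_subsections)
{
    size_t i = 0;
    while (i < len && p[i] != TOY_STRUCT_END) {
        if (p[i] != TOY_SUBSECTION || i + 1 >= len) {
            return -1;  /* malformed stream */
        }
        if (!dst_has_subsections) {
            return -1;  /* unknown subsection: fail cleanly */
        }
        i += 2;
    }
    if (i >= len) {
        return -1;  /* sentinel missing */
    }
    i++;  /* consume the sentinel */
    return i < len ? p[i] : -1;
}
```

For the bytes `05 01 42`, a destination without subsections hands 0x05 to the parent, which is the silent misread described above; for `05 01 ff 42`, the sentinel loader either reaches 0x42 or fails cleanly.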


I have now tested this series (exactly as sent) both by manually
examining the differences between the two formats on the same guest
state, and by a mix of saves/restores (new on new, 0.14 on new
pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL).  It
always does what is expected.

Michael Tsirkin objected that the format should be passed as a
parameter in the migrate command.  I kind of agree; however, since this
is a real bug you would need to bump the default for new machine
types, and this default would still go in the QEMUMachine struct like
I am doing.  So I consider the two settings to be orthogonal.  Also,
the alternative requires changes to the whole management stack, and if
the default is not changed it imposes a broken format unless you
update the management tools.  Clearly much less bang for the buck.

I think this is ready to go into 0.15.


I'll take a look for 0.15.

The bug happens when a pc-0.14 machine that was created with QEMU 0.15
and has a floppy is migrated to 0.14.  The media-changed subsection is
almost always included, and this causes problems when migrating to
0.14, which didn't have any subsection for the floppy device.  While
QEMU support for migration to old versions admittedly depends on luck,
this isn't true of certain downstreams :) which would like to have an
unambiguous migration format.

So this got me thinking about where we're at with migration and where we need to go.

I actually think there might be a reasonable path forward if we attack the problem differently than we have so far.


== Today ==

Today we only support generating the latest serialization of devices. To increase the probability of the latest version working on older versions of QEMU, we strategically omit fields that we know can safely be omitted with older versions (subsections). More than likely, migrating new to old won't work.

Migrating old to new is more likely to work. We version each section in order to be able to identify when we're dealing with old.

But all of this logic lives in one of two forms: either as a savevm/loadvm callback that takes a QEMUFile and writes a byte serialization to the stream in an open-coded way (usually big endian), or encoded declaratively in a VMState section.


== What we need ==

We need to decompose migration into three different problems: 1) serializing device state, 2) transforming the device model in order to satisfy forwards and backwards compatibility, and 3) encoding the serialized device model on the wire.


We also need a way to future-proof ourselves.

== What we can do ==

1) Add migration capabilities to future-proof ourselves. I think the simplest way this would work is to have a 'query-migration-capabilities' command that returns a bitmask of supported migration features. I think we should also introduce a 'set-migration-capabilities' command that can mask some of the supported features.

A management tool would query the migration capabilities on the source and destination, take the intersection of the two masks, and set that mask on both the source and destination.

Lack of support for these commands indicates a mask of zero, which is the protocol we offer today.
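The negotiation in step 1 fits in a few lines. A minimal sketch: the `ToyEndpoint` struct and the two capability bits are hypothetical placeholders, and only the intersect-then-set flow and the rule that a missing command means a mask of zero come from the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; the proposal does not name any. */
#define MIG_CAP_STRUCT_SENTINEL (1u << 0)
#define MIG_CAP_SECTION_SIZES   (1u << 1)

/* Hypothetical view of one QEMU instance as seen by a management tool. */
typedef struct {
    bool has_cap_commands;  /* false for a QEMU without the new commands */
    uint32_t supported;     /* answer to query-migration-capabilities */
    uint32_t enabled;       /* mask stored by set-migration-capabilities */
} ToyEndpoint;

static uint32_t query_caps(const ToyEndpoint *e)
{
    /* No command means mask zero, i.e. today's protocol. */
    return e->has_cap_commands ? e->supported : 0;
}

static void set_caps(ToyEndpoint *e, uint32_t mask)
{
    if (e->has_cap_commands) {
        /* Setting can only mask features off, never add unsupported ones. */
        e->enabled = mask & e->supported;
    }
}

/* Management-side negotiation: intersect, then apply to both ends. */
static uint32_t negotiate_caps(ToyEndpoint *src, ToyEndpoint *dst)
{
    uint32_t common = query_caps(src) & query_caps(dst);
    set_caps(src, common);
    set_caps(dst, common);
    return common;
}
```

Against an older destination the result is 0, so both sides fall back to the current format without any special casing.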

2) Switch to a visitor model to serialize device state. This involves converting any occurrence of:


qemu_put_be32(f, port->guest_connected);

To:

visit_type_u32(v, "guest_connected", &port->guest_connected, &local_err);

It's 100% mechanical and makes absolutely no logic change. It works equally well with legacy and VMState migration handlers.
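The point of the conversion is that a device describes its fields once and the visitor decides what happens to them. A minimal sketch, assuming a hypothetical `ToyVisitor` with a single `type_u32` callback and a plain `int` error flag in place of QEMU's `Error`; `ToyVisitor`, `toy_visit_type_u32` and `ToyPort` are illustrative names, not QEMU's actual visitor API. The output visitor writes the same big-endian bytes `qemu_put_be32()` would, which is why the change is purely mechanical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal visitor; the byte buffer stands in for a QEMUFile. */
typedef struct ToyVisitor ToyVisitor;
struct ToyVisitor {
    void (*type_u32)(ToyVisitor *v, const char *name, uint32_t *obj, int *err);
    uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Like the local_err pattern: once an error is set, later fields are skipped. */
static void toy_visit_type_u32(ToyVisitor *v, const char *name,
                               uint32_t *obj, int *err)
{
    if (*err == 0) {
        v->type_u32(v, name, obj, err);
    }
}

/* Output visitor: big endian, same bytes as qemu_put_be32(). */
static void out_u32(ToyVisitor *v, const char *name, uint32_t *obj, int *err)
{
    (void)name;  /* the binary encoding drops field names */
    if (v->len - v->pos < 4) {
        *err = -1;
        return;
    }
    for (int shift = 24; shift >= 0; shift -= 8) {
        v->buf[v->pos++] = (uint8_t)(*obj >> shift);
    }
}

/* Input visitor: reads the same big-endian bytes back into *obj. */
static void in_u32(ToyVisitor *v, const char *name, uint32_t *obj, int *err)
{
    (void)name;
    if (v->len - v->pos < 4) {
        *err = -1;
        return;
    }
    uint32_t val = 0;
    for (int i = 0; i < 4; i++) {
        val = (val << 8) | v->buf[v->pos++];
    }
    *obj = val;
}

/* Hypothetical device state; its visit function is written only once. */
typedef struct {
    uint32_t guest_connected;
} ToyPort;

static int toy_port_visit(ToyVisitor *v, ToyPort *port)
{
    int err = 0;
    toy_visit_type_u32(v, "guest_connected", &port->guest_connected, &err);
    return err;
}
```

The same `toy_port_visit()` serves both save (with `out_u32`) and load (with `in_u32`), so the save and load paths can no longer drift apart.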


3) Add a Visitor class that operates on QEMUFile.

At this stage, we can migrate to data structures. That means we can migrate to QEMUFile, QObjects, or JSON. We could change the protocol at this stage to something that was still binary but had section sizes and things of that nature.
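With the wire encoding behind a visitor, adding section sizes is a local change. A minimal sketch of such framing, assuming a hypothetical layout of a 1-byte section id followed by a 4-byte big-endian payload size (this is not QEMU's format); the benefit is that a reader can step over a section without understanding its contents.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one section: id, 4-byte big-endian size, payload.
 * Returns bytes written, or 0 if it does not fit. */
static size_t put_section(uint8_t *out, size_t cap, uint8_t id,
                          const uint8_t *payload, uint32_t size)
{
    if (cap < 5 || cap - 5 < size) {
        return 0;
    }
    out[0] = id;
    for (int i = 0; i < 4; i++) {
        out[1 + i] = (uint8_t)(size >> (24 - 8 * i));
    }
    memcpy(out + 5, payload, size);
    return 5 + (size_t)size;
}

/* Locate the payload of section `want`, skipping other sections unparsed.
 * Returns the payload size, or -1 if absent or truncated. */
static long find_section(const uint8_t *in, size_t len, uint8_t want,
                         const uint8_t **payload)
{
    size_t pos = 0;
    while (len - pos >= 5) {
        uint32_t size = 0;
        for (int i = 0; i < 4; i++) {
            size = (size << 8) | in[pos + 1 + i];
        }
        if (len - pos - 5 < size) {
            return -1;  /* truncated section */
        }
        if (in[pos] == want) {
            *payload = in + pos + 5;
            return (long)size;
        }
        pos += 5 + (size_t)size;  /* the size prefix makes skipping possible */
    }
    return -1;
}
```

Today's stream has no such prefix, which is exactly why a destination cannot tell where an unknown subsection ends.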


But we shouldn't stop here.

4) Compatibility logic should be extracted from the savevm functions and VMState functions into separate functions that take a data structure. Basically, we want to have something roughly equivalent to:

QObject *e1000_migration_compatibility(QObject *src, int src_version, int dst_version);

We can have lots of helpers that reuse the VMState declarative stuff to do this, but this should be registered independently of the main serialization handler.
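A sketch of such a compatibility routine, assuming a hypothetical `ToyDict` (a fixed-size name-to-u32 table) as a stand-in for a QObject dict, and a hypothetical field `new_reg` added in version 2 of a device. Unlike the prototype above it edits the state in place and returns success, which keeps the C short; the idea is the same: serialization always emits the latest version and this separate function adapts it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_FIELDS 8

/* Hypothetical stand-in for a QObject dict. */
typedef struct {
    const char *name[TOY_MAX_FIELDS];
    uint32_t value[TOY_MAX_FIELDS];
    int count;
} ToyDict;

static int dict_find(const ToyDict *d, const char *name)
{
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->name[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

static bool dict_set(ToyDict *d, const char *name, uint32_t value)
{
    int i = dict_find(d, name);
    if (i < 0) {
        if (d->count == TOY_MAX_FIELDS) {
            return false;
        }
        i = d->count++;
        d->name[i] = name;
    }
    d->value[i] = value;
    return true;
}

static void dict_del(ToyDict *d, const char *name)
{
    int i = dict_find(d, name);
    if (i >= 0) {
        d->count--;  /* move the last entry into the hole */
        d->name[i] = d->name[d->count];
        d->value[i] = d->value[d->count];
    }
}

/* Hypothetical compat routine: version 2 of the device added "new_reg". */
static bool toy_nic_compat(ToyDict *state, int src_version, int dst_version)
{
    if (src_version == 2 && dst_version == 1) {
        dict_del(state, "new_reg");            /* older side does not know it */
        return true;
    }
    if (src_version == 1 && dst_version == 2) {
        return dict_set(state, "new_reg", 0);  /* fill in a default */
    }
    return src_version == dst_version;         /* unknown conversion */
}
```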

This moves us to a model where we always generate the latest serialization format, and then have specific ways to convert to older versions. It allows us to take very big backwards-compatibility steps, like converting the state of one device into two separate devices (because we're just dealing with in-memory data structures).

It's this step that lets us truly support compatibility with migration. The good news is, it doesn't have to be all or nothing. Since we already always generate the latest serialization format, the existing code only deals with migrating older versions to the latest, which is something that isn't all that important.

So if we did this in 1.0, we could have a single function that converted the 1.0 device model to 1.1 and vice versa, and we'd be fine. We wouldn't have to touch 200 devices to do this.
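Because the converters operate on in-memory data, per-release converters can be chained to reach any target version. A sketch, assuming a hypothetical `ToyModel` whose only versioned detail is a timer period that (hypothetically) changed from microseconds in 1.0 to nanoseconds in 1.1; the loop applies one step at a time in either direction.

```c
#include <stdbool.h>

/* Hypothetical whole-machine model, reduced to one versioned field. */
typedef struct {
    int version;             /* index into steps[]: 0 = "1.0", 1 = "1.1" */
    long long timer_period;  /* hypothetical: us in 1.0, ns in 1.1 */
} ToyModel;

/* One pair of converters per release step, not one per device. */
typedef struct {
    void (*up)(ToyModel *m);    /* version n -> n + 1 */
    void (*down)(ToyModel *m);  /* version n + 1 -> n */
} ToyStep;

static void v10_to_v11(ToyModel *m) { m->timer_period *= 1000; }
static void v11_to_v10(ToyModel *m) { m->timer_period /= 1000; }

static const ToyStep steps[] = {
    { v10_to_v11, v11_to_v10 },  /* 1.0 <-> 1.1 */
};
#define N_STEPS ((int)(sizeof steps / sizeof steps[0]))

/* Walk the chain from m->version to target, one release at a time. */
static bool convert_model(ToyModel *m, int target)
{
    if (target < 0 || target > N_STEPS) {
        return false;
    }
    while (m->version < target) {
        steps[m->version].up(m);
        m->version++;
    }
    while (m->version > target) {
        m->version--;
        steps[m->version].down(m);
    }
    return true;
}
```

Supporting a 1.2 release would mean appending one more entry to `steps[]`, leaving the existing converters untouched.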

5) Once we're here, we can implement the next 5-year format. That could be ASN.1 and be bidirectional or whatever makes the most sense. We could support 50 formats if we wanted to. As long as the transport is distinct from the serialization and compat routines, it really doesn't matter.
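Keeping the encoding separate means the transport only ever sees opaque bytes. A minimal sketch, assuming a hypothetical `ToyField` list as the output of the serialization and compat layers: two interchangeable encoders (big-endian values as today, and a JSON-like text form) are passed to the same send path. The `ToyTransport` buffer is a stand-in for a real socket or file.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;  /* assumed not to need JSON escaping */
    uint32_t value;
} ToyField;

/* An encoder maps fields to bytes; returns false if they do not fit. */
typedef bool (*ToyEncoder)(const ToyField *f, size_t n,
                           unsigned char *out, size_t cap, size_t *len);

/* Binary encoder: big-endian values only, names dropped, like today. */
static bool encode_be32(const ToyField *f, size_t n,
                        unsigned char *out, size_t cap, size_t *len)
{
    if (n > cap / 4) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        for (int b = 0; b < 4; b++) {
            out[4 * i + b] = (unsigned char)(f[i].value >> (24 - 8 * b));
        }
    }
    *len = 4 * n;
    return true;
}

/* Append printf-style text at *pos; false if it would not fit. */
static bool append(char *out, size_t cap, size_t *pos, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(out + *pos, cap - *pos, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *pos) {
        return false;
    }
    *pos += (size_t)w;
    return true;
}

/* Text encoder: a JSON-like object such as {"status":1,"irq":2}. */
static bool encode_json(const ToyField *f, size_t n,
                        unsigned char *out, size_t cap, size_t *len)
{
    char *s = (char *)out;
    size_t pos = 0;
    if (!append(s, cap, &pos, "{")) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (!append(s, cap, &pos, "%s\"%s\":%u", i ? "," : "",
                    f[i].name, (unsigned)f[i].value)) {
            return false;
        }
    }
    if (!append(s, cap, &pos, "}")) {
        return false;
    }
    *len = pos;
    return true;
}

/* Transport stand-in: stores opaque bytes and never inspects them. */
typedef struct {
    unsigned char wire[128];
    size_t len;
} ToyTransport;

static bool toy_send(const ToyField *f, size_t n, ToyEncoder enc,
                     ToyTransport *t)
{
    return enc(f, n, t->wire, sizeof t->wire, &t->len);
}
```

Adding a third format means writing one more `ToyEncoder`; neither `toy_send()` nor the code that produced the fields has to change.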


Regards,

Anthony Liguori


Paolo
