From: Anthony Liguori
Subject: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
Date: Mon, 23 Nov 2009 10:44:04 -0600
User-agent: Thunderbird (X11/20090825)

Juan Quintela wrote:
you can weasel the way you want (I can also do it).

Customer had: 5.4 <-> 5.4 migration working (suboptimally)
Now 5.4.1 appears, which works better with migration.  But he wants to do
the migration in two steps:

migrate from qemu 5.4 -> 5.4.1, and be able to migrate back if he doesn't
like it.

At some point, he will migrate to 5.4.1 knowing that he loses backward
migration.  Think of a cluster of machines here, and you just add a
5.4.1 machine into the mix, and want this to work while you haven't
changed _all_ the machines.

If I'm a customer and you introduce this sort of change in a .z release, I would certainly want to know about it and have control over it.

I don't want to transparently migrate from 5.4.1 to 5.4.0 and have my guest's time start drifting. I specifically want that to fail.

If I wanted to support both models because I didn't care, then I would start with -M 5.4.0 on all of my nodes. I know you don't have a -M 5.4.1 and -M 5.4.0, but if you're introducing this sort of change, you really should.
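
To make the -M argument concrete, here is a minimal sketch of the idea, not anything that exists in qemu or RHEL. The host names, the machine type table, and the `can_migrate` helper are all hypothetical. The assumption is that a machine type pins the guest-visible behaviour, and a destination accepts a guest only if it knows how to provide that machine type.

```python
# Hypothetical sketch: host names, machine type names and this table are
# illustrative assumptions, not real qemu or RHEL data.

# Machine types that each (hypothetical) host build knows how to provide.
HOST_SUPPORTS = {
    "host-5.4.0": {"5.4.0"},
    "host-5.4.1": {"5.4.0", "5.4.1"},  # newer build keeps the old type
}


def can_migrate(machine_type, dest_host):
    """Return True if dest_host can provide the guest's machine type."""
    return machine_type in HOST_SUPPORTS[dest_host]


if __name__ == "__main__":
    # A guest started with -M 5.4.0 can move in both directions.
    print(can_migrate("5.4.0", "host-5.4.1"))
    print(can_migrate("5.4.0", "host-5.4.0"))
    # A guest started with -M 5.4.1 cannot go back to an old host.
    print(can_migrate("5.4.1", "host-5.4.0"))
```

In this model the user who doesn't care starts everything with the older machine type, and the user who wants the new behaviour gets an explicit failure instead of a silent downgrade.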

However, if we rely on certain guest behavior, then it blows up the
testing matrix because now we have to test every guest with every
workload to see whether it works with migration.  It's a slippery
slope that's hard to get off once you start.

I know :( But life sometimes doesn't agree with you.  Notice that I
understand that our problem is different from the upstream one.  Our problem
is more in migrating from 0.11.0 -> 0.11.1, and being able to go back.
Changes in the savevm are only introduced if there is no other solution.
But we want to be able to get the 0.11.0 behaviour in 0.11.1, because we
have a mixed environment.  Requesting to upgrade all the hosts at the
same time is not going to fly with any BOFH :)

You've made a policy decision. As a user, I really don't like that policy decision and it makes me want to make sure that we upgrade all of our hosts at once to avoid this problem. Of course, I'm a control freak and I'm particularly concerned about time drift issues as that's been consuming a bit of my time lately.

But if you now substitute qemu-0.11 and qemu-0.12 for RHEL5.4 and
RHEL5.4.1, you will see that the code bases are going to be really,
really similar.  And if any savevm format is changed, it is because
there is no other solution.
In our own stable branch, we do not introduce any savevm changes.  I
would recommend the same policy for RHEL :-)

Except if we found a bug, and there is no other solution.  That is what
we try to do.  And we would not change the format for a new feature, but
what happens if it was a bug that a field is really missing?

Can we reasonably support a guest that doesn't have this older field? If the answer is "yes", then it's a feature that can be delayed until the next release.

You may be willing to expose this to your users but as an upstream
policy, I'm very opposed to it.  You're breaking the contract of
migration by changing the guest's behavior from underneath it.

The lawyer inside me:
- You are lying when you told me that qemu-0.11 -M pc-0.10 gives me a
  pc-0.10 like machine.  The savevm format is different.

(after talking about contracts, I couldn't resist)

That's a bug that we need to fix.

I could make more examples to you.  But that would just make the
discussion longer.  What we have here is:

- migration between 0.11.0 -> 0.11.0 works some way
- I want "that very way" between 0.11.1 -> 0.11.0.

Not a problem as long as we don't introduce features in the stable branch.

A better approach would be having an option to "force" a migration
across incompatible versions.  I think such an option would be pretty
dangerous to offer but at least it puts the decision in the hands of
the management software where it belongs.
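
As a rough illustration of that "force" idea, here is a sketch under stated assumptions. The `check_migration` function, the `force` flag, and the exception name are all hypothetical and do not correspond to any existing qemu option. The point is only that the default is to refuse, and the override is an explicit choice made by the caller (the management software).

```python
# Hypothetical sketch: none of these names exist in qemu.


class IncompatibleVersion(Exception):
    """Raised when a migration crosses versions and was not forced."""


def check_migration(source_version, dest_version, force=False):
    """Decide whether a migration may proceed.

    Returns "ok" for matching versions and "forced" when the caller
    explicitly overrides a mismatch; raises IncompatibleVersion otherwise.
    """
    if source_version == dest_version:
        return "ok"
    if force:
        # The caller accepts the risk of changed guest behaviour.
        return "forced"
    raise IncompatibleVersion(
        "refusing migration from %s to %s" % (source_version, dest_version)
    )
```

With this shape, a 0.11.1 -> 0.11.0 migration fails unless the management layer passes `force=True`, so the risk is never taken silently.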

The difference is where you put things.  In the source (newer code) or
in the target (older code).  By definition, once you have changed
something, you can change it to be backward compatible.  What is a bit
more difficult is to take the time machine, go to the past, and change
5.4 to be compatible with 5.4.1. (*)

The problem here isn't migration, it's what you've decided to backport into your stable branch.

Note that the discussion we're having isn't about backporting pvclock to qemu or qemu/kvm's stable branch. We're not going to change the migration protocol in upstream to support a decision that we haven't actually made.

And from an upstream position, I would oppose implementing the pvclock change in the stable branch exactly because of the problems it would create with live migration.


Anthony Liguori
