
From: Anthony Liguori
Subject: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
Date: Mon, 23 Nov 2009 08:49:23 -0600
User-agent: Thunderbird (X11/20090825)

Juan Quintela wrote:
Anthony Liguori <address@hidden> wrote:
Juan Quintela wrote:

I'm not at all convinced that you can downgrade the version of a
device without exposing a functional change to a guest.  In fact, I'm
pretty certain that it's provably impossible.  Please give a
counterexample of where this mechanism would be safe.

The problem that we are having in RHEL just now is that there are two
new fields to make pvclock/kvmclock more exact (this is the qemu-kvm tree):

        /* KVM pvclock msr */
        VMSTATE_UINT64_V(system_time_msr, CPUState, 12),
        VMSTATE_UINT64_V(wall_clock_msr, CPUState, 12),

Before we added those values to the state, we used whatever time the host
was using for them (yes, we had drift).

But if we don't send those two values, we are no worse than we were
before adding them to the state.

But the effect is that after you migrate, you change behavior. In this case, you migrate a guest that isn't drifting, and then after migration, it starts drifting.

Changing guest behavior during migration means that the guest becomes part of the equation with respect to how well it behaves with this change. If we can prove a guest behaves exactly the same before and after migration, then assuming we're correct, we don't have to test migration with more than one guest. Practically speaking, testing with more guests is good because it uncovers new bugs.

However, if we rely on certain guest behavior, then it blows up the testing matrix because now we have to test every guest with every workload to see whether it works with migration. It's a slippery slope that's hard to get off once you start.

Our problem here is this (you can substitute RHEL versions for the qemu
versions if it makes you feel better):

A client starts with qemu 0.11 and has his image running there.
So far so good.

Then qemu 0.12 appears.
He wants to test it; no problem, you can go from qemu 0.11 to qemu 0.12.

But (and this is the big but), he wants to be sure that he can go back
to 0.11 if anything bad happens.  Then we want to start:

qemu-0.12 -M pc-0.11

and for this to be able to migrate back to qemu-0.11.

Being able to save state with qemu-0.12 in qemu-0.11 format is quite
difficult (especially because we didn't even try).

But that's the real fix here.

But if you now substitute qemu-0.11 and qemu-0.12 with RHEL5.4 and
RHEL5.4.1, you will see that the code bases are going to be really,
really similar.  And if any savevm format is changed, it is because
there is no other solution.

In our own stable branch, we do not introduce any savevm changes. I would recommend the same policy for RHEL :-)

In the cases that we have had so far, this is feasible.  I.e. the new
field just gives "more exact" behaviour; not sending the new value
just gives the same behaviour as before.

You may be willing to expose this to your users, but as an upstream policy, I'm very opposed to it. You're breaking the contract of migration by changing the guest's behavior from underneath it.

If I'm running a large-scale virtualization deployment and using live migration transparently to balance load globally, migration needs to be completely transparent to the guests in the deployment. The failure semantics need to be very exact.

With the time drift example, you've introduced a policy into qemu that really belongs in the management layer. You've decided that changing guest behavior by introducing drift in pvclock is acceptable compared to the value of this one use-case.

A better approach would be having an option to "force" a migration across incompatible versions. I think such an option would be pretty dangerous to offer but at least it puts the decision in the hands of the management software where it belongs.


Anthony Liguori
