From: Peter Xu
Subject: Re: [PATCH v3 1/3] migration: Add documentation for backwards compatiblity
Date: Tue, 16 May 2023 19:39:31 -0400

On Mon, May 15, 2023 at 10:31:59AM +0200, Juan Quintela wrote:
> State what the requirements are to get migration working between QEMU
> versions. Then explain how one is supposed to implement a
> new feature/default value without breaking migration.
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Message-Id: <20230511082701.12828-1-quintela@redhat.com>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> docs/devel/migration.rst | 216 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 216 insertions(+)
>
> diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
> index 6f65c23b47..b4c4f3ec35 100644
> --- a/docs/devel/migration.rst
> +++ b/docs/devel/migration.rst
> @@ -142,6 +142,222 @@ General advice for device developers
> may be different on the destination. This can result in the
> device state being loaded into the wrong device.
>
> +How backwards compatibility works
> +---------------------------------
> +
> +When we do migration, we have to QEMU process: the source and the
s/to/two/, s/process/processes/
> +target. There are two cases, they are the same version or they are a
> +different version.
s/a different version/different versions/
> +The easy case is when they are the same version.
> +The difficult one is when they are different versions.
> +
> +There are two things that are different, but they have very similar
> +names and sometimes get confused:
(space)
> +- QEMU version
> +- machine version
It's normally called "machine type", so maybe use that? Or just "machine
version / machine type"?
> +
> +Let's start with a practical example, we start with:
> +
> +- qemu-system-x86_64 (v5.2), from now on qemu-5.2.
> +- qemu-system-x86_64 (v5.1), from now on qemu-5.1.
> +
> +Related to this are the "latest" machine types defined on each of
> +them:
> +
> +- pc-q35-5.2 (newer one in qemu-5.2) from now on pc-5.2
> +- pc-q35-5.1 (newer one in qemu-5.1) from now on pc-5.1
> +
> +First of all, migration is only supposed to work if you use the same
> +machine type in both source and destination. The QEMU hardware
> +configuration needs to be the same also on source and destination.
> +Most aspects of the backend configuration can be changed at will,
> +except for a few cases where the backend features influence frontend
> +device feature exposure. But that is not relevant for this section.
> +
> +I am going to list the number of combinations that we can have. Let's
> +start with the trivial ones, QEMU is the same on source and
> +destination:
> +
> +1 - qemu-5.2 -M pc-5.2 -> migrates to -> qemu-5.2 -M pc-5.2
> +
> + This is the latest QEMU with the latest machine type.
> + This has to work, and if it doesn't work it is a bug.
> +
> +2 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1
> +
> + Exactly the same case as the previous one, but for 5.1.
> + Nothing to see here either.
> +
> +These are the easiest ones; we will not talk more about them in this
> +section.
> +
> +Now we start with the more interesting cases. Consider the case where
> +we have the same QEMU version on both sides (qemu-5.2), but we are not
> +using the latest machine type for that version (pc-5.2) but one from an
> +older QEMU version, in this case pc-5.1.
> +
> +3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
> +
> + It needs to use the definition of pc-5.1 and the devices as they
> + were configured on 5.1, but this should be easy in the sense that
> + both sides are the same QEMU and both sides have exactly the same
> + idea of what the pc-5.1 machine is.
> +
> +4 - qemu-5.1 -M pc-5.2 -> migrates to -> qemu-5.1 -M pc-5.2
> +
> + This combination is not possible as qemu-5.1 doesn't understand
> + the pc-5.2 machine type. So nothing to worry about here.
> +
> +Now come the interesting ones, when both QEMU processes are
> +different. Notice also that the machine type needs to be pc-5.1,
> +because we have the limitation that qemu-5.1 doesn't know pc-5.2. So
> +the possible cases are:
> +
> +5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1
> +
> + This migration is known as newer to older. When we are developing
> + 5.2, we need to take care not to break migration to qemu-5.1.
> + Notice that we can't update qemu-5.1 to understand whatever
> + qemu-5.2 decides to change, so it is up to the qemu-5.2 side to
> + make the relevant changes.
> +
> +6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
> +
> + This migration is known as older to newer. We need to make sure
> + that we are able to receive migrations from qemu-5.1. The problem is
> + similar to the previous one.
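
To make the six cases concrete, here is a minimal sketch in C of the rule they all follow: a combination is only possible when both QEMU binaries know the machine type. The struct, the two tables and the function names are hypothetical and exist only for illustration; none of this is QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a QEMU binary and the machine types it knows. */
struct qemu_binary {
    const char *name;
    const char *machines[4];    /* NULL-terminated list */
};

/* qemu-5.1 only knows pc-5.1; qemu-5.2 knows both (as in the text). */
const struct qemu_binary qemu_5_1 = { "qemu-5.1", { "pc-5.1", NULL } };
const struct qemu_binary qemu_5_2 = { "qemu-5.2", { "pc-5.1", "pc-5.2", NULL } };

/* Return true if the binary has a definition for this machine type. */
bool knows_machine(const struct qemu_binary *q, const char *machine)
{
    assert(q != NULL && machine != NULL);
    for (size_t i = 0; q->machines[i] != NULL; i++) {
        if (strcmp(q->machines[i], machine) == 0) {
            return true;
        }
    }
    return false;
}

/* A migration uses one machine type, which both sides must know. */
bool combination_possible(const struct qemu_binary *src,
                          const struct qemu_binary *dst,
                          const char *machine)
{
    return knows_machine(src, machine) && knows_machine(dst, machine);
}
```

With these tables, cases 1, 2, 3, 5 and 6 are possible, and case 4 is rejected because qemu-5.1 has no pc-5.2 entry.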
> +
> +If qemu-5.1 and qemu-5.2 were the same, there would not be any
> +compatibility problems. But the reason that we create qemu-5.2 is to
> +get new features, devices, defaults, etc.
> +
> +If we get a device that has a new feature, or change a default value,
> +we have a problem when we try to migrate between different QEMU
> +versions.
> +
> +So we need a way to tell qemu-5.2 that when we are using machine type
> +pc-5.1, it needs to **not** use the feature, to be able to migrate to
> +real qemu-5.1.
> +
> +The equivalent applies when migrating from qemu-5.1 to qemu-5.2:
> +qemu-5.2 has to expect that it is not going to get data for the new
> +feature, because qemu-5.1 doesn't know about it.
> +
> +How do we tell QEMU about these device feature changes? In the
> +hw_compat_X_Y arrays in hw/core/machine.c.
> +
> +If we change a default value, we need to put back the old value in
> +that array. And the device, during initialization, needs to look at
> +that array to see what value it needs to get for that feature. What
> +we put in that array is the value of a property.
> +
> +To create a property for a device, we need to use one of the
> +DEFINE_PROP_*() macros. See include/hw/qdev-properties.h to find the
> +macros that exist. With it, we set the default value for that
> +property, and that is what it is going to get in the latest released
> +version. But if we want a different value for a previous version, we
> +can change that in the hw_compat_X_Y arrays.
> +
> +hw_compat_X_Y is an array of records that have the format:
> +
> +- name_device
> +- name_property
> +- value
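
A minimal sketch of how such a three-field record could be consulted, in C. The struct below is a simplified stand-in and not QEMU's actual GlobalProperty definition; `compat_5_1`, `compat_5_1_len` and `effective_value()` are hypothetical names, and the real property machinery in QEMU works differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one compat record: device, property, value. */
struct compat_prop {
    const char *driver;     /* name_device */
    const char *property;   /* name_property */
    const char *value;      /* value, kept as a string */
};

/* Hypothetical table modelled on the virtio-blk entry in this patch. */
const struct compat_prop compat_5_1[] = {
    { "virtio-blk-device", "num-queues", "1" },
};
const size_t compat_5_1_len = sizeof(compat_5_1) / sizeof(compat_5_1[0]);

/*
 * Return the compat value for (driver, property) if the table has one,
 * otherwise the default the device declared for the latest version.
 */
const char *effective_value(const struct compat_prop *table, size_t len,
                            const char *driver, const char *property,
                            const char *dflt)
{
    assert(table != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (strcmp(table[i].driver, driver) == 0 &&
            strcmp(table[i].property, property) == 0) {
            return table[i].value;
        }
    }
    return dflt;
}
```

For example, `effective_value(compat_5_1, compat_5_1_len, "virtio-blk-device", "num-queues", "auto")` yields "1", while the same call with an empty table falls back to the "auto" default passed by the caller.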
> +
> +Let's see a practical example.
> +
> +In qemu-5.2 virtio-blk-device got multi queue support. This is a
> +change that is not backward compatible. In qemu-5.1 it has one
> +queue. In qemu-5.2 it has the same number of queues as the number of
> +cpus in the system.
> +
> +When we are doing migration, if we migrate from a device that has 4
> +queues to a device that has only one queue, we don't know where to
> +put the extra information for the other 3 queues, and migration
> +fails.
> +
> +There is a similar problem when we migrate from qemu-5.1, which has only
> +one queue, to qemu-5.2: we only send information for one queue, but the
> +destination has 4, so 3 queues are not properly initialized and
> +anything can happen.
> +
> +So, how can we address this problem? Easy, just convince qemu-5.2
> +that when it is running pc-5.1, it needs to set the number of queues
> +for virtio-blk-devices to 1.
> +
> +That way we fix the cases 5 and 6.
> +
> +5 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.1 -M pc-5.1
> +
> + qemu-5.2 -M pc-5.1 sets number of queues to be 1.
> + qemu-5.1 -M pc-5.1 expects number of queues to be 1.
> +
> + correct. migration works.
> +
> +6 - qemu-5.1 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
> +
> + qemu-5.1 -M pc-5.1 sets number of queues to be 1.
> + qemu-5.2 -M pc-5.1 expects number of queues to be 1.
> +
> + correct. migration works.
> +
> +And now the other interesting case, case 3. In this case we have:
> +
> +3 - qemu-5.2 -M pc-5.1 -> migrates to -> qemu-5.2 -M pc-5.1
> +
> + Here we have the same QEMU on both sides. So it doesn't matter a
> + lot if we have set the number of queues to 1 or not, because
> + they are the same.
> +
> + WRONG!
> +
> + Think what happens if we do one of these double migrations:
> +
> + A -> migrates -> B -> migrates -> C
> +
> + where:
> +
> + A: qemu-5.1 -M pc-5.1
> + B: qemu-5.2 -M pc-5.1
> + C: qemu-5.2 -M pc-5.1
> +
> + migration A -> B is case 6, so number of queues needs to be 1.
> +
> + migration B -> C is case 3, so we don't care. But actually we
> + care because we haven't started the guest in qemu-5.2, it came
> + migrated from qemu-5.1. So to be on the safe side, we need to
> + always use number of queues 1 when we are using pc-5.1.
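
The A -> B -> C chain can be played through with a toy model in C. Everything here is an assumption made for illustration: `startup_queues()` and `migration_ok()` are hypothetical helpers, the 5.x version is reduced to its minor number, and "migration works" is reduced to both sides agreeing on the number of queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy model of how many virtio-blk queues a QEMU process configures.
 * qemu_minor is the minor version of a 5.x binary (1 or 2).
 * compat_enabled says whether qemu-5.2 pins pc-5.1 to one queue.
 */
int startup_queues(int qemu_minor, const char *machine, int ncpus,
                   bool compat_enabled)
{
    assert(machine != NULL);
    if (qemu_minor <= 1) {
        return 1;               /* qemu-5.1: always one queue */
    }
    if (compat_enabled && strcmp(machine, "pc-5.1") == 0) {
        return 1;               /* qemu-5.2 keeping the pc-5.1 behaviour */
    }
    return ncpus;               /* qemu-5.2 default: one queue per cpu */
}

/* In this model a migration works only if both sides agree. */
bool migration_ok(int src_queues, int dst_queues)
{
    return src_queues == dst_queues;
}
```

With 4 cpus and the compat setting enabled, A, B and C all end up with one queue, so both hops agree. With it disabled, B would configure 4 queues and the A -> B hop already disagrees, which is why the value has to be pinned for pc-5.1 even when both sides run qemu-5.2.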
> +
> +Now, how was this done in reality? The following commit shows how it
> +was done.
> +
> +commit 9445e1e15e66c19e42bea942ba810db28052cd05
> +Author: Stefan Hajnoczi <stefanha@redhat.com>
> +Date: Tue Aug 18 15:33:47 2020 +0100
> +
> + virtio-blk-pci: default num_queues to -smp N
> +
> +The relevant parts for migration are: ::
> +
> + @@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = {
> + #endif
> + DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
> + true),
> + - DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
> + + DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
> + + VIRTIO_BLK_AUTO_NUM_QUEUES),
> + DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
> +
> +It changes the default value of num_queues. But it fixes it for old
> +machine types to have the right value: ::
> +
> + @@ -31,6 +31,7 @@
> + GlobalProperty hw_compat_5_1[] = {
> + ...
> + + { "virtio-blk-device", "num-queues", "1"},
> + ...
> + };
> +
> +
This is definitely more detailed than I thought. :)
Acked-by: Peter Xu <peterx@redhat.com>
Thanks,
--
Peter Xu