Re: [PATCH] vhost-user-fs: add capability to allow migration


From: Anton Kuchin
Subject: Re: [PATCH] vhost-user-fs: add capability to allow migration
Date: Thu, 19 Jan 2023 17:29:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

On 19/01/2023 16:30, Stefan Hajnoczi wrote:
On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
On 18/01/2023 17:52, Stefan Hajnoczi wrote:
On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
Currently any vhost-user-fs device makes the VM unmigratable, which also
prevents qemu updates without stopping the VM. In most cases this makes
sense because qemu has no way to transfer the FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that the state will be preserved (e.g. it uses migration to
update qemu, and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps the default behavior that prevents migration with such devices,
but adds a migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
---
   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
   qapi/migration.json       |  7 ++++++-
   2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
   #include "hw/virtio/vhost-user-fs.h"
   #include "monitor/monitor.h"
   #include "sysemu/sysemu.h"
+#include "migration/migration.h"

   static const int user_feature_bits[] = {
       VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
       return &fs->vhost_dev;
   }

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+        error_report("Migration of vhost-user-fs devices requires internal FUSE "
+                     "state of backend to be preserved. If orchestrator can "
+                     "guarantee this (e.g. dst connects to the same backend "
+                     "instance or backend state is migrated) set 'vhost-user-fs' "
+                     "migration capability to true to enable migration.");
+        return -1;
+    }
+
+    return 0;
+}
+
   static const VMStateDescription vuf_vmstate = {
       .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .minimum_version_id = 0,
+    .version_id = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+    .pre_save = vhost_user_fs_pre_save,
   };
Will it be possible to extend this vmstate when virtiofsd adds support
for stateful migration without breaking migration compatibility?

If not, then I think a marker field should be added to the vmstate:
0 - stateless/reconnect migration (the approach you're adding in this patch)
1 - stateful migration (future virtiofsd feature)

When the field is 0 there are no further vmstate fields and we trust
that the destination vhost-user-fs server already has the necessary
state.

When the field is 1 there are additional vmstate fields that contain
the virtiofsd state.

The goal is for QEMU to support 3 migration modes, depending on the
vhost-user-fs server:
1. No migration support.
2. Stateless migration.
3. Stateful migration.
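
Concretely, such a marker could look something like this (a rough sketch
only; the migration_type field is hypothetical and not part of this patch):

    static const VMStateDescription vuf_vmstate = {
        .name = "vhost-user-fs",
        .version_id = 0,
        .minimum_version_id = 0,
        .fields = (VMStateField[]) {
            VMSTATE_VIRTIO_DEVICE,
            /* hypothetical marker: 0 = stateless/reconnect, 1 = stateful */
            VMSTATE_UINT8(migration_type, VHostUserFS),
            VMSTATE_END_OF_LIST()
        },
        .pre_save = vhost_user_fs_pre_save,
    };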
Sure. These vmstate fields are very generic and mandatory for any
virtio device. If in the future more state can be transferred in the
migration stream, the vmstate can be extended with additional fields.
This can be done with new subsections and/or by bumping the version_id.
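
For example, future virtiofsd state could ride in an optional subsection,
something like this (a sketch; the subsection name and the .needed helper
are illustrative, nothing here is implemented yet):

    static bool vuf_fuse_state_needed(void *opaque)
    {
        /* hypothetical: return true only when stateful migration is used */
        return false;
    }

    static const VMStateDescription vuf_fuse_state = {
        .name = "vhost-user-fs/fuse-state",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = vuf_fuse_state_needed,
        .fields = (VMStateField[]) {
            /* future virtiofsd state would be described here */
            VMSTATE_END_OF_LIST()
        },
    };

attached to vuf_vmstate via
.subsections = (const VMStateDescription * []) { &vuf_fuse_state, NULL }.
A destination that doesn't need the section never sees it in the stream.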
My concern is that the vmstate introduced in this patch may be
unusable when stateful migration is added. So additional compatibility
code would need to be introduced to keep your stateless migration
working with the extended vmstate.

By adding a marker field in this patch it should be possible to
continue using the same vmstate for stateless migration without adding
extra compatibility code in the future.
I understand, but these vmstate fields just pack the generic virtio
device state that is accessible to qemu. All additional data could be
added later via extra fields. I believe we couldn't pull off any type
of virtio device migration without transferring the virtqueues, so more
sophisticated types of migration would require adding more data rather
than modifying this part of the vmstate.


The main purpose of this patch is to allow updating a VM to a newer
version of qemu via local migration without disruption to the guest.
Future versions could hopefully pack more state from the external
environment into the migration stream.


   static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
   #                    will be handled faster.  This is a performance feature and
   #                    should not affect the correctness of postcopy migration.
   #                    (since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+#                 vhost-user-fs devices. This should be enabled only when the
+#                 backend can preserve local FUSE state, e.g. for a qemu update
+#                 when dst reconnects to the same endpoints after migration.
+#                 (since 8.0)
This is global but a guest can have multiple vhost-user-fs devices
connected to different servers.
AFAIK vhost-user requires a unix socket and memory shared from the guest,
so devices can't be connected to different servers, just to different
endpoints on the current host.
Different vhost-user-fs server software. vhost-user-fs is a protocol;
there can be multiple server implementations. These implementations
may have different capabilities (some may not support stateless
migration). So I think stateless migration should be a per-instance
setting, not global.

From qemu's point of view we have no way of knowing which endpoints
implement stateless migration, or whether they can perform it at the
moment or need some additional setup to handle it, so in the end the
decision to migrate statelessly originates from the orchestrator,
which knows the backends' states better.
Setting a per-device flag could be more confusing because qemu can't
know or control backend states and capabilities. So I'd rather hand
this task off completely to the orchestrator and not try to split the
responsibility between it and qemu.


I would add a qdev property to the device instead of introducing a
migration capability. The property would enable "stateless migration".
When the property is not set, migration would be prohibited.
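
Something along these lines (a sketch; the property and field names
are made up, not part of this patch):

    static Property vuf_properties[] = {
        /* ... existing vhost-user-fs properties ... */
        /* hypothetical per-device opt-in flag for stateless migration */
        DEFINE_PROP_BOOL("stateless-migration", VHostUserFS,
                         stateless_migration, false),
        DEFINE_PROP_END_OF_LIST(),
    };

vhost_user_fs_pre_save() would then check the device's own flag rather
than a global migration capability.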
I did think about that, but this is really not a property of the device;
it is a capability of the management software and applies to exactly one
particular migration process that it initiates. It should not persist
across migrations or be otherwise stored in the device.
I disagree. The vhost-user-fs server software must implement stateless
migration in order for this to work. For example,
https://gitlab.com/virtio-fs/virtiofsd doesn't support stateless
migration as far as I know.
Yes, the implementation in the backend is necessary, but this is a backend
property and is not related at all to the device in qemu. So the management
software should know best what the backends are capable of.
In fact, I'm now working on support for this in the rust virtiofsd (my
rust skills are not yet good enough to do this quickly) and plan to
propose a spec patch that exposes a "reconnect" vhost-user backend
capability, so orchestrators have a standard way to check for it.


The idea here is that the orchestrator can ensure the destination qemu
will run on the same host and will reconnect to the same unix sockets,
and only then set the flag (because inside qemu we can't know anything
about the destination).
This is somewhat similar to the ignore-shared migration capability,
where qemu avoids saving and loading guest memory that is stored in
shmem because it will be picked up by the destination process right
where the source left it.

   #
   # Features:
   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -492,7 +497,7 @@
              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
              'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }

   ##
   # @MigrationCapabilityStatus:
--
2.34.1




