qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration


From: Avihai Horon
Subject: Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration
Date: Wed, 8 Feb 2023 15:08:15 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.6.1


On 08/02/2023 0:34, Alex Williamson wrote:
External email: Use caution opening links or attachments


On Mon, 6 Feb 2023 14:31:30 +0200
Avihai Horon <avihaih@nvidia.com> wrote:

Currently VFIO migration doesn't implement some kind of intermediate
quiescent state in which P2P DMAs are quiesced before stopping or
running the device. This can cause problems in multi-device migration
where the devices are doing P2P DMAs, since the devices are not stopped
together at the same time.

Until such support is added, block migration of multiple devices.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
  include/hw/vfio/vfio-common.h |  2 ++
  hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
  hw/vfio/migration.c           |  6 +++++
  3 files changed, 59 insertions(+)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e573f5a9f1..56b1683824 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
  extern VFIOGroupList vfio_group_list;

  bool vfio_mig_active(void);
+int vfio_block_multiple_devices_migration(Error **errp);
+void vfio_unblock_multiple_devices_migration(void);
  int64_t vfio_mig_bytes_transferred(void);

  #ifdef CONFIG_LINUX
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3a35f4afad..01db41b735 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -41,6 +41,7 @@
  #include "qapi/error.h"
  #include "migration/migration.h"
  #include "migration/misc.h"
+#include "migration/blocker.h"
  #include "sysemu/tpm.h"

  VFIOGroupList vfio_group_list =
@@ -337,6 +338,56 @@ bool vfio_mig_active(void)
      return true;
  }

+Error *multiple_devices_migration_blocker;
+
+static unsigned int vfio_migratable_device_num(void)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+    unsigned int device_num = 0;
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->migration) {
+                device_num++;
+            }
+        }
+    }
+
+    return device_num;
+}
+
+int vfio_block_multiple_devices_migration(Error **errp)
+{
+    int ret;
+
+    if (vfio_migratable_device_num() != 2) {
+        return 0;
+    }
+
+    error_setg(&multiple_devices_migration_blocker,
+               "Migration is currently not supported with multiple "
+               "VFIO devices");
+    ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
+    if (ret < 0) {
+        error_free(multiple_devices_migration_blocker);
+        multiple_devices_migration_blocker = NULL;
+    }
+
+    return ret;
+}
+
+void vfio_unblock_multiple_devices_migration(void)
+{
+    if (vfio_migratable_device_num() != 2) {
+        return;
+    }
+
+    migrate_del_blocker(multiple_devices_migration_blocker);
+    error_free(multiple_devices_migration_blocker);
+    multiple_devices_migration_blocker = NULL;
+}
A couple awkward things here.  First I wish we could do something
cleaner or more intuitive than the != 2 test.  I get that we're trying
to do this on the addition of the 2nd device supporting migration, or
the removal of the next to last device independent of all other devices,
but I wonder if it wouldn't be better to remove the multiple-device
blocker after migration is torn down for the device so we can test
device >1 or ==1 in combination with whether
multiple_devices_migration_blocker is NULL.

Which comes to the second awkwardness, if we fail to add the blocker we
free and clear the blocker, but when we tear down the device due to that
failure we'll remove the blocker that doesn't exist, free NULL, and
clear it again.  Thanks to the glib slist the migration blocker is
using, I think that all works, but I'd rather not be dependent on that
implementation to avoid a segfault here.  Incorporating a test of
multiple_devices_migration_blocker as above would avoid this too.

You mean something like this?

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3a35f4afad..f3e08eff58 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c

[...]

+int vfio_block_multiple_devices_migration(Error **errp)
+{
+    int ret;
+
+    if (vfio_migratable_device_num() <= 1 ||
+        multiple_devices_migration_blocker) {
+        return 0;
+    }
+
+    error_setg(&multiple_devices_migration_blocker,
+               "Migration is currently not supported with multiple "
+               "VFIO devices");
+    ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
+    if (ret < 0) {
+        error_free(multiple_devices_migration_blocker);
+        multiple_devices_migration_blocker = NULL;
+    }
+
+    return ret;
+}
+
+void vfio_unblock_multiple_devices_migration(void)
+{
+    if (vfio_migratable_device_num() > 1 ||
+        !multiple_devices_migration_blocker) {
+        return;
+    }
+
+    migrate_del_blocker(multiple_devices_migration_blocker);
+    error_free(multiple_devices_migration_blocker);
+    multiple_devices_migration_blocker = NULL;
+}
+
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
     VFIOGroup *group;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 552c2313b2..15b446c0ec 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -880,6 +880,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
         goto add_blocker;
     }

+    ret = vfio_block_multiple_devices_migration(errp);
+    if (ret) {
+        return ret;
+    }
+
     trace_vfio_migration_probe(vbasedev->name, info->index);
     g_free(info);
     return 0;
@@ -906,6 +911,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
         qemu_del_vm_change_state_handler(migration->vm_state);
         unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
         vfio_migration_exit(vbasedev);
+        vfio_unblock_multiple_devices_migration();
     }

     if (vbasedev->migration_blocker) {


Maybe also negate the if conditions and put the add/remove blocker code inside it? Is it more readable this way?
E.g.:

+int vfio_block_multiple_devices_migration(Error **errp)
+{
+    int ret = 0;
+
+    if (vfio_migratable_device_num() > 1 &&
+        !multiple_devices_migration_blocker) {
+        error_setg(&multiple_devices_migration_blocker,
+                   "Migration is currently not supported with multiple "
+                   "VFIO devices");
+        ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
+        if (ret < 0) {
+            error_free(multiple_devices_migration_blocker);
+            multiple_devices_migration_blocker = NULL;
+        }
+    }
+
+    return ret;
+}
+
+void vfio_unblock_multiple_devices_migration(void)
+{
+    if (vfio_migratable_device_num() <= 1 &&
+        multiple_devices_migration_blocker) {
+        migrate_del_blocker(multiple_devices_migration_blocker);
+        error_free(multiple_devices_migration_blocker);
+        multiple_devices_migration_blocker = NULL;
+    }
+}

Thanks.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]