[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration
From: |
Alex Williamson |
Subject: |
Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration |
Date: |
Wed, 8 Feb 2023 10:22:42 -0700 |
On Wed, 8 Feb 2023 15:08:15 +0200
Avihai Horon <avihaih@nvidia.com> wrote:
> On 08/02/2023 0:34, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Mon, 6 Feb 2023 14:31:30 +0200
> > Avihai Horon <avihaih@nvidia.com> wrote:
> >
> >> Currently VFIO migration doesn't implement some kind of intermediate
> >> quiescent state in which P2P DMAs are quiesced before stopping or
> >> running the device. This can cause problems in multi-device migration
> >> where the devices are doing P2P DMAs, since the devices are not stopped
> >> together at the same time.
> >>
> >> Until such support is added, block migration of multiple devices.
> >>
> >> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >> ---
> >> include/hw/vfio/vfio-common.h | 2 ++
> >> hw/vfio/common.c | 51 +++++++++++++++++++++++++++++++++++
> >> hw/vfio/migration.c | 6 +++++
> >> 3 files changed, 59 insertions(+)
> >>
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index e573f5a9f1..56b1683824 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup)
> >> VFIOGroupList;
> >> extern VFIOGroupList vfio_group_list;
> >>
> >> bool vfio_mig_active(void);
> >> +int vfio_block_multiple_devices_migration(Error **errp);
> >> +void vfio_unblock_multiple_devices_migration(void);
> >> int64_t vfio_mig_bytes_transferred(void);
> >>
> >> #ifdef CONFIG_LINUX
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 3a35f4afad..01db41b735 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -41,6 +41,7 @@
> >> #include "qapi/error.h"
> >> #include "migration/migration.h"
> >> #include "migration/misc.h"
> >> +#include "migration/blocker.h"
> >> #include "sysemu/tpm.h"
> >>
> >> VFIOGroupList vfio_group_list =
> >> @@ -337,6 +338,56 @@ bool vfio_mig_active(void)
> >> return true;
> >> }
> >>
> >> +Error *multiple_devices_migration_blocker;
> >> +
> >> +static unsigned int vfio_migratable_device_num(void)
> >> +{
> >> + VFIOGroup *group;
> >> + VFIODevice *vbasedev;
> >> + unsigned int device_num = 0;
> >> +
> >> + QLIST_FOREACH(group, &vfio_group_list, next) {
> >> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
> >> + if (vbasedev->migration) {
> >> + device_num++;
> >> + }
> >> + }
> >> + }
> >> +
> >> + return device_num;
> >> +}
> >> +
> >> +int vfio_block_multiple_devices_migration(Error **errp)
> >> +{
> >> + int ret;
> >> +
> >> + if (vfio_migratable_device_num() != 2) {
> >> + return 0;
> >> + }
> >> +
> >> + error_setg(&multiple_devices_migration_blocker,
> >> + "Migration is currently not supported with multiple "
> >> + "VFIO devices");
> >> + ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> >> + if (ret < 0) {
> >> + error_free(multiple_devices_migration_blocker);
> >> + multiple_devices_migration_blocker = NULL;
> >> + }
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +void vfio_unblock_multiple_devices_migration(void)
> >> +{
> >> + if (vfio_migratable_device_num() != 2) {
> >> + return;
> >> + }
> >> +
> >> + migrate_del_blocker(multiple_devices_migration_blocker);
> >> + error_free(multiple_devices_migration_blocker);
> >> + multiple_devices_migration_blocker = NULL;
> >> +}
> > A couple awkward things here. First I wish we could do something
> > cleaner or more intuitive than the != 2 test. I get that we're trying
> > to do this on the addition of the 2nd device supporting migration, or
> > the removal of the next to last device independent of all other devices,
> > but I wonder if it wouldn't be better to remove the multiple-device
> > blocker after migration is torn down for the device so we can test
> > device >1 or ==1 in combination with whether
> > multiple_devices_migration_blocker is NULL.
> >
> > Which comes to the second awkwardness, if we fail to add the blocker we
> > free and clear the blocker, but when we tear down the device due to that
> > failure we'll remove the blocker that doesn't exist, free NULL, and
> > clear it again. Thanks to the glib slist the migration blocker is
> > using, I think that all works, but I'd rather not be dependent on that
> > implementation to avoid a segfault here. Incorporating a test of
> > multiple_devices_migration_blocker as above would avoid this too.
>
> You mean something like this?
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3a35f4afad..f3e08eff58 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
>
> [...]
>
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> + int ret;
> +
> + if (vfio_migratable_device_num() <= 1 ||
> + multiple_devices_migration_blocker) {
> + return 0;
> + }
Nit, I'd reverse the order of the test here and below, otherwise yes,
this is what I was thinking of.
> +
> + error_setg(&multiple_devices_migration_blocker,
> + "Migration is currently not supported with multiple "
> + "VFIO devices");
> + ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> + if (ret < 0) {
> + error_free(multiple_devices_migration_blocker);
> + multiple_devices_migration_blocker = NULL;
> + }
> +
> + return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> + if (vfio_migratable_device_num() > 1 ||
> + !multiple_devices_migration_blocker) {
> + return;
> + }
> +
> + migrate_del_blocker(multiple_devices_migration_blocker);
> + error_free(multiple_devices_migration_blocker);
> + multiple_devices_migration_blocker = NULL;
> +}
> +
> static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
> {
> VFIOGroup *group;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 552c2313b2..15b446c0ec 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -880,6 +880,11 @@ int vfio_migration_probe(VFIODevice *vbasedev,
> Error **errp)
> goto add_blocker;
> }
>
> + ret = vfio_block_multiple_devices_migration(errp);
> + if (ret) {
> + return ret;
> + }
> +
> trace_vfio_migration_probe(vbasedev->name, info->index);
> g_free(info);
> return 0;
> @@ -906,6 +911,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
> qemu_del_vm_change_state_handler(migration->vm_state);
> unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
> vfio_migration_exit(vbasedev);
> + vfio_unblock_multiple_devices_migration();
> }
>
> if (vbasedev->migration_blocker) {
>
>
> Maybe also negate the if conditions and put the add/remove blocker code
> inside it? Is it more readable this way?
I think the previous aligns more with the success oriented flow that
Jason like to promote. Thanks,
Alex
> E.g.:
>
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> + int ret = 0;
> +
> + if (vfio_migratable_device_num() > 1 &&
> + !multiple_devices_migration_blocker) {
> + error_setg(&multiple_devices_migration_blocker,
> + "Migration is currently not supported with multiple "
> + "VFIO devices");
> + ret = migrate_add_blocker(multiple_devices_migration_blocker,
> errp);
> + if (ret < 0) {
> + error_free(multiple_devices_migration_blocker);
> + multiple_devices_migration_blocker = NULL;
> + }
> + }
> +
> + return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> + if (vfio_migratable_device_num() <= 1 &&
> + multiple_devices_migration_blocker) {
> + migrate_del_blocker(multiple_devices_migration_blocker);
> + error_free(multiple_devices_migration_blocker);
> + multiple_devices_migration_blocker = NULL;
> + }
> +}
>
> Thanks.
>
- [PATCH v9 03/14] vfio/migration: Fix NULL pointer dereference bug, (continued)
- [PATCH v9 03/14] vfio/migration: Fix NULL pointer dereference bug, Avihai Horon, 2023/02/06
- [PATCH v9 05/14] migration/qemu-file: Add qemu_file_get_to_fd(), Avihai Horon, 2023/02/06
- [PATCH v9 04/14] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support, Avihai Horon, 2023/02/06
- [PATCH v9 06/14] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one, Avihai Horon, 2023/02/06
- [PATCH v9 08/14] vfio/migration: Move migration v1 logic to vfio_migration_init(), Avihai Horon, 2023/02/06
- [PATCH v9 07/14] vfio/migration: Block multiple devices migration, Avihai Horon, 2023/02/06
[PATCH v9 09/14] vfio/migration: Rename functions/structs related to v1 protocol, Avihai Horon, 2023/02/06
[PATCH v9 11/14] vfio/migration: Optimize vfio_save_pending(), Avihai Horon, 2023/02/06
[PATCH v9 10/14] vfio/migration: Implement VFIO migration protocol v2, Avihai Horon, 2023/02/06
[PATCH v9 12/14] vfio/migration: Remove VFIO migration protocol v1, Avihai Horon, 2023/02/06
[PATCH v9 13/14] vfio: Alphabetize migration section of VFIO trace-events file, Avihai Horon, 2023/02/06
[PATCH v9 14/14] docs/devel: Align VFIO migration docs to v2 protocol, Avihai Horon, 2023/02/06