qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration


From: Alex Williamson
Subject: Re: [PATCH v9 07/14] vfio/migration: Block multiple devices migration
Date: Wed, 8 Feb 2023 10:22:42 -0700

On Wed, 8 Feb 2023 15:08:15 +0200
Avihai Horon <avihaih@nvidia.com> wrote:

> On 08/02/2023 0:34, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Mon, 6 Feb 2023 14:31:30 +0200
> > Avihai Horon <avihaih@nvidia.com> wrote:
> >  
> >> Currently VFIO migration doesn't implement some kind of intermediate
> >> quiescent state in which P2P DMAs are quiesced before stopping or
> >> running the device. This can cause problems in multi-device migration
> >> where the devices are doing P2P DMAs, since the devices are not stopped
> >> together at the same time.
> >>
> >> Until such support is added, block migration of multiple devices.
> >>
> >> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >> ---
> >>   include/hw/vfio/vfio-common.h |  2 ++
> >>   hw/vfio/common.c              | 51 +++++++++++++++++++++++++++++++++++
> >>   hw/vfio/migration.c           |  6 +++++
> >>   3 files changed, 59 insertions(+)
> >>
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index e573f5a9f1..56b1683824 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -218,6 +218,8 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) 
> >> VFIOGroupList;
> >>   extern VFIOGroupList vfio_group_list;
> >>
> >>   bool vfio_mig_active(void);
> >> +int vfio_block_multiple_devices_migration(Error **errp);
> >> +void vfio_unblock_multiple_devices_migration(void);
> >>   int64_t vfio_mig_bytes_transferred(void);
> >>
> >>   #ifdef CONFIG_LINUX
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 3a35f4afad..01db41b735 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -41,6 +41,7 @@
> >>   #include "qapi/error.h"
> >>   #include "migration/migration.h"
> >>   #include "migration/misc.h"
> >> +#include "migration/blocker.h"
> >>   #include "sysemu/tpm.h"
> >>
> >>   VFIOGroupList vfio_group_list =
> >> @@ -337,6 +338,56 @@ bool vfio_mig_active(void)
> >>       return true;
> >>   }
> >>
> >> +Error *multiple_devices_migration_blocker;
> >> +
> >> +static unsigned int vfio_migratable_device_num(void)
> >> +{
> >> +    VFIOGroup *group;
> >> +    VFIODevice *vbasedev;
> >> +    unsigned int device_num = 0;
> >> +
> >> +    QLIST_FOREACH(group, &vfio_group_list, next) {
> >> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> >> +            if (vbasedev->migration) {
> >> +                device_num++;
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +    return device_num;
> >> +}
> >> +
> >> +int vfio_block_multiple_devices_migration(Error **errp)
> >> +{
> >> +    int ret;
> >> +
> >> +    if (vfio_migratable_device_num() != 2) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    error_setg(&multiple_devices_migration_blocker,
> >> +               "Migration is currently not supported with multiple "
> >> +               "VFIO devices");
> >> +    ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> >> +    if (ret < 0) {
> >> +        error_free(multiple_devices_migration_blocker);
> >> +        multiple_devices_migration_blocker = NULL;
> >> +    }
> >> +
> >> +    return ret;
> >> +}
> >> +
> >> +void vfio_unblock_multiple_devices_migration(void)
> >> +{
> >> +    if (vfio_migratable_device_num() != 2) {
> >> +        return;
> >> +    }
> >> +
> >> +    migrate_del_blocker(multiple_devices_migration_blocker);
> >> +    error_free(multiple_devices_migration_blocker);
> >> +    multiple_devices_migration_blocker = NULL;
> >> +}  
> > A couple awkward things here.  First I wish we could do something
> > cleaner or more intuitive than the != 2 test.  I get that we're trying
> > to do this on the addition of the 2nd device supporting migration, or
> > the removal of the next to last device independent of all other devices,
> > but I wonder if it wouldn't be better to remove the multiple-device
> > blocker after migration is torn down for the device so we can test
> > device >1 or ==1 in combination with whether
> > multiple_devices_migration_blocker is NULL.
> >
> > Which comes to the second awkwardness, if we fail to add the blocker we
> > free and clear the blocker, but when we tear down the device due to that
> > failure we'll remove the blocker that doesn't exist, free NULL, and
> > clear it again.  Thanks to the glib slist the migration blocker is
> > using, I think that all works, but I'd rather not be dependent on that
> > implementation to avoid a segfault here.  Incorporating a test of
> > multiple_devices_migration_blocker as above would avoid this too.  
> 
> You mean something like this?
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 3a35f4afad..f3e08eff58 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> 
> [...]
> 
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> +    int ret;
> +
> +    if (vfio_migratable_device_num() <= 1 ||
> +        multiple_devices_migration_blocker) {
> +        return 0;
> +    }

Nit, I'd reverse the order of the test here and below, otherwise yes,
this is what I was thinking of.

> +
> +    error_setg(&multiple_devices_migration_blocker,
> +               "Migration is currently not supported with multiple "
> +               "VFIO devices");
> +    ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
> +    if (ret < 0) {
> +        error_free(multiple_devices_migration_blocker);
> +        multiple_devices_migration_blocker = NULL;
> +    }
> +
> +    return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> +    if (vfio_migratable_device_num() > 1 ||
> +        !multiple_devices_migration_blocker) {
> +        return;
> +    }
> +
> +    migrate_del_blocker(multiple_devices_migration_blocker);
> +    error_free(multiple_devices_migration_blocker);
> +    multiple_devices_migration_blocker = NULL;
> +}
> +
>   static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>   {
>       VFIOGroup *group;
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 552c2313b2..15b446c0ec 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -880,6 +880,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, 
> Error **errp)
>           goto add_blocker;
>       }
> 
> +    ret = vfio_block_multiple_devices_migration(errp);
> +    if (ret) {
> +        return ret;
> +    }
> +
>       trace_vfio_migration_probe(vbasedev->name, info->index);
>       g_free(info);
>       return 0;
> @@ -906,6 +911,7 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
>           qemu_del_vm_change_state_handler(migration->vm_state);
>           unregister_savevm(VMSTATE_IF(vbasedev->dev), "vfio", vbasedev);
>           vfio_migration_exit(vbasedev);
> +        vfio_unblock_multiple_devices_migration();
>       }
> 
>       if (vbasedev->migration_blocker) {
> 
> 
> Maybe also negate the if conditions and put the add/remove blocker code 
> inside it? Is it more readable this way?

I think the previous aligns more with the success oriented flow that
Jason like to promote.  Thanks,

Alex

> E.g.:
> 
> +int vfio_block_multiple_devices_migration(Error **errp)
> +{
> +    int ret = 0;
> +
> +    if (vfio_migratable_device_num() > 1 &&
> +        !multiple_devices_migration_blocker) {
> +        error_setg(&multiple_devices_migration_blocker,
> +                   "Migration is currently not supported with multiple "
> +                   "VFIO devices");
> +        ret = migrate_add_blocker(multiple_devices_migration_blocker, 
> errp);
> +        if (ret < 0) {
> +            error_free(multiple_devices_migration_blocker);
> +            multiple_devices_migration_blocker = NULL;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +void vfio_unblock_multiple_devices_migration(void)
> +{
> +    if (vfio_migratable_device_num() <= 1 &&
> +        multiple_devices_migration_blocker) {
> +        migrate_del_blocker(multiple_devices_migration_blocker);
> +        error_free(multiple_devices_migration_blocker);
> +        multiple_devices_migration_blocker = NULL;
> +    }
> +}
> 
> Thanks.
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]