qemu-devel

Re: VFIO Migration


From: Daniel P . Berrangé
Subject: Re: VFIO Migration
Date: Tue, 3 Nov 2020 15:23:03 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

On Tue, Nov 03, 2020 at 03:05:08PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 03, 2020 at 11:39:29AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Nov 02, 2020 at 11:11:53AM +0000, Stefan Hajnoczi wrote:
> > > Overview
> > > --------
> > > The purpose of device states is to save the device at a point in time and then
> > > restore the device back to the saved state later. This is more challenging than
> > > it first appears.
> > > 
> > > The process of saving a device state and loading it later is called
> > > *migration*. The state may be loaded by the same device that saved it or by a
> > > new instance of the device, possibly running on a different computer.
> > > 
> > > It must be possible to migrate to a newer implementation of the device
> > > as well as to an older implementation of the device. This allows users
> > > to upgrade and roll back their systems.
> > > 
> > > Migration can fail if loading the device state is not possible. It should fail
> > > early with a clear error message. It must not appear to complete but leave the
> > > device inoperable due to a migration problem.
> > 
> > I think there needs to be an additional requirement.
> > 
> >  It must be possible for a management application to query the supported
> >  versions, independently of execution of a migration operation.
> > 
> > This is important to large scale data center / cloud management applications
> > because before initiating a migration they need to *automatically* select
> > a target host with a high level of confidence that it will be compatible with
> > the source host.
> > 
> > Today QEMU migration compatibility is largely determined by the machine
> > type version. Apps can query the supported machine types for a host to
> > check whether it is compatible. Similarly they will query CPU model
> > features to check compatibility.
> > 
> > Validation and error checking at time of migration is of course still
> > required, but the goal should be that an mgmt application will *NEVER*
> > hit these errors because they will have pre-selected a host that is
> > known to be compatible based on reported versions that are supported.
> 
> Okay. What do you think of the following?
> 
>   [
>     {
>       "model": "https://qemu.org/devices/e1000e",
>       "params": [
>         "rss",
>         ...more configuration parameters...
>       ],
>       "versions": [
>         {
>           "name": "1",
>           "params": []
>         },
>         {
>           "name": "2",
>           "params": ["rss=on"]
>         },
>         ...more versions...
>       ]
>     },
>     ...more device models...
>   ]
> 
> The management tool can generate the configuration parameter list by
> expanding a version into its params.
> 
> Configuration parameter types and input ranges need more thought. For
> example, version 1 of the device might not have rx-table-size (it's
> effectively 0). Version 2 introduces rx-table-size and sets it to 32.
> Version 3 raises the value to 64. In addition, the user can set a custom
> value like rx-table-size=48. I haven't defined the rules for this yet,
> but it's clear there needs to be a way to extend configuration
> parameters.
> 
> To check migration compatibility:
> 1. Verify that the device model URL matches the JSON data[n].model
>    field.
> 2. For every configuration parameter name from the source device,
>    check that it is contained within the JSON data[n].params list.
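
For concreteness, the two-step check quoted above could be sketched roughly
as follows. This is only an illustration: the DEVICE_MODELS data mirrors the
JSON example, and the helper names expand_version and params_compatible are
hypothetical, not an existing QEMU or VFIO interface.

```python
# Hypothetical destination-side data, shaped like the JSON example above.
DEVICE_MODELS = [
    {
        "model": "https://qemu.org/devices/e1000e",
        "params": ["rss"],
        "versions": [
            {"name": "1", "params": []},
            {"name": "2", "params": ["rss=on"]},
        ],
    },
]


def expand_version(model_entry, version_name):
    """Return the configuration parameter list aliased by a version name."""
    for version in model_entry["versions"]:
        if version["name"] == version_name:
            return list(version["params"])
    raise KeyError(f"unknown version {version_name!r}")


def params_compatible(models, source_model, source_params):
    """Apply steps 1 and 2: match the model URL, then check parameter names."""
    for entry in models:
        if entry["model"] != source_model:
            continue
        # Only the parameter names are compared, as in step 2.
        names = {p.split("=", 1)[0] for p in source_params}
        return names <= set(entry["params"])
    return False
```

Note that this check looks only at parameter names; it says nothing about
the format of the migration data stream itself.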

I'm not convinced that this makes sense. A matching set of parameter
names + values does not imply that the migration data stream is
actually compatible.

i.e. implementations may need to change the internal migration data
stream to fix bugs, without adding/removing a config parameter.
The migration version string alone expresses data stream compatibility.

This is similar to how two QEMU command lines can have an identical set
of configuration parameters, aside from the machine type version,
and thus be migration *incompatible*.

Basically the version string should be considered an opaque blob
that expresses compatibility on its own.
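
To illustrate the point with made-up data: in the sketch below both hosts
use exactly the same configuration parameters, but the newer one has a
changed data stream, so only the version string tells them apart. The
DeviceInfo type and the version strings "v2" and "v2.1" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    params: frozenset              # configuration parameters in use
    migration_versions: frozenset  # opaque version strings the host accepts


def stream_compatible(source_version: str, dest: DeviceInfo) -> bool:
    # The version string is treated as an opaque blob: only membership
    # in the destination's supported set is meaningful, never parsing
    # or ordering.
    return source_version in dest.migration_versions


params = frozenset({"rss=on"})
# Older implementation: only knows the original stream format.
old_host = DeviceInfo(params, frozenset({"v2"}))
# Newer implementation: same parameters, but a bug fix added a new
# stream format (hypothetical "v2.1") alongside the old one.
new_host = DeviceInfo(params, frozenset({"v2", "v2.1"}))
```

Here a device saved with "v2.1" cannot be loaded on old_host even though
the parameter sets are identical, which a parameter comparison would miss.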

> > > VFIO Implementation
> > > -------------------
> > > The following applies both to kernel VFIO/mdev drivers and vfio-user device
> > > backends.
> > > 
> > > Devices are instantiated based on a version and/or configuration parameters:
> > > * ``version=1`` - use the device configuration aliased by version 1
> > > * ``version=2,rx-filter-size=64`` - use version 2 and override ``rx-filter-size``
> > > * ``rx-filter-size=0`` - directly set configuration parameters without using a version
> > > 
> > > Device creation fails if the version and/or configuration parameters are not
> > > supported.
> > > 
> > > There must be a mechanism to query the "latest" configuration for a device
> > > model. It may simply report the ``version=5`` where 5 is the latest version but
> > > it could also report all configuration parameters instead of using a version
> > > alias.
> > 
> > The mechanism needs to be able to report all supported version strings,
> > not simply the latest version string. I think we need to specify the
> > actual mechanism to do this query too, because we can't end up in a place
> > where there's a different approach to queries for each device type.
> 
> Makes sense.
> 
> Stefan
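
As a sketch of the pre-selection described above, assuming each host can
report the full list of version strings it supports through one common
query (the HOST_VERSIONS data and the pick_targets helper below are
hypothetical), a management application could filter candidate hosts
before ever starting a migration:

```python
# Hypothetical results of querying each candidate host for the version
# strings it supports for one device model.
HOST_VERSIONS = {
    "host-a": ["1", "2"],
    "host-b": ["1", "2", "3"],
    "host-c": ["3"],
}


def pick_targets(source_version, host_versions):
    """Return hosts that report support for the source's version string."""
    return sorted(
        host
        for host, versions in host_versions.items()
        if source_version in versions
    )
```

The checks at migration time stay in place; this only narrows the choice
of target host beforehand.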



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



