Re: VFIO Migration
From: Daniel P. Berrangé
Subject: Re: VFIO Migration
Date: Tue, 3 Nov 2020 15:23:03 +0000
User-agent: Mutt/1.14.6 (2020-07-11)
On Tue, Nov 03, 2020 at 03:05:08PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 03, 2020 at 11:39:29AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Nov 02, 2020 at 11:11:53AM +0000, Stefan Hajnoczi wrote:
> > > Overview
> > > --------
> > > The purpose of device states is to save the device at a point in time and
> > > then restore the device back to the saved state later. This is more
> > > challenging than it first appears.
> > >
> > > The process of saving a device state and loading it later is called
> > > *migration*. The state may be loaded by the same device that saved it or
> > > by a new instance of the device, possibly running on a different computer.
> > >
> > > It must be possible to migrate to a newer implementation of the device
> > > as well as to an older implementation of the device. This allows users
> > > to upgrade and roll back their systems.
> > >
> > > Migration can fail if loading the device state is not possible. It should
> > > fail early with a clear error message. It must not appear to complete but
> > > leave the device inoperable due to a migration problem.
> >
> > I think there needs to be an additional requirement.
> >
> > It must be possible for a management application to query the supported
> > versions, independently of execution of a migration operation.
> >
> > This is important to large scale data center / cloud management applications
> > because before initiating a migration they need to *automatically* select
> > a target host with a high level of confidence that it will be compatible
> > with the source host.
> >
> > Today QEMU migration compatibility is largely determined by the machine
> > type version. Apps can query the supported machine types for a host to
> > check whether it is compatible. Similarly they will query CPU model
> > features to check compatibility.
> >
> > Validation and error checking at time of migration is of course still
> > required, but the goal should be that an mgmt application will *NEVER*
> > hit these errors because they will have pre-selected a host that is
> > known to be compatible based on reported versions that are supported.
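A minimal sketch of the pre-selection step described above, assuming each candidate host has already reported, for every device model URL, the set of migration version strings it accepts. The report shape (`HostReport`) and the function name `compatible_hosts` are hypothetical; how the report is obtained is deliberately left open, as it is in the thread.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical shape of a host's report: device model URL -> the set of
# migration version strings that host can accept for that model.
HostReport = Dict[str, Set[str]]


def compatible_hosts(source_devices: List[Tuple[str, str]],
                     hosts: Dict[str, HostReport]) -> List[str]:
    """Return the names of hosts that accept every (model, version) pair
    used by the source VM, so a target can be chosen before migrating."""
    candidates = []
    for host, report in hosts.items():
        # Version strings are compared as opaque tokens: exact membership only.
        if all(version in report.get(model, set())
               for model, version in source_devices):
            candidates.append(host)
    return sorted(candidates)
```

For example, a VM using e1000e version "2" would match a host reporting `{"1", "2"}` for that model but not one reporting only `{"1"}`. Checks at migration time are still needed; this only narrows the choice of target in advance.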
>
> Okay. What do you think of the following?
>
> [
>   {
>     "model": "https://qemu.org/devices/e1000e",
>     "params": [
>       "rss",
>       ...more configuration parameters...
>     ],
>     "versions": [
>       {
>         "name": "1",
>         "params": []
>       },
>       {
>         "name": "2",
>         "params": ["rss=on"]
>       },
>       ...more versions...
>     ]
>   },
>   ...more device models...
> ]
>
> The management tool can generate the configuration parameter list by
> expanding a version into its params.
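A minimal sketch of that expansion step, assuming the JSON shape proposed above and assuming each version entry lists its complete set of parameters (rather than a delta on the previous version). The "...more..." placeholders are simply omitted so the sample parses as JSON; `expand_version` is a hypothetical helper name.

```python
import json
from typing import List

# Trimmed, valid-JSON instance of the proposed format.
DEVICES_JSON = """
[
  {
    "model": "https://qemu.org/devices/e1000e",
    "params": ["rss"],
    "versions": [
      {"name": "1", "params": []},
      {"name": "2", "params": ["rss=on"]}
    ]
  }
]
"""


def expand_version(data: list, model: str, version: str) -> List[str]:
    """Return the configuration parameters that a version name aliases."""
    for device in data:
        if device["model"] != model:
            continue
        for entry in device["versions"]:
            if entry["name"] == version:
                return list(entry["params"])
        raise KeyError(f"{model}: unknown version {version!r}")
    raise KeyError(f"unknown device model {model!r}")
```

With this sample, `expand_version(json.loads(DEVICES_JSON), "https://qemu.org/devices/e1000e", "2")` yields `["rss=on"]`.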
>
> Configuration parameter types and input ranges need more thought. For
> example, version 1 of the device might not have rx-table-size (it's
> effectively 0). Version 2 introduces rx-table-size and sets it to 32.
> Version 3 raises the value to 64. In addition, the user can set a custom
> value like rx-table-size=48. I haven't defined the rules for this yet,
> but it's clear there needs to be a way to extend configuration
> parameters.
>
> To check migration compatibility:
> 1. Verify that the device model URL matches the JSON data[n].model field.
> 2. For every configuration parameter name from the source device, check
>    that it is contained within the JSON data[n].params list.

I'm not convinced that this makes sense. A matching set of parameter
names + values does not imply that the migration data stream is
actually compatible.

That is, implementations may need to change the internal migration data
stream to fix bugs, without adding/removing a config parameter.
The migration version string alone expresses data stream compatibility.

This is similar to how 2 QEMU command lines can have an identical set
of configuration parameters, aside from the machine type version,
and thus be migration *incompatible*.

Basically the version string should be considered an opaque blob
that expresses compatibility on its own.

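A small sketch of the argument above, contrasting a parameter-based check (simplified here to equality of name/value pairs) with an opaque version-string check. The version strings such as "e1000e-2.1" are hypothetical, standing for a source whose data stream changed for a bug fix without any configuration parameter changing.

```python
from typing import Dict, Set


def params_match(src: Dict[str, str], dst: Dict[str, str]) -> bool:
    # Compatibility inferred from configuration parameters alone.
    return src == dst


def versions_match(src_version: str, dst_supported: Set[str]) -> bool:
    # Compatibility decided by the opaque version string: exact membership,
    # with no parsing or ordering of the string's contents.
    return src_version in dst_supported
```

With identical parameters `{"rss": "on"}` on both sides, `params_match` reports a match, while `versions_match("e1000e-2.1", {"e1000e-1", "e1000e-2"})` correctly reports that the destination cannot accept the source's stream.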
> > > VFIO Implementation
> > > -------------------
> > > The following applies both to kernel VFIO/mdev drivers and vfio-user
> > > device backends.
> > >
> > > Devices are instantiated based on a version and/or configuration
> > > parameters:
> > > * ``version=1`` - use the device configuration aliased by version 1
> > > * ``version=2,rx-filter-size=64`` - use version 2 and override
> > >   ``rx-filter-size``
> > > * ``rx-filter-size=0`` - directly set configuration parameters without
> > >   using a version
> > >
> > > Device creation fails if the version and/or configuration parameters are
> > > not supported.
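A minimal sketch of how the three instantiation forms above might be resolved into a full configuration, assuming a comma-separated `key=value` syntax and assuming that explicit parameters override whatever the version aliases. The parameter table, version table and their values (`SUPPORTED_PARAMS`, `VERSIONS`, "rss", the sizes) are hypothetical and purely illustrative.

```python
from typing import Dict

# Hypothetical device model description (illustrative values only).
SUPPORTED_PARAMS = {"rss", "rx-filter-size"}
VERSIONS: Dict[str, Dict[str, str]] = {
    "1": {"rss": "off", "rx-filter-size": "0"},
    "2": {"rss": "on", "rx-filter-size": "32"},
}


def resolve_config(spec: str) -> Dict[str, str]:
    """Turn 'version=2,rx-filter-size=64' style options into a full set of
    configuration parameters; raise ValueError if anything is unsupported,
    mirroring "device creation fails"."""
    options: Dict[str, str] = {}
    for item in spec.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed option {item!r}")
        options[key] = value

    config: Dict[str, str] = {}
    version = options.pop("version", None)
    if version is not None:
        if version not in VERSIONS:
            raise ValueError(f"unsupported version {version!r}")
        config.update(VERSIONS[version])  # start from the version alias
    for key, value in options.items():
        if key not in SUPPORTED_PARAMS:
            raise ValueError(f"unsupported parameter {key!r}")
        config[key] = value  # explicit parameters override the alias
    return config
```

Under these assumptions, `resolve_config("version=2,rx-filter-size=64")` gives `{"rss": "on", "rx-filter-size": "64"}`, and an unknown version or parameter is rejected before the device is created.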
> > >
> > > There must be a mechanism to query the "latest" configuration for a device
> > > model. It may simply report ``version=5``, where 5 is the latest version,
> > > but it could also report all configuration parameters instead of using a
> > > version alias.
> >
> > The mechanism needs to be able to report all supported version strings,
> > not simply the latest version string. I think we need to specify the
> > actual mechanism to do this query too, because we can't end up in a place
> > where there's a different approach to queries for each device type.
>
> Makes sense.
>
> Stefan
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|