Re: VFIO Migration
From: Stefan Hajnoczi
Subject: Re: VFIO Migration
Date: Tue, 3 Nov 2020 15:05:08 +0000
On Tue, Nov 03, 2020 at 11:39:29AM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 02, 2020 at 11:11:53AM +0000, Stefan Hajnoczi wrote:
> > Overview
> > --------
> > The purpose of device states is to save the device at a point in time
> > and then restore the device back to the saved state later. This is
> > more challenging than it first appears.
> >
> > The process of saving a device state and loading it later is called
> > *migration*. The state may be loaded by the same device that saved it
> > or by a new instance of the device, possibly running on a different
> > computer.
> >
> > It must be possible to migrate to a newer implementation of the device
> > as well as to an older implementation of the device. This allows users
> > to upgrade and roll back their systems.
> >
> > Migration can fail if loading the device state is not possible. It
> > should fail early with a clear error message. It must not appear to
> > complete but leave the device inoperable due to a migration problem.
>
> I think there needs to be an additional requirement.
>
> It must be possible for a management application to query the supported
> versions independently of executing a migration operation.
>
> This is important to large scale data center / cloud management applications
> because before initiating a migration they need to *automatically* select
> a target host with a high level of confidence that it will be compatible
> with the source host.
>
> Today QEMU migration compatibility is largely determined by the machine
> type version. Apps can query the supported machine types for a host to
> check whether it is compatible. Similarly they will query CPU model
> features to check compatibility.
>
> Validation and error checking at time of migration is of course still
> required, but the goal should be that an mgmt application will *NEVER*
> hit these errors because they will have pre-selected a host that is
> known to be compatible based on reported versions that are supported.
Okay. What do you think of the following?
  [
    {
      "model": "https://qemu.org/devices/e1000e",
      "params": [
        "rss",
        ...more configuration parameters...
      ],
      "versions": [
        {
          "name": "1",
          "params": []
        },
        {
          "name": "2",
          "params": ["rss=on"]
        },
        ...more versions...
      ]
    },
    ...more device models...
  ]
The management tool can generate the configuration parameter list by
expanding a version into its params.
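A minimal Python sketch of that expansion, assuming the draft JSON layout above is loaded as-is. The trimmed sample data and the `expand_version` helper are hypothetical illustrations, not an existing QEMU or VFIO interface.

```python
import json

# Hypothetical sample in the draft format above, trimmed to one model.
DEVICE_MODELS_JSON = """
[
  {
    "model": "https://qemu.org/devices/e1000e",
    "params": ["rss"],
    "versions": [
      {"name": "1", "params": []},
      {"name": "2", "params": ["rss=on"]}
    ]
  }
]
"""


def expand_version(models, model_url, version_name):
    """Return the configuration parameter list that a version aliases."""
    for model in models:
        if model["model"] != model_url:
            continue
        for version in model["versions"]:
            if version["name"] == version_name:
                return list(version["params"])
        raise KeyError(f"{model_url} has no version {version_name!r}")
    raise KeyError(f"unknown device model {model_url!r}")


models = json.loads(DEVICE_MODELS_JSON)
print(expand_version(models, "https://qemu.org/devices/e1000e", "2"))
# prints ['rss=on']
```

The management tool would then pass the expanded list, rather than the version name, when it needs the full configuration.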
Configuration parameter types and input ranges need more thought. For
example, version 1 of the device might not have rx-table-size (it's
effectively 0). Version 2 introduces rx-table-size and sets it to 32.
Version 3 raises the value to 64. In addition, the user can set a custom
value like rx-table-size=48. I haven't defined the rules for this yet,
but it's clear there needs to be a way to extend configuration
parameters.
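One way to picture the rx-table-size example, assuming each version only supplies a default and an explicit user value takes precedence. This illustrates the example; it is not a proposal for the missing rules, and `RX_TABLE_SIZE_BY_VERSION` and `effective_rx_table_size` are hypothetical names.

```python
# Hypothetical defaults from the example: version 1 has no rx-table-size
# (None, effectively 0), version 2 sets 32, version 3 sets 64.
RX_TABLE_SIZE_BY_VERSION = {"1": None, "2": 32, "3": 64}


def effective_rx_table_size(version, override=None):
    """Resolve rx-table-size from a version plus an optional custom value.

    Whether a custom value is allowed for a version that lacks the
    parameter is one of the undefined rules; this sketch does not check it.
    """
    if override is not None:
        return override  # e.g. a custom rx-table-size=48
    default = RX_TABLE_SIZE_BY_VERSION[version]
    return 0 if default is None else default


for v in ("1", "2", "3"):
    print(v, effective_rx_table_size(v))
print("2 + custom", effective_rx_table_size("2", override=48))
```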
To check migration compatibility:
1. Verify that the device model URL matches the JSON data[n].model
field.
2. For every configuration parameter name from the source device,
check that it is contained within the JSON data[n].params list.
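The two checks can be sketched in Python as follows, assuming the destination's data uses the draft JSON format (shown here as a Python literal) and the source device's configuration is a hypothetical list of "name=value" strings. `is_migration_compatible` is an illustrative helper, not an existing API.

```python
# Hypothetical destination data in the draft JSON format.
DEST_MODELS = [
    {
        "model": "https://qemu.org/devices/e1000e",
        "params": ["rss"],
        "versions": [
            {"name": "1", "params": []},
            {"name": "2", "params": ["rss=on"]},
        ],
    },
]


def is_migration_compatible(dest_models, source_model_url, source_params):
    """Apply the two compatibility checks to the destination's data.

    source_params uses the "name=value" form, e.g. ["rss=on"]; only the
    name before '=' is compared, since value rules are not defined yet.
    """
    for model in dest_models:
        # Step 1: the device model URL must match data[n].model.
        if model["model"] != source_model_url:
            continue
        supported = set(model["params"])
        # Step 2: every source parameter name must be in data[n].params.
        return all(p.split("=", 1)[0] in supported for p in source_params)
    return False  # the destination does not know this device model


print(is_migration_compatible(DEST_MODELS, "https://qemu.org/devices/e1000e", ["rss=on"]))
# prints True
```

Because parameter types and ranges are still open, this only compares names; a value such as rx-table-size=48 would need an additional range check once those rules exist.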
> > VFIO Implementation
> > -------------------
> > The following applies both to kernel VFIO/mdev drivers and vfio-user device
> > backends.
> >
> > Devices are instantiated based on a version and/or configuration parameters:
> > * ``version=1`` - use the device configuration aliased by version 1
> > * ``version=2,rx-filter-size=64`` - use version 2 and override
> >   ``rx-filter-size``
> > * ``rx-filter-size=0`` - directly set configuration parameters without
> >   using a version
> >
> > Device creation fails if the version and/or configuration parameters are not
> > supported.
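A minimal sketch of how a device backend might resolve those instantiation strings, assuming a comma-separated "key=value" syntax where explicit parameters override the version alias. The version table, its rx-filter-size values, and `resolve_device_config` are hypothetical; a real kernel VFIO/mdev driver or vfio-user backend would define its own parsing.

```python
# Hypothetical version aliases for one device model (values are illustrative).
VERSIONS = {
    "1": {"rx-filter-size": "0"},
    "2": {"rx-filter-size": "32"},
}
SUPPORTED_PARAMS = {"rx-filter-size"}


def resolve_device_config(spec):
    """Turn 'version=N,key=value,...' into a parameter dict.

    Raises ValueError for an unsupported version or parameter, mirroring
    "device creation fails".
    """
    config = {}
    overrides = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed option {item!r}")
        if key == "version":
            if value not in VERSIONS:
                raise ValueError(f"unsupported version {value!r}")
            config.update(VERSIONS[value])
        elif key in SUPPORTED_PARAMS:
            overrides[key] = value
        else:
            raise ValueError(f"unsupported parameter {key!r}")
    config.update(overrides)  # explicit parameters win over the version alias
    return config


print(resolve_device_config("version=2,rx-filter-size=64"))
# prints {'rx-filter-size': '64'}
```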
> >
> > There must be a mechanism to query the "latest" configuration for a device
> > model. It may simply report ``version=5`` where 5 is the latest version,
> > but it could also report all configuration parameters instead of using a
> > version alias.
>
> The mechanism needs to be able to report all supported version strings,
> not simply the latest version string. I think we need to specify the
> actual mechanism to do this query too, because we can't end up in a place
> where there's a different approach to queries for each device type.
Makes sense.
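A sketch of a single, device-independent query that returns every supported version string rather than only the latest one, assuming each device model reports data in the draft JSON format from earlier in this message. `query_supported_versions` and its output shape are hypothetical, and how a host exposes the result (for kernel VFIO/mdev drivers or vfio-user backends) is left open.

```python
import json

# Hypothetical data in the draft JSON format; only the fields needed here.
DEVICE_MODELS = [
    {
        "model": "https://qemu.org/devices/e1000e",
        "versions": [
            {"name": "1", "params": []},
            {"name": "2", "params": ["rss=on"]},
        ],
    },
]


def query_supported_versions(models):
    """Report every supported version string for each device model.

    Treating the last entry as "latest" assumes the list is ordered
    oldest to newest, which the draft format does not yet specify.
    """
    report = {}
    for model in models:
        names = [version["name"] for version in model["versions"]]
        report[model["model"]] = {
            "versions": names,
            "latest": names[-1] if names else None,
        }
    return report


print(json.dumps(query_supported_versions(DEVICE_MODELS), indent=2))
```

A management application could run the same query on each candidate destination and only consider hosts whose list contains the source device's version.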
Stefan