qemu-devel
Re: VFIO Migration


From: Dr. David Alan Gilbert
Subject: Re: VFIO Migration
Date: Tue, 3 Nov 2020 18:49:51 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Nov 03, 2020 at 12:17:09PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > Device Models
> > > -------------
> > > Devices have a *hardware interface* consisting of hardware registers,
> > > interrupts, and so on.
> > > 
> > > The hardware interface together with the device state representation is
> > > called a *device model*. Device models can be assigned URIs such as
> > > https://qemu.org/devices/e1000e to uniquely identify them.
> > 
> > I think this is a unique identifier, not actually a URI; the https://
> > isn't needed since no one expects to ever connect to this.
> 
> Yes, it could be any unique string. If the URI idea is not popular we
> can use any similar scheme.

I'm OK with it being a URI; just drop the https.

> > > However, secondary aspects related to the physical port may affect the
> > > device's hardware interface and need to be reflected in the device
> > > configuration. The link speed may depend on the physical port and be
> > > reported through the device's hardware interface. In that case a
> > > ``link-speed`` configuration parameter is required to prevent unexpected
> > > changes to the link speed after migration.
> > 
> > That's an interesting example; because depending on the device, it might
> > be:
> >     a) Completely virtualised so that the guest *shouldn't* know what
> > the physical link speed is, precisely to allow the physical network on
> > the destination to be different.
> > 
> >     b) Part of the migrated state
> > 
> >     c) Something that's allowed to be reloaded after migration
> > 
> >     d) Configurable
> > 
> > so I'm not sure whether it's a good example in this case or not.
> 
> Can you think of an example that has only one option?
> 
> I tried but couldn't. For example take a sound card. The guest is aware
> the device supports stereo playback (2 output channels), but which exact
> stereo host device is used doesn't matter, they are all suitable.
> 
> Now imagine migrating to a 7.1 surround-sound device. Similar options
> come into play:
> 
> a) Emulate stereo and mix it to 7.1 surround-sound on the physical
>    device. The guest still sees the stereo device.
> 
> b) Refuse migration.
> 
> c) Indicate that the output has switched and let the guest reconfigure
>    itself (e.g. a sound card with multiple outputs, where one of them is
>    stereo and another is 7.1 surround sound).
> 
> Which option is desirable depends on the use case.

Yes, but I think it might be worth calling out these differences; there
are explicitly cases where you don't want external changes to be visible
and other cases where you do; both are valid, but both need thinking
about. (Another one: for a GPU, whether you have a monitor plugged in!)

> > Maybe what's needed is a stronger instruction to abstract external
> > device state so that it's not part of the configuration in most cases.
> 
> Do you want to propose something?

I think something like 'Some part of a device's state may be irrelevant
to a migration; for example on some NICs it might be preferable to hide
the physical characteristics of the link from the guest.'

> > > For example, if address filtering support was added to a network card then
> > > device versions and the corresponding configurations may look like this:
> > > * ``version=1`` - Behaves as if ``rx-filter-size=0``
> > > * ``version=2`` - ``rx-filter-size=32``
> > 
> > Note configuration parameters might have been added during the life of
> > the device; e.g. if the original card had no support for rx-filters, it
> > might not have a rx-filter-size parameter.
> 
> version=1 does not explicitly set rx-filter-size=0. When a new parameter
> is introduced it must have a default value that disables its effect on
> the hardware interface and/or device state representation. This is
> described in a bit more detail in the next section, maybe it should be
> reordered.

We've generally found the definition of devices tends in practice to be
done newer->older; i.e. you define the current machine, and then define
the next older machine setting the defaults that used to be true; then
define the older version behind that....
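
As a rough illustration of that newer->older ordering, here is a minimal
sketch. The table layout and the function name are made up for this
example; only the rx-filter-size values come from the example quoted
above. The current defaults are written down once, and each older
version lists only the settings that used to be true.

```python
# Hypothetical layout, for illustration only.
# Defaults of the newest device version.
CURRENT_DEFAULTS = {"rx-filter-size": 32}

# Ordered newest -> oldest. Each entry holds only the settings that
# differ from the next newer version.
COMPAT_OVERRIDES = [
    (2, {}),                      # current version: nothing to override
    (1, {"rx-filter-size": 0}),   # before address filtering existed
]


def config_for_version(version):
    """Build the full configuration for a version by walking back from the newest."""
    config = dict(CURRENT_DEFAULTS)
    for ver, overrides in COMPAT_OVERRIDES:
        config.update(overrides)
        if ver == version:
            return config
    raise ValueError(f"unknown version {version}")
```

Adding a version=3 later would mean changing CURRENT_DEFAULTS and putting
the old values into a new override entry for version=2, leaving the
version=1 entry alone.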

> > > Device States
> > > -------------
> > > The details of the device state representation are not covered in this
> > > document but the general requirements are discussed here.
> > > 
> > > The device state consists of data accessible through the device's hardware
> > > interface and internal state that is needed to restore device operation.
> > > State in the hardware interface includes the values of hardware registers.
> > > An example of internal state is an index value needed to avoid processing
> > > queued requests more than once.
> > 
> > I try and emphasise that 'internal state' should be represented in a way
> > that reflects the problem rather than the particular implementation;
> > this gives it a better chance of migrating to future versions.
> 
> Sounds like a good idea.
> 
> > > Changes can be made to the device state representation as follows. Each
> > > change to device state must have a corresponding device configuration
> > > parameter that allows the change to be toggled:
> > > 
> > > * When the parameter is disabled the hardware interface and device state
> > >   representation are unchanged. This allows old device states to be
> > >   loaded.
> > > 
> > > * When the parameter is enabled the change comes into effect.
> > > 
> > > * The parameter's default value disables the change. Therefore old
> > >   versions do not have to explicitly specify the parameter.
> > > 
> > > The following example illustrates migration from an old device
> > > implementation to a new one. A version=1 network card is migrated to a
> > > new device implementation that is also capable of version=2 and adds the
> > > rx-filter-size=32 parameter. The new device is instantiated with
> > > version=1, which disables rx-filter-size and is capable of loading the
> > > version=1 device state. The migration completes successfully but note
> > > the device is still operating at version=1 level in the new device.
> > > 
> > > The following example illustrates migration from a new device
> > > implementation back to an older one. The new device implementation
> > > supports version=1 and version=2. The old device implementation supports
> > > version=1 only. Therefore the device can only be migrated when
> > > instantiated with version=1 or the equivalent full configuration
> > > parameters.
> > 
> > I'm sometimes asked for 'ways out' of buggy migration cases; e.g. what
> > happens if version=1 forgot to migrate the X register; or what happens
> > if version=1 forgot to handle the special, rare case when X=5 and we
> > now need to migrate some extra state.
> 
> Can these cases be handled by adding additional configuration parameters?
> 
> If version=1 lacks essential state then version=2 can add it. The
> user must configure the device to use version=2 before they can save the
> full state.
> 
> If version=1 didn't handle the X=5 case then the same solution is
> needed. A new configuration parameter is introduced and the user needs
> to configure the device to be the new version before migrating.
> 
> Unfortunately this requires poweroff or hotplugging a new device
> instance. But some disruption is probably necessary anyway so the
> migration code on the host side can be patched to use the updated device
> state representation.

There are some corner cases that people sometimes prefer; for example
let's say the X=5 case is actually really rare, but when it happens the
device is hopelessly broken. Some device authors prefer to fix it,
send the extra data, and let the migration fail if the destination
doesn't understand it (it would break anyway).  I've also been asked
by mst for an 'unexpected data' mechanism to send data that the
destination might not expect if it didn't know about it, for similar
cases.
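
To make the X=5 corner case concrete, a minimal sketch follows. The
section names, the "pending" field and both functions are hypothetical
and do not describe any existing stream format. The source only sends
the extra section when the rare case actually occurs, and a destination
that does not know the section fails the migration.

```python
def save_state(regs):
    """Return the (section_name, payload) pairs the source would send."""
    sections = [("base", {"X": regs["X"]})]
    if regs["X"] == 5:
        # Rare case: the extra state is only sent when it is needed.
        sections.append(("x5-extra", {"pending": regs.get("pending", 0)}))
    return sections


def load_state(sections, known_sections):
    """Load sections on the destination; fail on any section it does not know."""
    state = {}
    for name, payload in sections:
        if name not in known_sections:
            raise RuntimeError(f"unknown section {name!r}; migration fails")
        state.update(payload)
    return state
```

With this shape an old destination keeps working for every migration
where X != 5, and only the already-broken X=5 case is refused.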

> > > Orchestrating Migrations
> > > ------------------------
> > > The following steps must be followed to migrate devices:
> > > 
> > > 1. Check that the source and destination devices support the same device
> > >    model.
> > > 
> > > 2. Check that the destination device supports the source device's
> > >    configuration. Each configuration parameter must be accepted by the
> > >    destination in order to ensure that it will be possible to load the
> > >    device state.
> > 
> > This is written in terms of a 'check'; there are at least three tricky
> > things:
> > 
> >   a) Where they both have the same parameter, do they accept the same
> > range of values; e.g. a newer version of the card might allow
> > rx-filter-size to go up to 128
> 
> The easy way to handle that without lots of metadata is by instantiating
> the destination device to see if it works.
> 
> But in the next point you mention cloud where we need a way to find a
> host that supports a given device. Metadata is probably needed to make
> that check easy. In the email reply to Daniel Berrange I posted the
> beginning of a JSON schema that describes device models for this
> purpose. I think that offers a solution for the cloud case.

A similar suggestion had come up in the vfio thread with Nvidia
some months ago; I can't remember the outcome of that.
(Much of this thread repeats the repeated long discussions on that
thread!)
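
For the value-range concern in (a), a metadata-based check could look
roughly like the sketch below. The (min, max) schema format and the
function name are assumptions made for this example; this is not the
JSON schema mentioned above.

```python
def destination_problems(source_config, dest_schema):
    """Return the reasons the destination cannot accept source_config.

    dest_schema maps a parameter name to an inclusive (min, max) range.
    An empty result means every source parameter is accepted.
    """
    problems = []
    for name, value in source_config.items():
        if name not in dest_schema:
            problems.append(f"{name}: unknown parameter")
            continue
        lo, hi = dest_schema[name]
        if not lo <= value <= hi:
            problems.append(f"{name}={value}: outside {lo}..{hi}")
    return problems
```

For instance, a source configured with rx-filter-size=128 would be
rejected by a destination whose schema is {"rx-filter-size": (0, 32)}
and accepted by one whose schema is {"rx-filter-size": (0, 128)}.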

Dave

> 
> Stefan


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



