qemu-devel

Re: VFIO Migration


From: Stefan Hajnoczi
Subject: Re: VFIO Migration
Date: Wed, 4 Nov 2020 07:36:36 +0000

On Tue, Nov 03, 2020 at 06:49:51PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Tue, Nov 03, 2020 at 12:17:09PM +0000, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > Device Models
> > > > -------------
> > > > Devices have a *hardware interface* consisting of hardware registers,
> > > > interrupts, and so on.
> > > > 
> > > > The hardware interface together with the device state representation is
> > > > called a *device model*. Device models can be assigned URIs such as
> > > > https://qemu.org/devices/e1000e to uniquely identify them.
> > > 
> > > I think this is a unique identifier, not actually a URI; the https://
> > > isn't needed since no one expects to ever connect to this.
> > 
> > Yes, it could be any unique string. If the URI idea is not popular we
> > can use any similar scheme.
> 
> I'm OK with it being a URI; just drop the https.

Okay.

> > > > However, secondary aspects related to the physical port may affect the
> > > > device's hardware interface and need to be reflected in the device
> > > > configuration. The link speed may depend on the physical port and be
> > > > reported through the device's hardware interface. In that case a
> > > > ``link-speed`` configuration parameter is required to prevent
> > > > unexpected changes to the link speed after migration.
> > > 
> > > That's an interesting example; because depending on the device, it might
> > > be:
> > >     a) Completely virtualised so that the guest *shouldn't* know what
> > > the physical link speed is, precisely to allow the physical network on
> > > the destination to be different.
> > > 
> > >     b) Part of the migrated state
> > > 
> > >     c) Something that's allowed to be reloaded after migration
> > > 
> > >     d) Configurable
> > > 
> > > so I'm not sure whether it's a good example in this case or not.
> > 
> > Can you think of an example that has only one option?
> > 
> > I tried but couldn't. For example take a sound card. The guest is aware
> > the device supports stereo playback (2 output channels), but which exact
> > stereo host device is used doesn't matter, they are all suitable.
> > 
> > Now imagine migrating to a 7.1 surround-sound device. Similar options
> > come into play:
> > 
> > a) Emulate stereo and mix it to 7.1 surround-sound on the physical
> >    device. The guest still sees the stereo device.
> > 
> > b) Refuse migration.
> > 
> > c) Indicate that the output has switched and let the guest reconfigure
> >    itself (e.g. a sound card with multiple outputs, where one of them is
> >    stereo and another is 7.1 surround sound).
> > 
> > Which option is desirable depends on the use case.
> 
> Yes, but I think it might be worth calling out these differences;  there
> are explicitly cases where you don't want external changes to be visible
> and other cases where you do; both are valid, but both need thinking
> about. (Another one, GPU whether you have a monitor plugged in!)

Okay.

> > > Maybe what's needed is a stronger instruction to abstract external
> > > device state so that it's not part of the configuration in most cases.
> > 
> > Do you want to propose something?
> 
> I think something like 'Some part of a device's state may be irrelevant
> to a migration; for example on some NICs it might be preferable to hide
> the physical characteristics of the link from the guest.'

Got it.

> > > > For example, if address filtering support was added to a network card
> > > > then device versions and the corresponding configurations may look like
> > > > this:
> > > > * ``version=1`` - Behaves as if ``rx-filter-size=0``
> > > > * ``version=2`` - ``rx-filter-size=32``
> > > 
> > > Note configuration parameters might have been added during the life of
> > > the device; e.g. if the original card had no support for rx-filters, it
> > > might not have a rx-filter-size parameter.
> > 
> > version=1 does not explicitly set rx-filter-size=0. When a new parameter
> > is introduced it must have a default value that disables its effect on
> > the hardware interface and/or device state representation. This is
> > described in a bit more detail in the next section, maybe it should be
> > reordered.
> 
> We've generally found the definition of devices tends in practice to be
> done newer->older; i.e. you define the current machine, and then define
> the next older machine setting the defaults that used to be true; then
> define the older version behind that....

That is not possible here because an older device implementation is
unaware of new configuration parameters.

Looking at the example above, imagine a version=1 device is instantiated
on a device implementation that supports both version=1 and version=2.
Should the configuration parameter list for version=1 be empty or
rx-filter-size=0?

It must be empty, otherwise an older device implementation that only
supports version=1 cannot instantiate the device. The older device
implementation does not recognize the rx-filter-size configuration
parameter (it was introduced in version=2) so we cannot set it to 0.
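
A minimal sketch of that rule, in Python for illustration only. The
names (VERSIONS, DeviceImplementation, instantiate) are hypothetical and
not part of any proposed interface; the point is only that version=1
expands to an empty parameter list and that each implementation fills in
the disabling default for the parameters it knows about:

```python
# Hypothetical illustration: names are assumptions, not a QEMU/VFIO API.

# Parameters explicitly set by each device version. version=1 lists
# nothing: rx-filter-size did not exist when version=1 was defined.
VERSIONS = {
    1: {},
    2: {"rx-filter-size": 32},
}


class DeviceImplementation:
    def __init__(self, defaults):
        # Maps each known parameter to the default value that disables it.
        self.defaults = dict(defaults)

    def instantiate(self, params):
        # Refuse parameters this implementation has never heard of.
        unknown = sorted(set(params) - set(self.defaults))
        if unknown:
            raise ValueError("unknown parameters: " + ", ".join(unknown))
        # Start from the disabling defaults, then apply explicit settings.
        config = dict(self.defaults)
        config.update(params)
        return config


old_impl = DeviceImplementation({})                     # version=1 only
new_impl = DeviceImplementation({"rx-filter-size": 0})  # versions 1 and 2
```

Here old_impl.instantiate(VERSIONS[1]) succeeds and
new_impl.instantiate(VERSIONS[1]) yields rx-filter-size=0 from the
default, while old_impl.instantiate({"rx-filter-size": 0}) raises
ValueError, which is why version=1 cannot spell the parameter out.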

> > > > Device States
> > > > -------------
> > > > The details of the device state representation are not covered in this
> > > > document but the general requirements are discussed here.
> > > > 
> > > > The device state consists of data accessible through the device's
> > > > hardware interface and internal state that is needed to restore device
> > > > operation. State in the hardware interface includes the values of
> > > > hardware registers. An example of internal state is an index value
> > > > needed to avoid processing queued requests more than once.
> > > 
> > > I try and emphasise that 'internal state' should be represented in a way
> > > that reflects the problem rather than the particular implementation;
> > > this gives it a better chance of migrating to future versions.
> > 
> > Sounds like a good idea.
> > 
> > > > Changes can be made to the device state representation as follows. Each
> > > > change to device state must have a corresponding device configuration
> > > > parameter that allows the change to be toggled:
> > > > 
> > > > * When the parameter is disabled the hardware interface and device state
> > > >   representation are unchanged. This allows old device states to be
> > > >   loaded.
> > > > 
> > > > * When the parameter is enabled the change comes into effect.
> > > > 
> > > > * The parameter's default value disables the change. Therefore old
> > > >   versions do not have to explicitly specify the parameter.
> > > > 
> > > > The following example illustrates migration from an old device
> > > > implementation to a new one. A version=1 network card is migrated to a
> > > > new device implementation that is also capable of version=2 and adds the
> > > > rx-filter-size=32 parameter. The new device is instantiated with
> > > > version=1, which disables rx-filter-size and is capable of loading the
> > > > version=1 device state. The migration completes successfully but note
> > > > the device is still operating at version=1 level in the new device.
> > > > 
> > > > The following example illustrates migration from a new device
> > > > implementation back to an older one. The new device implementation
> > > > supports version=1 and version=2. The old device implementation supports
> > > > version=1 only. Therefore the device can only be migrated when
> > > > instantiated with version=1 or the equivalent full configuration
> > > > parameters.
> > > 
> > > I'm sometimes asked for 'ways out' of buggy migration cases; e.g. what
> > > happens if version=1 forgot to migrate the X register; or what happens
> > > if version=1 forgot to handle the special, rare case when X=5 and we
> > > now need to migrate some extra state.
> > 
> > Can these cases be handled by adding additional configuration parameters?
> > 
> > If version=1 lacks essential state then version=2 can add it. The
> > user must configure the device to use version=2 before they can save
> > the full state.
> > 
> > If version=1 didn't handle the X=5 case then the same solution is
> > needed. A new configuration parameter is introduced and the user needs
> > to configure the device to be the new version before migrating.
> > 
> > Unfortunately this requires poweroff or hotplugging a new device
> > instance. But some disruption is probably necessary anyway so the
> > migration code on the host side can be patched to use the updated device
> > state representation.
> 
> There are some corner cases that people sometimes prefer; for example
> let's say the X=5 case is actually really rare - but when it happens the
> device is hopelessly broken, some device authors prefer to fix it and
> send the extra data and let the migration fail if the destination
> doesn't understand it (it would break anyway).

The device implementation needs to be updated to send the extra data. At
that point a new device configuration parameter should be introduced and
if the user wishes to run the new version of the device then the extra
data will be sent.

If the destination doesn't support the new parameter then migration will
be refused. That matches what you've described, so I think the approach
in this document handles this case.
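
As a rough sketch of that check, again in Python and purely
illustrative: can_migrate and the "x5-fix" parameter are made-up names,
and the destination is modelled simply as the set of parameters it
recognizes.

```python
# Hypothetical illustration: names are assumptions, not a QEMU/VFIO API.

def can_migrate(source_params, dest_known_params):
    """Return (ok, reason) for a source configuration and a destination."""
    missing = sorted(set(source_params) - set(dest_known_params))
    if missing:
        return False, "destination does not support: " + ", ".join(missing)
    return True, "ok"


# "x5-fix" is a made-up parameter that enables sending the extra state.
patched_source = {"x5-fix": True}
unpatched_source = {}

old_destination = set()        # predates the fix
new_destination = {"x5-fix"}   # understands the extra data
```

A source running with the fix enabled is refused by old_destination and
accepted by new_destination. A source that never enabled the parameter
can still migrate to either, with the original bug still present.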

> I've also been asked
> by mst for a 'unexpected data' mechanism to send data that the
> destination might not expect if it didn't know about it, for similar
> cases.

Do you mean optional data that can be more or less safely dropped? A new
device configuration parameter is not needed because the hardware
interface and device state representation remain compatible. That
feature can be defined in the device state representation spec and is
not visible at the layer discussed in this document. But I think it's
worth adding a note to this document explaining what to do.
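
If that is what is meant, one possible shape is tagged sections that
carry an "optional" flag. The following Python sketch is an assumption
for illustration only; no device state representation spec defines this
layout, and load_state and the section names are hypothetical.

```python
# Hypothetical illustration: the section layout is an assumption, not a spec.

def load_state(sections, handlers):
    """Apply (name, optional, payload) sections using the known handlers.

    Unknown optional sections are skipped; unknown mandatory ones fail.
    Returns the names of the skipped sections.
    """
    skipped = []
    for name, optional, payload in sections:
        handler = handlers.get(name)
        if handler is not None:
            handler(payload)
        elif optional:
            skipped.append(name)
        else:
            raise ValueError("unknown mandatory section: " + name)
    return skipped
```

A destination that only knows a "registers" section would load it, skip
an unknown optional section, and fail the migration on an unknown
mandatory one.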

Stefan
