Re: VFIO Migration


From: Stefan Hajnoczi
Subject: Re: VFIO Migration
Date: Mon, 2 Nov 2020 14:56:26 +0000

On Mon, Nov 02, 2020 at 01:28:44PM +0100, Cornelia Huck wrote:
> On Mon, 2 Nov 2020 11:11:53 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > VFIO Migration
> > ==============
> > This document describes how to save and load VFIO device states. Saving a
> > device state produces a snapshot of a VFIO device's state that can be loaded
> > again at a later point in time to resume the device from the snapshot.
> > 
> > The data representation of the device state is outside the scope of this
> > document.
> 
> [Is this document supposed to live in the QEMU source tree later?]

It should live alongside the VFIO documentation. For vfio-user the spec
will live in qemu.git and we could also keep this document there. The
kernel VFIO/mdev drivers also need this information. They could link to
the QEMU document.

> > Device Models
> > -------------
> > Devices have a *hardware interface* consisting of hardware registers,
> > interrupts, and so on.
> > 
> > The hardware interface together with the device state representation is
> > called a *device model*. Device models can be assigned URIs such as
> > https://qemu.org/devices/e1000e to uniquely identify them.
> 
> Is that something that needs to be put together for every device where we
> want to support migration? How do you create the URI?

Yes. If you are creating a custom device that no one else needs to
emulate then you can simply pick a unique URL:

  https://vendor.com/my-dev

There doesn't need to be anything at the URL. It's just a unique string.
Web URLs are handy because no one else will accidentally pick a URL
under a domain you control.

If your intention is to define a standard device model that others can
emulate and migrate, then it's good practice to publish a web page about
the device model at the URL, including the hardware datasheet and the
device state representation (e.g. a spec describing the migration data
stream).

For example, https://virtio-spec.org/devices/pci/virtio-net would
contain a link to the VIRTIO specification and the device state
representation. This allows others to implement devices that are
compatible and support migration between implementations. This is
getting beyond the scope of this document, but I imagine the VIRTIO
device state representation would be QEMU's current vmstate
representation so that migration between QEMU and out-of-process devices
is possible...

> For mdev devices, would this refer to the "base" device, or to the
> device specified by a certain mdev type?

The device synthesized by the mdev driver, because that's the
guest-visible hardware interface. I didn't want to say "guest-visible"
or refer to VMs in the document, but maybe that would make things
clearer.

> > Note that the device configuration is a conservative bound on device
> > states that can be migrated successfully since not all configuration
> > parameters may be strictly required to match on the source and
> > destination devices. For example, if the device's hardware interface has
> > not yet been initialized then changes to the link speed may not be
> > noticed. However, accurately representing runtime constraints is complex
> > and risks introducing migration bugs, so no attempt is made to support
> > them to achieve more relaxed bounds on successful migrations.
> 
> Do we want a "I know what I'm doing" override?

I think that could be implementation-defined. It may be useful if a
device is very broken and you want to offer users a command-line option
that allows them to migrate to safety.

> > Device Versions
> > ---------------
> > As a device evolves, the number of configuration parameters required may
> > become inconvenient for users to express in full. A device configuration
> > can be aliased by a *device version*, which is a shorthand for the full
> > device configuration. This makes it easy to apply a standard device
> > configuration without listing every configuration parameter explicitly.
> > 
> > For example, if address filtering support was added to a network card then
> > device versions and the corresponding configurations may look like this:
> > * ``version=1`` - Behaves as if ``rx-filter-size=0``
> > * ``version=2`` - ``rx-filter-size=32``
> 
> Is versioning supposed to be an ascending number, with a migration from
> n->n+m possible, but not the other way around?

The actual version string does not matter. Ascending integers are a
reasonable convention because they are easy to type and easy for humans
to compare.

Slightly pedantic but important point: migration from n->m is not
possible according to this document. It's always migration from n->n.
Migrating does not upgrade the guest-visible aspects of the device. The
device instance always remains at its current version. Of course the
destination device implementation may contain bug fixes, etc. that come
into effect right away, but they don't change the guest-visible hardware
interface or device state representation.

To actually upgrade from n->m the user must explicitly reconfigure the
guest and hotplug or reboot.

In other words, the device version (which stays the same throughout the
lifetime of a device instance) and the device implementation version
(e.g. my-virtio-net-pci-v1.1) are two different concepts.
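
To illustrate the n->n rule, here is a minimal sketch of a compatibility
check. It is not taken from any existing implementation. The
DeviceInstance class, its field names, and can_migrate() are hypothetical
names chosen for this example. The point is only that the device model
and the full device configuration must match, while the implementation
version is free to differ.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceInstance:
    # All names here are hypothetical, for illustration only.
    model_uri: str                               # identifies the device model
    config: dict = field(default_factory=dict)   # full device configuration
    impl_version: str = ""                       # not guest-visible


def can_migrate(src: DeviceInstance, dst: DeviceInstance) -> bool:
    """Return True if src's state may be loaded into dst.

    Migration is always n->n: the device model and the full device
    configuration must be identical. The implementation version is
    deliberately ignored because it does not change the guest-visible
    hardware interface or the device state representation.
    """
    return src.model_uri == dst.model_uri and src.config == dst.config


src = DeviceInstance("https://vendor.com/my-dev",
                     {"rx-filter-size": "0"}, "my-dev-impl-1.0")
dst_same = DeviceInstance("https://vendor.com/my-dev",
                          {"rx-filter-size": "0"}, "my-dev-impl-1.1")
dst_newer = DeviceInstance("https://vendor.com/my-dev",
                           {"rx-filter-size": "32"}, "my-dev-impl-1.1")

print(can_migrate(src, dst_same))    # same configuration, newer implementation
print(can_migrate(src, dst_newer))   # different configuration
```

The second check fails even though the destination is "newer", which is
the situation where the user has to reconfigure the guest and hotplug or
reboot instead of migrating.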

> Are these device versions supposed to be independent of machine versions?

Yes. Since VFIO devices are passthrough devices that can be implemented
without introducing code into QEMU, they are separate from versioned
machine types. This is similar to going out and buying a PCI adapter and
putting it into a machine. The machine itself may be a Dell Foo Bar
server with a certain hardware spec, but the PCI adapter is a completely
separate device with no relation to the machine type.

> > 
> > Device States
> > -------------
> > The details of the device state representation are not covered in this
> > document but the general requirements are discussed here.
> > 
> > The device state consists of data accessible through the device's hardware
> > interface and internal state that is needed to restore device operation.
> > State in the hardware interface includes the values of hardware registers.
> > An example of internal state is an index value needed to avoid processing
> > queued requests more than once.
> > 
> > Changes can be made to the device state representation as follows. Each
> > change to device state must have a corresponding device configuration
> > parameter that allows the change to toggled:
> 
> s/to/to be/ :)

To be or not to be! Thanks.

> > VFIO Implementation
> > -------------------
> > The following applies both to kernel VFIO/mdev drivers and vfio-user device
> > backends.
> > 
> > Devices are instantiated based on a version and/or configuration parameters:
> > * ``version=1`` - use the device configuration aliased by version 1
> > * ``version=2,rx-filter-size=64`` - use version 2 and override
> >   ``rx-filter-size``
> > * ``rx-filter-size=0`` - directly set configuration parameters without
> >   using a version
> 
> I think some of this would be encapsulated in the mdev type for
> mediated devices.

Yes, the device model and configuration need to be provided when
creating the mdev instance. This assumption is built into this design:

You decide the device model and configuration at creation time, not at
migration time. In other words, each device instance is fully specified
at all times and there is no choice of "in which format should we
save/load this?".
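
As a rough sketch of what "fully specified at creation time" could look
like, the following expands a version alias plus overrides into a full
device configuration. The VERSIONS table reuses the rx-filter-size
example from the document. The resolve_config() helper and the
comma-separated key=value syntax handling are assumptions made for this
example, not an existing QEMU or kernel interface.

```python
# Hypothetical version aliases, taken from the network card example:
# version 1 behaves as if rx-filter-size=0, version 2 sets it to 32.
VERSIONS = {
    "1": {"rx-filter-size": "0"},
    "2": {"rx-filter-size": "32"},
}


def resolve_config(spec: str) -> dict:
    """Expand 'version=N,key=value,...' into a full device configuration."""
    params = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed parameter: {item!r}")
        if key in params:
            raise ValueError(f"duplicate parameter: {key!r}")
        params[key] = value

    # Start from the configuration aliased by the version, if one is given.
    version = params.pop("version", None)
    if version is None:
        config = {}
    elif version in VERSIONS:
        config = dict(VERSIONS[version])
    else:
        raise ValueError(f"unknown device version: {version!r}")

    # Explicit parameters override the values aliased by the version.
    config.update(params)
    return config


print(resolve_config("version=1"))
print(resolve_config("version=2,rx-filter-size=64"))
print(resolve_config("rx-filter-size=0"))
```

With this table, ``version=1`` and ``rx-filter-size=0`` resolve to the
same configuration, so two instances created those two ways would be
compatible for migration.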

This approach is simple and easy to troubleshoot, but if someone can
think of a reason why it's too limited, please share.
