Re: [RFC v2] VFIO Migration
Tue, 10 Nov 2020 09:52:17 +0000
On Thu, Nov 05, 2020 at 12:37:08PM -0700, Alex Williamson wrote:
> On Thu, 5 Nov 2020 15:09:02 +0000
> Stefan Hajnoczi <email@example.com> wrote:
> > The disk image file may indirectly affect the hardware interface, for
> > example by constraining the device's block size. In this case a
> > block-size=N migration parameter is required to ensure migration
> > compatibility, but the host file system path of the disk image file still
> > does not require a migration parameter.
> I'm not sure what the above section defined. We refer to these as
> migration parameters, just as in the previous section, but are they
> read-only and must match exactly?
I will try to clarify this in the next revision. In this example
block-size=N is determined by the properties of the physical block
device. The device can only be migrated to a destination with the same
block size. The block-size=N migration parameter expresses this
constraint.
> > Device State Representation
> > ---------------------------
> > Device state contains both data accessible through the device's hardware
> > interface and device-internal state needed to restore device operation.
> > The contents of hardware registers are usually included in the device
> > state if they can change at runtime. Hardware registers with constant or
> > computed data may not need to be part of the device state provided that
> > device implementations can produce the necessary data.
> > Device-internal state includes the portion of the device's state that
> > cannot be reconstructed from the hardware interface alone. Defining
> > device-internal state in the most general way instead of exposing device
> > implementation details allows for flexibility in the future. For example,
> > device implementations often maintain a ring index, which is not available
> > through the hardware interface, to keep track of which ring elements have
> > already been consumed. The ring index must be included in the device state
> > so that the destination can resume processing from the correct point in
> > the ring. Representing this as an index into the ring in the hardware
> > interface is more general than adding device implementation-specific
> > request tracking data structures into the device state.
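To make the ring index example concrete, here is a minimal Python sketch.
Everything in it is hypothetical: the RingDevice class, its method names, and
the single 32-bit little-endian field used as the device state are assumptions
for illustration, not a proposed representation.

```python
import struct


class RingDevice:
    """Toy device that consumes elements from a ring shared with the driver."""

    def __init__(self, ring):
        self.ring = ring      # visible through the hardware interface
        self.next_idx = 0     # device-internal: next ring element to consume

    def process_one(self):
        """Consume one ring element and advance the internal index."""
        elem = self.ring[self.next_idx]
        self.next_idx = (self.next_idx + 1) % len(self.ring)
        return elem

    def save_state(self):
        # Assumed layout: one little-endian 32-bit index into the ring.
        return struct.pack("<I", self.next_idx)

    def load_state(self, data):
        (self.next_idx,) = struct.unpack("<I", data)
```

The saved state only names a position in the ring, so a destination with a
completely different internal request tracking scheme can still load it.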
> > The *device state representation* defines the binary data layout of the
> > device state. The device state representation is specific to each device
> > and is beyond the scope of this document, but aspects pertaining to
> > migration compatibility are discussed here.
> > Each change to the device state representation that affects migration
> > compatibility requires a migration parameter. When a new field is added to
> > the device state representation then a new migration parameter must be
> > added to reflect this change. Often a single migration parameter expresses
> > both a change to the hardware interface and the device state
> > representation. It is also possible to change the device state
> > representation without changing the hardware interface, for example when
> > some state was forgotten while designing the previous device state
> > representation.
> > The device state representation may support extra data that can be safely
> > ignored by old device implementations. In this case migration
> > compatibility is unaffected and a migration parameter is not required to
> > indicate such extra data has been added.
> > Device Models
> > -------------
> > The combination of the hardware interface, device state representation, and
> > migration parameter definitions is called a *device model*. Device models
> > are identified by a unique UTF-8 string starting with a domain name and
> > followed by path components separated with forward slashes ('/'). Examples
> > include vendor-a.com/my-nic, gitlab.com/user/my-device,
> > virtio-spec.org/pci/virtio-net, and qemu.org/pci/10ec/8139.
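For illustration, a device model string could be split into its domain and
path components roughly like this. The function name is made up, and two of
the checks are my own assumptions rather than rules from the draft: that the
domain contains a dot, and that empty path components are rejected.

```python
def parse_device_model(model):
    """Split a device model string into (domain, [path components])."""
    domain, sep, path = model.partition("/")
    # Assumption: a domain name contains at least one dot.
    if not sep or "." not in domain:
        raise ValueError(f"expected <domain>/<path>: {model!r}")
    components = path.split("/")
    # Assumption: empty components ("a.com//x", "a.com/") are not allowed.
    if any(not c for c in components):
        raise ValueError(f"empty path component in {model!r}")
    return domain, components
```

With the examples above, "qemu.org/pci/10ec/8139" yields the domain
"qemu.org" and the components "pci", "10ec", and "8139".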
> > The unique device model string is not changed as the device evolves.
> > Instead, migration parameters are added to express variations in a device.
> > The device model is not tied to a specific device implementation. The same
> > device model could be implemented as a VFIO/mdev driver or as a vfio-user
> > device emulation program.
> > Multiple device implementations can support the same device model. Doing so
> > means that the device implementations can offer migration compatibility
> > because they support the same hardware interface, device state
> > representation, and migration parameters.
> > Multiple device models can exist for the same hardware interface, each with
> > a different device state representation and migration parameters. This
> > makes it possible to fork and independently develop device models.
> > Device models can evolve over time as the hardware interface and device
> > state representation change. The corresponding migration parameters ensure
> > that migration compatibility can be established between device
> > implementations.
> > Orchestrating Migrations
> > ------------------------
> > The following steps must be followed to migrate devices:
> > 1. Check that the source and destination support the same device model.
> > 2. Check that the destination supports the migration parameter list from
> >    the source.
> > 3. Configure the destination so it is prepared to load the device state.
> >    This may involve instantiating a new device instance or resetting an
> >    existing device instance to a configuration that is compatible with the
> >    source. The migration parameter list may be used as part of this
> >    configuration, but note that not all of the configuration is captured
> >    in the migration parameter list. For example, the physical network port
> >    for a network card or the host file system path for a disk image file
> >    is typically not captured in the migration parameters and must be
> >    provided through other means.
> > 4. Save the device state on the source and load it on the destination.
> > 5. If migration succeeds then the destination resumes operation and the
> >    source must not resume operation. If the migration fails then the
> >    source resumes operation and the destination must not resume operation.
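The five steps can be sketched as a small in-memory model. This is only a
sketch under assumptions: the Endpoint class, its fields, and migrate() are
hypothetical stand-ins for whatever the management tool and device
implementations actually provide, and step 3 is reduced to copying the
parameter list.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical stand-in for a device implementation on one host."""
    model: str
    supported: dict                      # param name -> set of accepted values
    params: dict = field(default_factory=dict)
    state: bytes = b""
    running: bool = False

    def supports(self, params):
        return all(value in self.supported.get(name, ())
                   for name, value in params.items())

    def save_state(self):
        return self.state

    def load_state(self, data):
        self.state = data


def migrate(src, dst):
    # Step 1: both sides must implement the same device model.
    if src.model != dst.model:
        raise ValueError("device model mismatch")
    # Step 2: the destination must support the source's parameter list.
    if not dst.supports(src.params):
        raise ValueError("destination does not support migration parameters")
    # Step 3: configure the destination (other configuration is out of band).
    dst.params = dict(src.params)
    # Step 4: save on the source, load on the destination.
    src.running = False
    try:
        dst.load_state(src.save_state())
    except Exception:
        src.running = True   # Step 5, failure: only the source resumes.
        raise
    dst.running = True       # Step 5, success: only the destination resumes.
```

Failing steps 1 or 2 leaves the source running untouched, which matches the
intent that the compatibility checks happen before any state is moved.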
> > Note that these steps impose a conservative bound on device states that can
> > be migrated successfully. Not all configuration parameters may be strictly
> > required to match on the source and destination devices. For example, if
> > the device's hardware interface has not yet been initialized then changes
> > to the advertised features may not yet affect the device driver. However,
> > accurately representing runtime constraints is complex and risks
> > introducing migration bugs, so no attempt is made to support them.
> > VFIO/mdev Devices
> > -----------------
> > TODO this is a first draft, more thought needed around enumerating supported
> > parameters, representing default values, etc
> > The following mdev type sysfs attrs are available for managing device
> > instances:
> >   /sys/.../<parent-device>/mdev_supported_types/<type-id>/
> >       create - writing a UUID to this file instantiates a device
> >       migration/ - migration related files
> >           model - unique device model string, e.g. vendor-a.com/my-nic
> > Device models supported by an mdev driver can be enumerated by reading the
> > migration/model attr for each <type-id>.
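As a rough sketch of that enumeration from userspace, assuming the
migration/model attr proposed in this draft (it is not an existing kernel
interface). The function name is made up, and the caller passes the parent
device's sysfs directory because the full path is not spelled out here.

```python
from pathlib import Path


def enumerate_device_models(parent_device):
    """Map each mdev <type-id> under parent_device to its device model."""
    models = {}
    types_dir = Path(parent_device) / "mdev_supported_types"
    for type_dir in sorted(types_dir.iterdir()):
        model_attr = type_dir / "migration" / "model"
        # Types without the proposed attr simply do not appear in the result.
        if model_attr.is_file():
            models[type_dir.name] = model_attr.read_text().strip()
    return models
```

Skipping types that lack migration/model keeps the sketch usable with mdev
drivers that do not support migration at all.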
> > The following mdev device sysfs attrs relate to a specific device instance:
> >   /sys/.../<parent-device>/<uuid>/
> >       mdev_type/ - symlink to mdev type sysfs attrs, e.g. to fetch
> >                    migration/model
> >       migration/ - migration related files
> >           applied - Write "1" to apply current migration parameter values
> >                     or "0" to reset migration parameter values to their
> >                     defaults. Parameters can only be applied or reset
> >                     while the mdev is not opened.
> This seems problematic, why aren't parameters applied on write so that
> userspace can understand the bad values?
I found a way to get rid of the "applied" sysfs attr. Will fix in the
next revision.
> >           params/ - migration parameters
> >               <my-param> - read/write migration parameter "my-param"
> >               ...
> Where do we learn the type and possibly valid values for a parameter?
The next revision will add that information.
> > When the device is created the migration/applied attr is "0". Migration
> > parameters are accessible in migration/params/ and read 0 bytes because
> > they are at their default values. At this point opening the mdev device
> > will fail because migration parameters must be applied first. Migration
> > parameters can be set to the desired values or left at their defaults. "1"
> > must be written to migration/applied before opening the mdev device.
> This breaks existing users, there cannot be a new requirement to apply
> parameters or manipulate a new sysfs attribute before a device is
> usable. Besides, shouldn't default values always be acceptable? This
> presents a pretty high barrier for new features too, there will always
> be a step where userspace must know about and actively enable that
> feature. That puts vendors in a difficult situation, either they break
> migration by creating a new device model which enables features by
> default or they need to go to extraordinary lengths to get userspace to
> enable new features. Is there intended to be a policy where all
> parameters are enabled if we're not trying to match an existing device?
> How would a value be determined where the parameter is not binary?
Good points, the next revision will solve this so the device is created
with the latest supported migration parameter values by default instead
of the oldest/most compatible ones.
> > If writing to a migration/params/<param> attr or setting migration/applied
> > to "1" fails, then the device implementation does not support the
> > migration parameters.
> s/parameter/value/ If the parameter is not supported, the attribute
> shouldn't be present, right? It might also be a resource issue that
> prevents a value from being applied, errno might provide insight to
> which it is.
Yes, will fix.
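Something along these lines is what I have in mind for userspace, as a
sketch only. It assumes the migration/params/<param> layout from this draft;
the function name is made up, and which errno values a driver would actually
return is not defined yet.

```python
import errno
from pathlib import Path


def set_migration_param(device_dir, name, value):
    """Write one migration parameter; return None on success, else a reason."""
    attr = Path(device_dir) / "migration" / "params" / name
    # An absent attribute means the parameter itself is not supported.
    if not attr.exists():
        return f"{name}: parameter not supported (attribute absent)"
    try:
        attr.write_text(str(value))
    except OSError as e:
        # The attribute exists, so the value (or a resource limit) is the
        # problem; errno is the only hint about which one it is.
        code = errno.errorcode.get(e.errno, "unknown")
        return f"{name}={value}: rejected ({code}: {e.strerror})"
    return None
```

This keeps "parameter not supported" and "value rejected" apart, which is the
distinction raised above.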
> > An open mdev device typically does not allow migration parameters to be
> > changed at runtime. However, certain migration/params attrs may allow
> > writes at runtime. Usually these migration parameters only affect the
> > device state representation and not the hardware interface. This makes it
> > possible to upgrade or downgrade the device state representation at
> > runtime so that migration is possible to newer or older device
> > implementations.
> Who does this and when? How do we determine which are runtime and what
> are acceptable values? This seems really hard to orchestrate.
Modifying a device at runtime is an explicit operation. The user needs
to know what they are doing. I'm not sure if trying to define metadata
is useful since it cannot be done without an understanding of the
migration parameter's effect.
> > An existing mdev device instance can be reused by closing the mdev device
> > and writing "0" to migration/applied. This resets parameters to their
> > defaults so that a new list of migration parameters can be applied.
> Nope, can't make new requirements for re-use of an mdev device either.
> I would expect an mdev device to retain its configuration for the next
> use, userspace can reset parameters as necessary or remove and recreate
> the device. Thanks,
Will fix in the next revision.