qemu-devel

Re: [RFC v3] VFIO Migration


From: Dr. David Alan Gilbert
Subject: Re: [RFC v3] VFIO Migration
Date: Wed, 11 Nov 2020 15:41:59 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Nov 11, 2020 at 12:56:26PM +0000, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > Orchestrating Migrations
> > > ------------------------
> > > In order to migrate a device a *migration parameter list* must first be
> > > built on the source. Each migration parameter is added to the list if it
> > > is in effect. For example, the migration parameter list for a device with
> > > new-feature=off,num-queues=4 would be num-queues=4 if the new-feature
> > > migration parameter was introduced with the off value disabling its effect.
> > 
> > What component builds that list (i.e. what component needs to know the
> > history that new-feature=off was the default - ah I think you answer
> > that below).
> 
> Yep. Thanks for noting this. I'll need to reorder things so it is clear.
> 
> > > The following conditions must be met to establish migration compatibility:
> > > 
> > > 1. The source and destination device model strings match.
> > > 
> > > 2. Each migration parameter name from the migration parameter list is
> > >    supported by the destination. For example, the destination supports the
> > >    num-queues migration parameter.
> > > 
> > > 3. Each migration parameter value from the migration parameter list is
> > >    supported by the destination. For example, the destination supports
> > >    num-queues=4.
> > 
> > Hmm, are combinations of parameter checks needed - i.e. is it possible
> > that a destination supports num-queues=4 and new-feature=on/off -
> > but only supports new-feature=on when num-queues>2 ?
> 
> Yes, it's possible but cannot be expressed in the migration info JSON.
> 
> We need to choose a level of expressiveness that will be useful enough
> without being complex. In the extreme the migration info would contain
> Turing complete validation expressions (e.g. JavaScript) so that any
> relationship can be expressed, but I doubt that complexity is needed.
> The other extreme is just booleans and (opaque) strings for maximum
> simplicity.
> 
> If the syntax is not expressive enough then it's impossible to check
> migration compatibility without actually creating a new device instance
> on the destination. Daniel Berrange raised the requirement of checking
> migration compatibility without creating the device since this helps
> with selecting a migration destination.

Right, but my worry isn't the JSON description, it's the set of 3
conditions above; they need to state that only some combinations need to
be valid.
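
To make the concern concrete, here is a minimal sketch of the three
conditions as quoted above. The function name and the layout of dst_info
(a dict mapping parameter name to a dict with an optional
"allowed_values" list) are assumptions made for illustration, not part
of the RFC, and "<min>-<max>" range strings are not handled here.

```python
def check_compat(src_model, src_params, dst_model, dst_info):
    """Return a list of reasons migration is not possible (empty if OK).

    src_params: migration parameter list from the source (name -> value).
    dst_info:   hypothetical parsed migration info of the destination.
    """
    problems = []

    # Condition 1: device model strings must match.
    if src_model != dst_model:
        problems.append(f"device model mismatch: {src_model!r} != {dst_model!r}")

    for name, value in src_params.items():
        # Condition 2: the destination must know the parameter name.
        if name not in dst_info:
            problems.append(f"parameter {name!r} not supported by destination")
            continue
        # Condition 3: the destination must accept the value.
        # A missing "allowed_values" member means no restriction is advertised.
        allowed = dst_info[name].get("allowed_values")
        if allowed is not None and value not in allowed:
            problems.append(f"value {value!r} not allowed for {name!r}")

    return problems
```

Each parameter is checked on its own, so a destination that only accepts
new-feature=on when num-queues>2 would still pass this check; that is
the gap in the conditions as currently worded.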

> 
> > > The migration compatibility check can be performed without initiating a
> > > migration. Therefore, this process can be used to select the migration
> > > destination.
> > > 
> > > The following steps perform the migration:
> > > 
> > > 1. Configure the destination so it is prepared to load the device state,
> > >    including applying the migration parameter list. This may involve
> > >    instantiating a new device instance or resetting an existing device
> > >    instance to a configuration that is compatible with the source.
> > > 
> > >    The details of how to do this for VFIO/mdev drivers and vfio-user
> > >    device backend programs are described below.
> > > 
> > > 2. Save the device state on the source and load it on the destination.
> > 
> > Which is true for almost everything, unless it turned out to have
> > significant amounts of RAM on board; do we have a way to deal with that
> > for vfio/vhost-user - where it needs to be iterative? (Let's just ignore
> > this for now)
> 
> Step 2 includes iterative migration. I should have mentioned that in the
> document.

OK.

> > > "allowed_values"
> > >   The list of all values that the device implementation accepts for this
> > >   migration parameter. Integer ranges can be described using
> > >   "<min>-<max>" strings.
> > > 
> > >   Examples: ['a', 'b', 'c'], [1, 5, 7], ['0-255', 512, '1024-2048'], [true]
> > > 
> > >   This member is optional. When absent, any value suitable for the type may
> > >   be given but the device implementation may refuse certain values.
> > 
> > JSON isn't a great choice for specifying ranges of integers
> 
> Agreed :)
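
As an illustration of what a consumer of "allowed_values" has to do, here
is a rough sketch of a membership test that understands the
"<min>-<max>" strings. The function name is hypothetical, and treating a
"<digits>-<digits>" string as a range only when the checked value is an
integer is an assumption; the quoted text does not say how negative
numbers or literal strings that look like ranges should be handled.

```python
def value_allowed(value, allowed_values):
    """Return True if value matches an entry of an "allowed_values" list."""
    is_int = isinstance(value, int) and not isinstance(value, bool)
    for entry in allowed_values:
        if is_int and isinstance(entry, str):
            # Interpret "<min>-<max>" as an inclusive integer range.
            lo, sep, hi = entry.partition("-")
            if sep and lo.isdigit() and hi.isdigit():
                if int(lo) <= value <= int(hi):
                    return True
                continue
        # Plain entries must match in both type and value, so that
        # True does not match 1 and "4" does not match 4.
        if type(entry) is type(value) and entry == value:
            return True
    return False
```

The type juggling needed just to tell a range from an ordinary string is
a fair demonstration of why JSON is awkward for this.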
> 
> > > The device is instantiated by launching the destination process with the
> > > migration parameter list from the source:
> > > 
> > > .. code:: bash
> > > 
> > >   $ my-device --m-<param1>=<value1> --m-<param2> <value2> [...]
> > > 
> > > This example shows how to instantiate the device with migration parameters
> > > ``param1`` and ``param2``. Both ``--m-<param>=<value>`` and
> > > ``--m-<param> <value>`` option formats are accepted.
> > > 
> > > The ``--m-`` prefix is used to allow the device emulation program to
> > > implement device implementation-specific command-line options without
> > > conflicting with the migration parameter namespace.
> > 
> > That feels like an odd syntax to me.
> 
> Unfortunately we cannot use --<param>. I also considered using a JSON
> input file but that makes it harder to invoke the device emulation
> program manually for testing/development. I bet I'd have to look up the
> JSON syntax every time whereas it's easy to remember how to format a
> command-line parameter.
> 
> The other one I considered was using '--' or another marker to separate
> device implementation-specific command-line arguments from migration
> parameters. However, doing so places requirements on the device
> emulation program's command-line parsing library and I think people will
> be unhappy if their favorite Go, Rust, Python, etc library cannot handle
> the command-line options due to our weird syntax.
> 
> Any ideas for a better syntax?

I'd be happy with a repeated '--param name=value', but I also know that
some option parsers don't like that.
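
For what it's worth, a repeated '--param name=value' is straightforward
with Python's argparse; a minimal sketch follows. The --param option name
comes from the suggestion above, my-device is the placeholder program
from the quoted example, and parse_params is a hypothetical helper.
Values are left as strings since typing them needs the migration info.

```python
import argparse


def parse_params(argv):
    """Collect repeated '--param NAME=VALUE' options into a dict."""
    parser = argparse.ArgumentParser(prog="my-device")
    # action="append" gathers every occurrence of --param into a list.
    parser.add_argument("--param", action="append", default=[],
                        metavar="NAME=VALUE")
    args = parser.parse_args(argv)

    params = {}
    for item in args.param:
        # Split on the first '=' only, so values may contain '='.
        name, sep, value = item.partition("=")
        if not sep or not name:
            parser.error(f"expected NAME=VALUE, got {item!r}")
        params[name] = value
    return params
```

Other languages' parsers would need checking individually, which is the
caveat above.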

> > > When preparing for migration on the source, each migration parameter from
> > > the migration info JSON is added to the migration parameter list if its
> > > value differs from "off_value". If a migration parameter in the list is
> > > not available on the destination, then migration is not possible. If a
> > > migration parameter value is not in the destination "allowed_values"
> > > migration_info.json then migration is not possible.
> > > 
> > > On the destination, a command-line is generated from the migration
> > > parameter list. For each destination migration parameter missing from the
> > > migration parameter list a command-line option is added with the
> > > destination "off_value". The device emulation program prints an error
> > > message to standard error and terminates with exit status 1 if the device
> > > could not be instantiated.
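
The two paragraphs above can be sketched roughly as follows. The function
names and the dict layout of the migration info (parameter name mapped to
a dict with an "off_value" member) are assumptions for illustration; the
--m- prefix is from the quoted text. How non-string values such as
booleans are spelled on the command line is not stated, so the example
just uses str().

```python
def build_param_list(src_info, current_values):
    """Source side: keep only parameters whose value differs from off_value."""
    return {
        name: current_values[name]
        for name, info in src_info.items()
        if current_values[name] != info["off_value"]
    }


def build_dest_argv(param_list, dst_info):
    """Destination side: one --m- option per destination parameter.

    Parameters missing from the list get the destination "off_value".
    Assumes the compatibility check already passed, i.e. every name in
    param_list is present in dst_info.
    """
    argv = []
    for name, info in dst_info.items():
        value = param_list.get(name, info["off_value"])
        argv.append(f"--m-{name}={value}")
    return argv
```

With new-feature=off,num-queues=4 on the source and "off" and 1 as the
respective off values, the list is just num-queues=4, and a destination
that also knows some newer parameter passes that parameter explicitly
with its own off value.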
> > 
> > I still don't think this revision answers the question of how a VM
> > management program picks a sane set of parameter values for a new VM
> > it's creating, especially if it wants it to be migratable.  That's
> > something your version stuff in V1 seemed nice for.
> 
> Good point. If we're creating a VM and expect to migrate between two
> device implementations, how do we choose the migration parameters?
> 
> I can see a solution for that: grab the set of "init_values" from both
> device implementations and use the one that both accept. This is O(N^2)
> so it's not great when there are many device implementations involved.
> It's O(N) with version numbers because you can keep an intersection set
> of supported version numbers.

Which is actually more complex if only some combinations work.

> This point definitely needs to be included in the document. Is my answer
> acceptable or do you think versions are really needed?
> 
> It's also hard to answer "which of these two migration parameter lists
> is better/more modern?" without versions when non-bool migration
> parameters are involved.

Dave

> Stefan


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



