On Thu, 14 Nov 2019 01:47:04 +0530
Kirti Wankhede <address@hidden> wrote:
On 11/14/2019 1:18 AM, Alex Williamson wrote:
On Thu, 14 Nov 2019 00:59:52 +0530
Kirti Wankhede <address@hidden> wrote:
On 11/13/2019 11:57 PM, Alex Williamson wrote:
On Wed, 13 Nov 2019 11:24:17 +0100
Cornelia Huck <address@hidden> wrote:
On Tue, 12 Nov 2019 15:30:05 -0700
Alex Williamson <address@hidden> wrote:
On Tue, 12 Nov 2019 22:33:36 +0530
Kirti Wankhede <address@hidden> wrote:
- Defined MIGRATION region type and sub-type.
- Used 3 bits to define VFIO device states.
Bit 0 => _RUNNING
Bit 1 => _SAVING
Bit 2 => _RESUMING
Combination of these bits defines VFIO device's state during migration
_RUNNING => Normal VFIO device running state. When its reset, it
indicates _STOPPED state. when device is changed to
_STOPPED, driver should stop device before write()
returns.
_SAVING | _RUNNING => vCPUs are running, VFIO device is running but
start saving state of device i.e. pre-copy state
_SAVING => vCPUs are stopped, VFIO device should be stopped, and
s/should/must/
save device state,i.e. stop-n-copy state
_RESUMING => VFIO device resuming state.
_SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states
A table might be useful here and in the uapi header to indicate valid
states:
I like that.
| _RESUMING | _SAVING | _RUNNING | Description
+-----------+---------+----------+------------------------------------------
| 0 | 0 | 0 | Stopped, not saving or resuming (a)
+-----------+---------+----------+------------------------------------------
| 0 | 0 | 1 | Running, default state
+-----------+---------+----------+------------------------------------------
| 0 | 1 | 0 | Stopped, migration interface in save mode
+-----------+---------+----------+------------------------------------------
| 0 | 1 | 1 | Running, save mode interface, iterative
+-----------+---------+----------+------------------------------------------
| 1 | 0 | 0 | Stopped, migration resume interface active
+-----------+---------+----------+------------------------------------------
| 1 | 0 | 1 | Invalid (b)
+-----------+---------+----------+------------------------------------------
| 1 | 1 | 0 | Invalid (c)
+-----------+---------+----------+------------------------------------------
| 1 | 1 | 1 | Invalid (d)
I think we need to consider whether we define (a) as generally
available, for instance we might want to use it for diagnostics or a
fatal error condition outside of migration.
Are there hidden assumptions between state transitions here or are
there specific next possible state diagrams that we need to include as
well?
Some kind of state-change diagram might be useful in addition to the
textual description anyway. Let me try, just to make sure I understand
this correctly:
During User application initialization, there is one more state change:
0) 0/0/0 ---- stop to running -----> 0/0/1
0/0/0 cannot be the initial state of the device, that would imply that
a device supporting this migration interface breaks backwards
compatibility with all existing vfio userspace code and that code needs
to learn to set the device running as part of its initialization.
That's absolutely unacceptable. The initial device state must be 0/0/1.
There isn't any device state for all existing vfio userspace code right
now. So default its assumed to be always running.
Exactly, there is no representation of device state, therefore it's
assumed to be running, therefore when adding a representation of device
state it must default to running.
With migration support, device states are explicitly getting added. For
example, in case of QEMU, while device is getting initialized, i.e. from
vfio_realize(), device_state is set to 0/0/0, but not required to convey
it to vendor driver.
But we have a 0/0/0 state, why would we intentionally keep an internal
state that's inconsistent with the device?
Then with vfio_vmstate_change() notifier, device
state is changed to 0/0/1 when VM/vCPU are transitioned to running, at
this moment device state is conveyed to vendor driver. So vendor driver
doesn't see 0/0/0 state.
But the running state is the state of the device, not the VM or the
vCPU. Sure we might want to stop the device if the VM/vCPU state is
stopped, but we must accept that the device is running when it's opened
and we shouldn't intentionally maintain inconsistent state.
While resuming, for userspace, for example QEMU, device state change is
from 0/0/0 to 1/0/0, vendor driver see 1/0/0 after device basic
initialization is done.
I don't see why this matters, all device_state transitions are written
directly to the vendor driver. The device is initially in 0/0/1 and
can be set to 1/0/0 for resuming with an optional transition through
0/0/0 and the vendor driver can see each state change.
1) 0/0/1 ---(trigger driver to start gathering state info)---> 0/1/1
not just gathering state info, but also copy device state to be
transferred during pre-copy phase.
Below 2 state are not just to tell driver to stop, those 2 differ.
2) is device state changed from running to stop, this is when VM
shutdowns cleanly, no need to save device state
Userspace is under no obligation to perform this state change though,
backwards compatibility dictates this.
2) 0/0/1 ---(tell driver to stop)---> 0/0/0
3) 0/1/1 ---(tell driver to stop)---> 0/1/0
above is transition from pre-copy phase to stop-and-copy phase, where
device data should be made available to user to transfer to destination
or to save it to file in case of save VM or suspend.
4) 0/0/1 ---(tell driver to resume with provided info)---> 1/0/0
I think this is to switch into resuming mode, the data will follow >
5) 1/0/0 ---(driver is ready)---> 0/0/1
6) 0/1/1 ---(tell driver to stop saving)---> 0/0/1
above can occur on migration cancelled or failed.
I think also:
0/0/1 --> 0/1/0 If user chooses to go directly to stop and copy
that's right, this happens in case of save VM or suspend VM.
0/0/0 and 0/0/1 should be reachable from any state, though I could see
that a vendor driver could fail transition from 1/0/0 -> 0/0/1 if the
received state is incomplete. Somehow though a user always needs to
return the device to the initial state, so how does device_state
interact with the reset ioctl? Would this automatically manipulate
device_state back to 0/0/1?
why would reset occur on 1/0/0 -> 0/0/1 failure?
The question is whether the reset ioctl automatically puts the device
back into the initial state, 0/0/1. A reset from 1/0/0 -> 0/0/1
presumably discards much of the device state we just restored, so
clearly that would be undesirable.
1/0/0 -> 0/0/1 fails, then user should convey that to source that
migration has failed, then resume at source.
In the scheme of the migration yet, but as far as the vfio interface is
concerned the user should have a path to make use of a device after
this point without closing it and starting over. Thus, if a 1/0/0 ->
0/0/1 transition fails, would we define the device reset ioctl as a
mechanism to flush the bogus state and place the device into the 0/0/1
initial state?
Ok, userspace applications can be designed to do that. As of now with
QEMU, I don't see a way to reset device on 1/0/0-> 0/0/1 failure.
It's simply an ioctl, we must already have access to the device file
descriptor to perform the device_state transition. QEMU is not
necessarily the consumer of this behavior though, if transition 1/0/0
-> 0/0/1 fails in QEMU, it very well may just exit. The vfio API
should support a defined mechanism to recover the device from this
state though, which I propose is the existing reset ioctl, which
logically implies that any device reset returns the device_state to
0/0/1.