Pre-copy support allows the VFIO device data to be transferred while the
VM is running. This helps to accommodate VFIO devices that have a large
amount of data that needs to be transferred, and it can reduce migration
downtime.
Pre-copy support is optional in VFIO migration protocol v2.
Implement pre-copy of VFIO migration protocol v2 and use it for devices
that support it. Full description of it can be found here [1].
[1]
https://lore.kernel.org/kvm/20221206083438.37807-3-yishaih@nvidia.com/
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
docs/devel/vfio-migration.rst | 29 ++++++---
include/hw/vfio/vfio-common.h | 3 +
hw/vfio/common.c | 8 ++-
hw/vfio/migration.c | 112 ++++++++++++++++++++++++++++++++--
hw/vfio/trace-events | 5 +-
5 files changed, 140 insertions(+), 17 deletions(-)
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 1d50c2fe5f..51f5e1a537 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,12 +7,14 @@ the guest is running on source host and restoring this saved
state on the
destination host. This document details how saving and restoring of VFIO
devices is done in QEMU.
-Migration of VFIO devices currently consists of a single stop-and-copy phase.
-During the stop-and-copy phase the guest is stopped and the entire VFIO device
-data is transferred to the destination.
-
-The pre-copy phase of migration is currently not supported for VFIO devices.
-Support for VFIO pre-copy will be added later on.
+Migration of VFIO devices consists of two phases: the optional pre-copy phase,
+and the stop-and-copy phase. The pre-copy phase is iterative and allows to
+accommodate VFIO devices that have a large amount of data that needs to be
+transferred. The iterative pre-copy phase of migration allows for the guest to
+continue whilst the VFIO device state is transferred to the destination, this
+helps to reduce the total downtime of the VM. VFIO devices can choose to skip
+the pre-copy phase of migration by not reporting the VFIO_MIGRATION_PRE_COPY
+flag in VFIO_DEVICE_FEATURE_MIGRATION ioctl.
A detailed description of the UAPI for VFIO device migration can be found in
the comment for the ``vfio_device_mig_state`` structure in the header file
@@ -29,6 +31,12 @@ VFIO implements the device hooks for the iterative approach
as follows:
driver, which indicates the amount of data that the vendor driver has yet to
save for the VFIO device.
+* An ``is_active_iterate`` function that indicates ``save_live_iterate`` is
+ active only if the VFIO device is in pre-copy states.
+
+* A ``save_live_iterate`` function that reads the VFIO device's data from the
+ vendor driver during iterative phase.
+
* A ``save_state`` function to save the device config space if it is present.
* A ``save_live_complete_precopy`` function that sets the VFIO device in
@@ -91,8 +99,10 @@ Flow of state changes during Live migration
===========================================
Below is the flow of state change during live migration.
-The values in the brackets represent the VM state, the migration state, and
+The values in the parentheses represent the VM state, the migration state, and
the VFIO device state, respectively.
+The text in the square brackets represents the flow if the VFIO device supports
+pre-copy.
Live migration save path
------------------------
@@ -104,11 +114,12 @@ Live migration save path
|
migrate_init spawns migration_thread
Migration thread then calls each device's .save_setup()
- (RUNNING, _SETUP, _RUNNING)
+ (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
|
- (RUNNING, _ACTIVE, _RUNNING)
+ (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
If device is active, get pending_bytes by .save_live_pending()
If total pending_bytes >= threshold_size, call .save_live_iterate()
+ [Data of VFIO device for pre-copy phase is copied]
Iterate till total pending bytes converge and are less than threshold
|
On migration completion, vCPU stops and calls .save_live_complete_precopy
for
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 5f8e7a02fe..88c2194fb9 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -67,7 +67,10 @@ typedef struct VFIOMigration {
int data_fd;
void *data_buffer;
size_t data_buffer_size;
+ uint64_t mig_flags;
uint64_t stop_copy_size;
+ uint64_t precopy_init_size;
+ uint64_t precopy_dirty_size;
} VFIOMigration;
typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9a0dbee6b4..93b18c5e3d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -357,7 +357,9 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer
*container)
if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
- migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
+ migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P ||
+ migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
+ migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P)) {