qemu-devel
Re: [PATCH QEMU v25 08/17] vfio: Add save state functions to SaveVMHandl


From: Kirti Wankhede
Subject: Re: [PATCH QEMU v25 08/17] vfio: Add save state functions to SaveVMHandlers
Date: Wed, 24 Jun 2020 02:04:24 +0530
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.8.1



On 6/23/2020 4:20 AM, Alex Williamson wrote:
On Sun, 21 Jun 2020 01:51:17 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handle the pre-copy and stop-and-copy phases.

In _SAVING|_RUNNING device state or pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through below steps.
- read data_offset - indicates kernel driver to write data to staging
   buffer.
- read data_size - amount of data in bytes written by vendor driver in
   migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write data packet to file stream as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
VFIO_MIG_FLAG_END_OF_STATE }

In _SAVING device state or stop-and-copy phase
a. read config space of device and save to migration file stream. This
    doesn't need to be from vendor driver. Any other special config state
    from driver can be saved as data in following iteration.
b. read pending_bytes. If pending_bytes > 0, go through below steps.
c. read data_offset - indicates kernel driver to write data to staging
    buffer.
d. read data_size - amount of data in bytes written by vendor driver in
    migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write data packet as below:
    {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}

When the data region is mapped, it's the user's responsibility to read
data_size bytes of data from data_offset before moving to the next steps.
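
The two stream layouts described above can be sketched as follows. This is a minimal illustration, not code from the patch: the SketchStream type, the sketch_* helpers and the two marker values are hypothetical stand-ins (the real constants and QEMUFile helpers live in QEMU), the config space save from step a is left out, and the fixed buffer has no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder marker values for illustration only, not the real constants. */
#define SKETCH_FLAG_END_OF_STATE   0x1ULL
#define SKETCH_FLAG_DEV_DATA_STATE 0x4ULL

/* Hypothetical in-memory stand-in for the migration file stream. */
typedef struct {
    uint8_t buf[256];   /* sized for the demo, no bounds checking */
    size_t len;
} SketchStream;

/* Append a 64-bit value in big-endian byte order. */
static void sketch_put_be64(SketchStream *s, uint64_t v)
{
    for (int shift = 56; shift >= 0; shift -= 8) {
        s->buf[s->len++] = (uint8_t)(v >> shift);
    }
}

static void sketch_put_buffer(SketchStream *s, const void *data, size_t size)
{
    memcpy(s->buf + s->len, data, size);
    s->len += size;
}

/* Pre-copy: every iteration emits one self-terminated packet. */
static void sketch_precopy_packet(SketchStream *s, const void *data,
                                  uint64_t size)
{
    sketch_put_be64(s, SKETCH_FLAG_DEV_DATA_STATE);
    sketch_put_be64(s, size);
    sketch_put_buffer(s, data, size);
    sketch_put_be64(s, SKETCH_FLAG_END_OF_STATE);
}

/* Stop-and-copy: one data packet per chunk (steps b to g), then a single
 * end marker (step h). */
static void sketch_stop_and_copy(SketchStream *s, const uint8_t *data,
                                 const uint64_t *sizes, size_t nr_chunks)
{
    for (size_t i = 0; i < nr_chunks; i++) {
        sketch_put_be64(s, SKETCH_FLAG_DEV_DATA_STATE);
        sketch_put_be64(s, sizes[i]);
        sketch_put_buffer(s, data, sizes[i]);
        data += sizes[i];
    }
    sketch_put_be64(s, SKETCH_FLAG_END_OF_STATE);
}
```

The visible difference is where the end marker goes: after every packet in pre-copy, but only once, after the last packet, in stop-and-copy.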

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
---
  hw/vfio/migration.c           | 283 ++++++++++++++++++++++++++++++++++++++++++
  hw/vfio/trace-events          |   6 +
  include/hw/vfio/vfio-common.h |   1 +
  3 files changed, 290 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 133bb5b1b3b2..ef1150c1ff02 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -140,6 +140,168 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
      return 0;
  }
+static void *get_data_section_size(VFIORegion *region, uint64_t data_offset,
+                                   uint64_t data_size, uint64_t *size)
+{
+    void *ptr = NULL;
+    int i;
+
+    if (!region->mmaps) {
+        *size = data_size;
+        return ptr;
+    }
+
+    /* check if data_offset is within sparse mmap areas */
+    for (i = 0; i < region->nr_mmaps; i++) {
+        VFIOMmap *map = region->mmaps + i;
+
+        if ((data_offset >= map->offset) &&
+            (data_offset < map->offset + map->size)) {
+            ptr = map->mmap + data_offset - map->offset;
+
+            if (data_offset + data_size <= map->offset + map->size) {
+                *size = data_size;
+            } else {
+                *size = map->offset + map->size - data_offset;
+            }

Ultimately we take whichever result is smaller, so we could just use:

*size = MIN(data_size, map->offset + map->size - data_offset);
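
A small sketch of why the two forms agree, assuming data_offset already lies inside the mapping (which the enclosing if guarantees) and that the sums do not overflow. The function names are hypothetical; MIN is defined locally here only so the snippet stands alone.

```c
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Branching form, mirroring the if/else in the hunk above. */
static uint64_t sketch_size_branch(uint64_t map_offset, uint64_t map_size,
                                   uint64_t data_offset, uint64_t data_size)
{
    if (data_offset + data_size <= map_offset + map_size) {
        return data_size;
    }
    return map_offset + map_size - data_offset;
}

/* Single-expression form: clamp data_size to the bytes left in the mapping. */
static uint64_t sketch_size_min(uint64_t map_offset, uint64_t map_size,
                                uint64_t data_offset, uint64_t data_size)
{
    return MIN(data_size, map_offset + map_size - data_offset);
}
```

Both return data_size when the request fits and the remaining bytes of the mapping otherwise, so the clamp can replace the branch.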

+            break;
+        }
+    }
+
+    if (!ptr) {
+        uint64_t limit = 0;
+
+        /*
+         * data_offset is not within sparse mmap areas, find size of non-mapped
+         * area. Check through all list since region->mmaps list is not sorted.
+         */
+        for (i = 0; i < region->nr_mmaps; i++) {
+            VFIOMmap *map = region->mmaps + i;
+
+            if ((data_offset < map->offset) &&
+                (!limit || limit > map->offset)) {
+                limit = map->offset;
+            }

We could have done this in an else branch of the previous loop to avoid
walking the entries twice.


OK, I'll update the patch with the above two changes.
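
For illustration, a single-pass version with both suggestions applied might look like the sketch below. It is not the updated patch: SketchMmap and SketchRegion are simplified stand-ins for VFIOMmap and VFIORegion, and clamping the unmapped case to data_size is an extra assumption of this sketch that was not raised in the review.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Simplified, hypothetical stand-ins for VFIOMmap and VFIORegion. */
typedef struct {
    uint8_t *mmap;      /* host mapping of this sparse area */
    uint64_t offset;    /* start of the area within the region */
    uint64_t size;
} SketchMmap;

typedef struct {
    int nr_mmaps;
    SketchMmap *mmaps;  /* not sorted by offset */
} SketchRegion;

static void *sketch_get_data_section_size(const SketchRegion *region,
                                          uint64_t data_offset,
                                          uint64_t data_size, uint64_t *size)
{
    uint64_t limit = 0;
    int i;

    if (!region->mmaps) {
        *size = data_size;
        return NULL;
    }

    for (i = 0; i < region->nr_mmaps; i++) {
        SketchMmap *map = region->mmaps + i;

        if (data_offset >= map->offset &&
            data_offset < map->offset + map->size) {
            /* Mapped: hand back a pointer and clamp to this mapping. */
            *size = MIN(data_size, map->offset + map->size - data_offset);
            return map->mmap + (data_offset - map->offset);
        } else if (data_offset < map->offset &&
                   (!limit || limit > map->offset)) {
            /* Track the nearest mapping that starts after data_offset. */
            limit = map->offset;
        }
    }

    /* Unmapped: stop at the next mapping, and never exceed data_size
     * (the clamp is this sketch's assumption). */
    *size = limit ? MIN(limit - data_offset, data_size) : data_size;
    return NULL;
}
```

Returning from inside the loop on a hit is what lets the limit search share the same walk over the list.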

+        }
+
+        *size = limit ? limit - data_offset : data_size;
+    }
+    return ptr;
+}
+
+static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
+{
+    VFIOMigration *migration = vbasedev->migration;
+    VFIORegion *region = &migration->region;
+    uint64_t data_offset = 0, data_size = 0, size;
+    int ret;
+
+    ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+                region->fd_offset + offsetof(struct vfio_device_migration_info,
+                                             data_offset));
+    if (ret != sizeof(data_offset)) {
+        error_report("%s: Failed to get migration buffer data offset %d",
+                     vbasedev->name, ret);
+        return -EINVAL;
+    }
+
+    ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
+                region->fd_offset + offsetof(struct vfio_device_migration_info,
+                                             data_size));
+    if (ret != sizeof(data_size)) {
+        error_report("%s: Failed to get migration buffer data size %d",
+                     vbasedev->name, ret);
+        return -EINVAL;
+    }
+
+    trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
+                           migration->pending_bytes);
+
+    qemu_put_be64(f, data_size);
+    size = data_size;
+
+    while (size) {
+        void *buf = NULL;
+        bool buffer_mmaped;
+        uint64_t sec_size;
+
+        buf = get_data_section_size(region, data_offset, size, &sec_size);
+
+        buffer_mmaped = (buf != NULL);
+
+        if (!buffer_mmaped) {
+            buf = g_try_malloc(sec_size);
+            if (!buf) {
+                error_report("%s: Error allocating buffer ", __func__);
+                return -ENOMEM;
+            }
+
+            ret = pread(vbasedev->fd, buf, sec_size,
+                        region->fd_offset + data_offset);

Is the trade-off to allocate this buffer worth it?  I'd be tempted to
iterate with a basic data type here to avoid what could potentially be
a large memory allocation above.  It feels a little more robust, if not
perhaps as fast, but I think this will mostly be a fallback or only cover
small ranges in normal operation.  Of course the data stream needs to
be compatible either way we retrieve it.


What should the basic data type be here: u8, u16, u32 or u64? We don't know at what granularity the vendor driver is writing, so I think we have to go with the smallest, u8, right?
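
One possible middle ground, sketched here under assumptions and not settled in this thread, is to pick the widest naturally aligned access that still fits and fall back to u8 only at unaligned edges. SketchDev, sketch_read() and sketch_put() are hypothetical in-memory stand-ins for pread() on the device fd and qemu_put_buffer().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the device region and the stream. */
typedef struct {
    const uint8_t *base;   /* plays the role of the unmapped region */
    uint8_t out[64];       /* plays the role of the migration stream */
    size_t out_len;
    unsigned reads;        /* number of read calls issued */
} SketchDev;

/* Widest of 8/4/2/1 bytes that is aligned at off and fits in len. */
static size_t sketch_access_width(uint64_t off, uint64_t len)
{
    size_t width = 8;

    while (width > 1 && (len < width || (off % width) != 0)) {
        width /= 2;
    }
    return width;
}

/* Stand-in for pread(vbasedev->fd, buf, len, region->fd_offset + off). */
static int sketch_read(SketchDev *d, void *buf, size_t len, uint64_t off)
{
    memcpy(buf, d->base + off, len);
    d->reads++;
    return (int)len;
}

/* Stand-in for qemu_put_buffer(). */
static void sketch_put(SketchDev *d, const void *buf, size_t len)
{
    memcpy(d->out + d->out_len, buf, len);
    d->out_len += len;
}

/* Copy [off, off + len) through an 8-byte scratch value, so no buffer
 * proportional to len is ever allocated. */
static int sketch_copy_range(SketchDev *d, uint64_t off, uint64_t len)
{
    while (len) {
        uint64_t scratch;
        size_t width = sketch_access_width(off, len);

        if (sketch_read(d, &scratch, width, off) != (int)width) {
            return -1;
        }
        sketch_put(d, &scratch, width);
        off += width;
        len -= width;
    }
    return 0;
}
```

Because the bytes are forwarded in the order they are read, the resulting stream is the same as with a single large read; only the number and width of the accesses change.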


+            if (ret != sec_size) {
+                error_report("%s: Failed to get migration data %d",
+                             vbasedev->name, ret);
+                g_free(buf);
+                return -EINVAL;
+            }
+        }
+
+        qemu_put_buffer(f, buf, sec_size);
+
+        if (!buffer_mmaped) {
+            g_free(buf);
+        }
+        size -= sec_size;
+        data_offset += sec_size;
+    }
+
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        return ret;
+    }
+
+    return data_size;

This function returns int, data_size is uint64_t.  Thanks,


Yes, the return values for this function are:
< 0 => error
==0 => no more data to save
data_size => amount of data saved in this function.
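
The type concern can be illustrated with a sketch: a uint64_t data_size larger than INT_MAX cannot be carried in an int return value without changing, possibly into a negative number that looks like an error. One hypothetical way around it, not taken from this thread, is to keep the return value for status only and pass the byte count through an out-parameter. All names below are illustrative.

```c
#include <limits.h>
#include <stdint.h>

/* True when v survives a round trip through an int return value. */
static int sketch_fits_in_int(uint64_t v)
{
    return v <= (uint64_t)INT_MAX;
}

/*
 * Hypothetical variant of the save function's contract:
 *   return < 0  => error
 *   return == 0 => success, *saved holds the number of bytes written
 *                  (0 means no more data to save)
 */
static int sketch_save_buffer(uint64_t data_size, uint64_t *saved)
{
    /* ... reading data_offset/data_size and streaming would go here ... */
    *saved = data_size;
    return 0;
}
```

With this shape the caller checks the status first and then reads *saved, so a large data_size can never be mistaken for an error code.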

Thanks,
Kirti



