From: David Hildenbrand
Subject: Re: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM)
Date: Mon, 9 Jan 2023 15:34:48 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.6.0

On 05.01.23 18:15, Peter Xu wrote:
> On Thu, Jan 05, 2023 at 09:35:54AM +0100, David Hildenbrand wrote:
>> On 04.01.23 18:23, Peter Xu wrote:
>>> On Thu, Dec 22, 2022 at 12:02:10PM +0100, David Hildenbrand wrote:
>>>> Migrating device state before we start iterating is currently impossible.
>>>> Introduce and use qemu_savevm_state_start_precopy(), and use
>>>> a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether
>>>> state will be saved in qemu_savevm_state_start_precopy() or in
>>>> qemu_savevm_state_complete_precopy_*().
>>>
>>> Can something like this be done in qemu_savevm_state_setup()?
>>
>> Hi Peter,
>
> Hi, David,
>
>> Do you mean
>>
>> (a) Moving qemu_savevm_state_start_precopy() effectively into
>>      qemu_savevm_state_setup()
>>
>> (b) Using se->ops->save_setup()
>
> I meant (b).
>
>> I first tried going via (b), but decided to go the current way of using a
>> proper vmstate with properties (instead of e.g., filling the stream
>> manually), which also made vmdesc handling possible (and significantly
>> cleaner).
>>
>> Regarding (a), I decided to not move logic of
>> qemu_savevm_state_start_precopy() into qemu_savevm_state_setup(), because it
>> looked cleaner to save device state with the BQL held and for background
>> snapshots, the VM has been stopped. To decouple device state saving from the
>> setup path, just like we do it right now for all vmstates.
>
> Is BQL required or optional?  IIUC it's at least still not taken in the
> migration thread path, only in savevm path.
>
>> Having that said, for virtio-mem, it would still work because that state is
>> immutable once migration starts, but it felt cleaner to separate the setup()
>> phase from actual device state saving.
>
> I get the point.  My major concerns are:
>
>    (1) The new migration priority is changing the semantic of original,
>        making it over-complicated
>
>    (2) The new precopy-start routine added one more step to the migration
>        framework, while it's somehow overlapping (if not to say, mostly the
>        same as..) save_setup().
>
> For (1): the old priority was only deciding the order of save entries in
> the global list, nothing more than that.  Even if we want to have a
> precopy-start phase, I'd suggest we use something else and keep the
> migration priority simple.  Otherwise we really need serious documentation
> for MigrationPriority and if so I'd rather don't bother and not reuse the
> priority field.
>
> For (2), if you see there're a bunch of save_setup() that already does
> things like transferring static data besides the device states.  Besides
> the notorious ram_save_setup() there's also dirty_bitmap_save_setup() which
> also sends a bitmap during save_setup() and some others.  It looks clean to
> me to do it in the same way as we used to.
>
> Reusing vmstate_save() and vmsd structures are useful too which I totally
> agree.  So.. can we just call vmstate_save_state() in the save_setup() of
> the other new vmsd of virtio-mem?


I went halfway that way, by moving stuff into qemu_savevm_state_setup()
and avoiding using a new migration priority. Seems to work:

I think we could go one step further and perform it from a save_setup()
callback; however, I'm not convinced that this gets particularly cleaner
(vmdesc handling eventually).

However, if there are hard feelings, I can look into that. Thanks.
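
To illustrate the ordering the patch below aims for, here is a minimal, self-contained sketch. It is only a toy model under stated assumptions: ToyEntry, ToyStream and the toy_* functions are hypothetical stand-ins and not QEMU types or APIs. Entries flagged immutable are written during the setup phase, a "ram" marker stands in for the iterative RAM phase, and all remaining entries follow in the completion phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SECTIONS 16

/* Hypothetical stand-in for a SaveStateEntry with a vmsd; not a QEMU type. */
typedef struct ToyEntry {
    const char *name;
    bool immutable;   /* models the proposed vmsd->immutable flag */
} ToyEntry;

/* Hypothetical stand-in for the stream: records section names in order. */
typedef struct ToyStream {
    const char *sections[TOY_MAX_SECTIONS];
    size_t count;
} ToyStream;

static void toy_put(ToyStream *s, const char *name)
{
    assert(s->count < TOY_MAX_SECTIONS);
    s->sections[s->count++] = name;
}

/* Setup phase: only immutable entries are written, i.e. before RAM. */
static void toy_setup(ToyStream *s, const ToyEntry *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].immutable) {
            toy_put(s, e[i].name);
        }
    }
}

/* Completion phase: skip whatever was already written during setup. */
static void toy_complete(ToyStream *s, const ToyEntry *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!e[i].immutable) {
            toy_put(s, e[i].name);
        }
    }
}

/* Whole flow: setup, then a marker for the RAM phase, then completion. */
static void toy_migrate(ToyStream *s, const ToyEntry *e, size_t n)
{
    s->count = 0;
    toy_setup(s, e, n);
    toy_put(s, "ram");
    toy_complete(s, e, n);
}

/* Helper for checking which section landed at a given position. */
static bool toy_section_is(const ToyStream *s, size_t idx, const char *name)
{
    return idx < s->count && strcmp(s->sections[idx], name) == 0;
}
```

With entries "cpu" (mutable), "virtio-mem-early" (immutable) and "e1000" (mutable), toy_migrate() records the immutable entry first, then "ram", then the other two in list order.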


From e501f80dbbca1260445a6dac03053f426fbb572d Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 20 Dec 2022 18:14:33 +0100
Subject: [PATCH] migration: Allow immutable device state to be migrated early
 (i.e., before RAM)

For virtio-mem, we want to have the plugged/unplugged state of memory
blocks available before migrating any actual RAM content. This
information is immutable on the migration source while migration is active,

For example, we want to use this information for proper preallocation
support with migration: currently, we don't preallocate memory on the
migration target, and especially with hugetlb, we can easily run out of
hugetlb pages during RAM migration and will crash (SIGBUS) instead of
catching this gracefully via preallocation.

Migrating device state before we start iterating is currently impossible.
Let's allow for migrating such state during the setup state, indicating
applicable vmstate descriptors using an "immutable" flag.

We have to take care of properly including the early device state in the
vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is
a bit sub-optimal, but we use that explicitly or implicitly all over the
place already, so this barely matters in practice.

Note that only very selected devices (i.e., ones seriously messing with
RAM setup) are supposed to make use of that.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/migration/vmstate.h |  5 +++
 migration/migration.c       |  4 ++
 migration/migration.h       |  4 ++
 migration/savevm.c          | 85 +++++++++++++++++++++++++++----------
 4 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ad24aa1934..610e4c1e38 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -179,6 +179,11 @@ struct VMStateField {
 struct VMStateDescription {
     const char *name;
     int unmigratable;
+    /*
+     * The state is immutable while migration is active and the state can
+     * be migrated early, during the setup phase.
+     */
+    int immutable;
     int version_id;
     int minimum_version_id;
     MigrationPriority priority;
diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..1d33a7efa0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s)
     s->vm_was_running = false;
     s->iteration_initial_bytes = 0;
     s->threshold_size = 0;
+
+    json_writer_free(s->vmdesc);
+    s->vmdesc = NULL;
 }
 
 int migrate_add_blocker_internal(Error *reason, Error **errp)
@@ -4445,6 +4448,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
     error_free(ms->error);
+    json_writer_free(ms->vmdesc);
 }
 
 static void migration_instance_init(Object *obj)
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..66511ce532 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
 #include "exec/cpu-common.h"
 #include "hw/qdev-core.h"
 #include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine_int.h"
 #include "io/channel.h"
@@ -366,6 +367,9 @@ struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. */
+    JSONWriter *vmdesc;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/savevm.c b/migration/savevm.c
index a0cdb714f7..e77f643f52 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -42,7 +42,6 @@
 #include "postcopy-ram.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
-#include "qapi/qmp/json-writer.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/qapi-builtin-visit.h"
 #include "qapi/qmp/qerror.h"
@@ -1161,14 +1160,63 @@ bool qemu_savevm_state_guest_unplug_pending(void)
     return false;
 }
 
+static int qemu_savevm_state_precopy_one_non_iterable(SaveStateEntry *se,
+                                                      QEMUFile *f,
+                                                      JSONWriter *vmdesc)
+{
+    int ret;
+
+    if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
+        trace_savevm_section_skip(se->idstr, se->section_id);
+        return 0;
+    }
+
+    trace_savevm_section_start(se->idstr, se->section_id);
+
+    json_writer_start_object(vmdesc, NULL);
+    json_writer_str(vmdesc, "name", se->idstr);
+    json_writer_int64(vmdesc, "instance_id", se->instance_id);
+
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+    ret = vmstate_save(f, se, vmdesc);
+    if (ret) {
+        qemu_file_set_error(f, ret);
+        return ret;
+    }
+    trace_savevm_section_end(se->idstr, se->section_id, 0);
+    save_section_footer(f, se);
+
+    json_writer_end_object(vmdesc);
+    return 0;
+}
+
 void qemu_savevm_state_setup(QEMUFile *f)
 {
-    SaveStateEntry *se;
+    MigrationState *ms = migrate_get_current();
     Error *local_err = NULL;
+    SaveStateEntry *se;
+    JSONWriter *vmdesc;
     int ret;
 
+    assert(!ms->vmdesc);
+    ms->vmdesc = vmdesc = json_writer_new(false);
+    json_writer_start_object(vmdesc, NULL);
+    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
+    json_writer_start_array(vmdesc, "devices");
+
     trace_savevm_state_setup();
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd) {
+            if (!se->vmsd->immutable) {
+                continue;
+            }
+            ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
+            if (ret) {
+                break;
+            }
+            continue;
+        }
+
         if (!se->ops || !se->ops->save_setup) {
             continue;
         }
@@ -1364,41 +1412,28 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
                                                     bool in_postcopy,
                                                     bool inactivate_disks)
 {
-    g_autoptr(JSONWriter) vmdesc = NULL;
+    MigrationState *ms = migrate_get_current();
+    JSONWriter *vmdesc = ms->vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
 
-    vmdesc = json_writer_new(false);
-    json_writer_start_object(vmdesc, NULL);
-    json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
-    json_writer_start_array(vmdesc, "devices");
-    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+    /* qemu_savevm_state_setup() is expected to be called first. */
+    assert(vmdesc);
 
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
             continue;
         }
-        if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) {
-            trace_savevm_section_skip(se->idstr, se->section_id);
+        if (se->vmsd && se->vmsd->immutable) {
+            /* Already saved during qemu_savevm_state_setup(). */
             continue;
         }
 
-        trace_savevm_section_start(se->idstr, se->section_id);
-
-        json_writer_start_object(vmdesc, NULL);
-        json_writer_str(vmdesc, "name", se->idstr);
-        json_writer_int64(vmdesc, "instance_id", se->instance_id);
-
-        save_section_header(f, se, QEMU_VM_SECTION_FULL);
-        ret = vmstate_save(f, se, vmdesc);
+        ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc);
         if (ret) {
-            qemu_file_set_error(f, ret);
             return ret;
         }
-        trace_savevm_section_end(se->idstr, se->section_id, 0);
-        save_section_footer(f, se);
-
-        json_writer_end_object(vmdesc);
     }
 
     if (inactivate_disks) {
@@ -1427,6 +1462,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
     }
 
+    /* Free it now to detect any inconsistencies. */
+    json_writer_free(vmdesc);
+    ms->vmdesc = NULL;
+
     return 0;
 }
--
2.39.0
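
The vmdesc hand-off in the patch above (created during setup, parked in the migration state, consumed and cleared at completion) can be sketched in isolation. This is a toy model only: ToyMigrationState and the toy_* functions are hypothetical, and a plain C string stands in for the JSONWriter. Unlike the patch, which frees the writer after sending it, the sketch hands the finished string to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MigrationState; vmdesc models ms->vmdesc. */
typedef struct ToyMigrationState {
    char *vmdesc;
} ToyMigrationState;

/* Models the reset in migrate_init(): drop any leftover description. */
static void toy_migrate_init(ToyMigrationState *ms)
{
    free(ms->vmdesc);
    ms->vmdesc = NULL;
}

/* Setup phase: start the description; a leftover one indicates a bug. */
static void toy_state_setup(ToyMigrationState *ms, const char *early_device)
{
    size_t len = strlen(early_device) + 1;

    assert(!ms->vmdesc);
    ms->vmdesc = malloc(len);
    assert(ms->vmdesc);
    memcpy(ms->vmdesc, early_device, len);
}

/*
 * Completion phase: append the late device, hand the finished description
 * to the caller (who frees it) and clear the stored pointer, so that any
 * later use without a fresh setup trips the assertion.
 */
static char *toy_state_complete(ToyMigrationState *ms, const char *late_device)
{
    size_t old_len, add_len;
    char *out;

    assert(ms->vmdesc);   /* setup is expected to have run first */
    old_len = strlen(ms->vmdesc);
    add_len = strlen(late_device);
    out = realloc(ms->vmdesc, old_len + 1 + add_len + 1);
    assert(out);
    out[old_len] = ',';
    memcpy(out + old_len + 1, late_device, add_len + 1);
    ms->vmdesc = NULL;
    return out;
}
```

Clearing the pointer at completion and asserting on it at both ends is what turns a missing or duplicated setup call into an immediate failure instead of silent reuse of stale data.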



--
Thanks,

David / dhildenb