Re: [PATCH V3 00/10] fix migration of suspended runstate


From: Steven Sistare
Subject: Re: [PATCH V3 00/10] fix migration of suspended runstate
Date: Fri, 25 Aug 2023 13:56:24 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0

On 8/25/2023 11:07 AM, Peter Xu wrote:
> On Fri, Aug 25, 2023 at 09:28:28AM -0400, Steven Sistare wrote:
>> On 8/24/2023 5:09 PM, Steven Sistare wrote:
>>> On 8/17/2023 2:23 PM, Peter Xu wrote:
>>>> On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
>>>>> Migration of a guest in the suspended runstate is broken.  The incoming
>>>>> migration code automatically tries to wake the guest, which is wrong;
>>>>> the guest should end migration in the same runstate it started.  Further,
>>>>> for a restored snapshot, the automatic wakeup fails.  The runstate is
>>>>> RUNNING, but the guest is not.  See the commit messages for the details.
>>>>
>>>> Hi Steve,
>>>>
>>>> I drafted two small patches to show what I meant, on top of this series.
>>>> Before applying these two, one needs to revert patch 1 in this series.
>>>>
>>>> After applying them, it should also pass all three new suspend tests.
>>>> We can continue the discussion here based on the patches.
>>>
>>> Your 2 patches look good.  I suggest we keep patch 1, and I squash patch 2
>>> into the other patches.
> 
> Yes.  Feel free to reorganize / modify /.. the changes in whatever way you
> prefer in the final patchset.
> 
>>>
>>> There is one more fix needed: on the sending side, if the state is
>>> suspended, then ticks must be disabled so the tick globals are updated
>>> before they are written to vmstate.  Otherwise, ticks start at 0 on the
>>> receiver when cpu_enable_ticks is called.
>>>
>>> -------------------------------------------
>>> diff --git a/migration/migration.c b/migration/migration.c
>> [...]
>>> -------------------------------------------
>>
>> This diff is just a rough draft.  I need to resume ticks if the migration
>> fails or is cancelled, and I am trying to push the logic into vm_stop,
>> vm_stop_force_state, vm_start, and/or vm_prepare_start.
> 
> Yes, this sounds better than hard-coding things into the migration code,
> thanks.
> 
> Maybe at least all the migration-related code paths should always use
> vm_stop_force_state() (e.g., save_snapshot)?
> 
> In the meantime, AFAIU we should allow runstate_is_running() to return
> true even for the suspended state, matching current usages of vm_start() /
> vm_stop().  But again, that risks breaking existing users.
> 
> I bet you have a better grasp of what a minimal but still clean solution
> to the current "migrate suspended VM" problem should look like, so I'll
> just wait and see.

I found a better way.  Rather than disabling ticks, I added a pre_save
handler that captures and saves the correct timer state even while ticks
are running, using the logic from cpr_disable_ticks.  No changes are
needed in the migration code.
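
For background on why the raw tick globals cannot be written to vmstate
while ticks are running: cpu_ticks_offset is an absolute tick count only
while ticks are disabled.  While they are enabled, it holds the guest
count minus the current host count, and the host count is folded back in
by cpu_disable_ticks.  Roughly, the convention in softmmu/cpu-timers.c is
(a simplified sketch; locking and the non-monotonic-TSC workaround are
omitted):

    int64_t cpu_get_ticks(void)
    {
        int64_t ticks = timers_state.cpu_ticks_offset;
        if (timers_state.cpu_ticks_enabled) {
            /* the offset is relative while ticks run; add the live
             * host count to get the absolute guest tick count */
            ticks += cpu_get_host_ticks();
        }
        return ticks;
    }

The pre_save handler below does the same fold-up, but into a separate
snapshot, so the live state is left untouched: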

------------------------------------
diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
index 117408c..d5af317 100644
--- a/softmmu/cpu-timers.c
+++ b/softmmu/cpu-timers.c
@@ -157,6 +157,36 @@ static bool icount_shift_state_needed(void *opaque)
     return icount_enabled() == 2;
 }

+static int cpu_pre_save_ticks(void *opaque)
+{
+    TimersState *t = &timers_state;
+    TimersState *snap = opaque;
+
+    seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+
+    if (t->cpu_ticks_enabled) {
+        snap->cpu_ticks_offset = t->cpu_ticks_offset + cpu_get_host_ticks();
+        snap->cpu_clock_offset = cpu_get_clock_locked();
+    } else {
+        snap->cpu_ticks_offset = t->cpu_ticks_offset;
+        snap->cpu_clock_offset = t->cpu_clock_offset;
+    }
+    seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    return 0;
+}
+
+static int cpu_post_load_ticks(void *opaque, int version_id)
+{
+    TimersState *t = &timers_state;
+    TimersState *snap = opaque;
+
+    seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    t->cpu_ticks_offset = snap->cpu_ticks_offset;
+    t->cpu_clock_offset = snap->cpu_clock_offset;
+    seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    return 0;
+}
+
 /*
  * Subsection for warp timer migration is optional, because may not be created
  */
@@ -221,6 +251,8 @@ static const VMStateDescription vmstate_timers = {
     .name = "timer",
     .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = cpu_pre_save_ticks,
+    .post_load = cpu_post_load_ticks,
     .fields = (VMStateField[]) {
         VMSTATE_INT64(cpu_ticks_offset, TimersState),
         VMSTATE_UNUSED(8),
@@ -269,9 +301,11 @@ TimersState timers_state;
 /* initialize timers state and the cpu throttle for convenience */
 void cpu_timers_init(void)
 {
+    static TimersState timers_snapshot;
+
     seqlock_init(&timers_state.vm_clock_seqlock);
     qemu_spin_init(&timers_state.vm_clock_lock);
-    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+    vmstate_register(NULL, 0, &vmstate_timers, &timers_snapshot);

     cpu_throttle_init();
 }
------------------------------------
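
Note that vmstate is now registered against timers_snapshot rather than
the live timers_state, so nothing on the sending side is modified: if the
migration fails or is cancelled, there is nothing to resume.  On the
receiving side, post_load copies the snapshot into the live globals, so
when cpu_enable_ticks is eventually called, ticks continue from the
migrated values instead of starting at 0.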

- Steve


