[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH V3 00/10] fix migration of suspended runstate
From: |
Steven Sistare |
Subject: |
Re: [PATCH V3 00/10] fix migration of suspended runstate |
Date: |
Fri, 25 Aug 2023 13:56:24 -0400 |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0 |
On 8/25/2023 11:07 AM, Peter Xu wrote:
> On Fri, Aug 25, 2023 at 09:28:28AM -0400, Steven Sistare wrote:
>> On 8/24/2023 5:09 PM, Steven Sistare wrote:
>>> On 8/17/2023 2:23 PM, Peter Xu wrote:
>>>> On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
>>>>> Migration of a guest in the suspended runstate is broken. The incoming
>>>>> migration code automatically tries to wake the guest, which is wrong;
>>>>> the guest should end migration in the same runstate it started. Further,
>>>>> for a restored snapshot, the automatic wakeup fails. The runstate is
>>>>> RUNNING, but the guest is not. See the commit messages for the details.
>>>>
>>>> Hi Steve,
>>>>
>>>> I drafted two small patches to show what I meant, on top of this series.
>>>> Before applying these two, one needs to revert patch 1 in this series.
>>>>
>>>> After applied, it should also pass all three new suspend tests. We can
>>>> continue the discussion here based on the patches.
>>>
>>> Your 2 patches look good. I suggest we keep patch 1, and I squash patch 2
>>> into the other patches.
>
> Yes. Feel free to reorganize / modify /.. the changes in whatever way you
> prefer in the final patchset.
>
>>>
>>> There is one more fix needed: on the sending side, if the state is
>>> suspended,
>>> then ticks must be disabled so the tick globals are updated before they are
>>> written to vmstate. Otherwise, tick starts at 0 in the receiver when
>>> cpu_enable_ticks is called.
>>>
>>> -------------------------------------------
>>> diff --git a/migration/migration.c b/migration/migration.c
>> [...]
>>> -------------------------------------------
>>
>> This diff is just a rough draft. I need to resume ticks if the migration
>> fails or is cancelled, and I am trying to push the logic into vm_stop,
>> vm_stop_force_state, and vm_start, and/or vm_prepare_start.
>
> Yes this sounds better than hard code things into migration codes, thanks.
>
> Maybe at least all the migration related code paths should always use
> vm_stop_force_state() (e.g. save_snapshot)?
>
> At the meantime, AFAIU we should allow runstate_is_running() to return true
> even for suspended, matching current usages of vm_start() / vm_stop(). But
> again that can have risk of breaking existing users.
>
> I bet you may have a better grasp of what it should look like to solve the
> current "migrate suspended VM" problem at the minimum but hopefully still
> in a clean way, so I assume I'll just wait and see.
I found a better way.
Rather than disabling ticks, I added a pre_save handler to capture and save
the correct timer state even if the timer is running, using the logic from
cpr_disable_ticks. No changes needed in the migration code:
------------------------------------
diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
index 117408c..d5af317 100644
--- a/softmmu/cpu-timers.c
+++ b/softmmu/cpu-timers.c
@@ -157,6 +157,36 @@ static bool icount_shift_state_needed(void *opaque)
return icount_enabled() == 2;
}
+static int cpu_pre_save_ticks(void *opaque)
+{
+ TimersState *t = &timers_state;
+ TimersState *snap = opaque;
+
+ seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+
+ if (t->cpu_ticks_enabled) {
+ snap->cpu_ticks_offset = t->cpu_ticks_offset + cpu_get_host_ticks();
+ snap->cpu_clock_offset = cpu_get_clock_locked();
+ } else {
+ snap->cpu_ticks_offset = t->cpu_ticks_offset;
+ snap->cpu_clock_offset = t->cpu_clock_offset;
+ }
+ seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+ return 0;
+}
+
+static int cpu_post_load_ticks(void *opaque, int version_id)
+{
+ TimersState *t = &timers_state;
+ TimersState *snap = opaque;
+
+ seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+ t->cpu_ticks_offset = snap->cpu_ticks_offset;
+ t->cpu_clock_offset = snap->cpu_clock_offset;
+ seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+ return 0;
+}
+
/*
* Subsection for warp timer migration is optional, because may not be created
*/
@@ -221,6 +251,8 @@ static const VMStateDescription vmstate_timers = {
.name = "timer",
.version_id = 2,
.minimum_version_id = 1,
+ .pre_save = cpu_pre_save_ticks,
+ .post_load = cpu_post_load_ticks,
.fields = (VMStateField[]) {
VMSTATE_INT64(cpu_ticks_offset, TimersState),
VMSTATE_UNUSED(8),
@@ -269,9 +301,11 @@ TimersState timers_state;
/* initialize timers state and the cpu throttle for convenience */
void cpu_timers_init(void)
{
+ static TimersState timers_snapshot;
+
seqlock_init(&timers_state.vm_clock_seqlock);
qemu_spin_init(&timers_state.vm_clock_lock);
- vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+ vmstate_register(NULL, 0, &vmstate_timers, &timers_snapshot);
cpu_throttle_init();
}
------------------------------------
- Steve
- [PATCH V3 03/10] migration: add runstate function, (continued)
- [PATCH V3 03/10] migration: add runstate function, Steve Sistare, 2023/08/14
- [PATCH V3 01/10] vl: start on wakeup request, Steve Sistare, 2023/08/14
- [PATCH V3 07/10] tests/qtest: option to suspend during migration, Steve Sistare, 2023/08/14
- [PATCH V3 10/10] tests/qtest: background migration with suspend, Steve Sistare, 2023/08/14
- Re: [PATCH V3 00/10] fix migration of suspended runstate, Peter Xu, 2023/08/17