qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V1 2/3] migration: fix suspended runstate


From: Peter Xu
Subject: Re: [PATCH V1 2/3] migration: fix suspended runstate
Date: Thu, 17 Aug 2023 14:19:05 -0400

On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote:
> On 8/14/2023 3:37 PM, Peter Xu wrote:
> > On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
> >>> Can we just call vm_state_notify() earlier?
> >>
> >> We cannot.  The guest is not running yet, and will not be until later.
> >> We cannot call notifiers that perform actions that complete, or react to, 
> >> the guest entering a running state.
> > 
> > I tried to look at a few examples of the notifees and most of them I read
> > do not react to "vcpu running" but "vm running" (in which case I think
> > "suspended" mode falls into "vm running" case); most of them won't care on
> > the RunState parameter passed in, but only the bool "running".
> > 
> > In reality, when running=true, it must be RUNNING so far.
> > 
> > In that case does it mean we should notify right after the switchover,
> > since after migration the vm is indeed running only if the vcpus are not
> > during suspend?
> 
> I cannot parse your question, but maybe this answers it.
> If the outgoing VM is running and not suspended, then the incoming side
> tests for autostart==true and calls vm_start, which calls the notifiers,
> right after the switchover.

I meant IMHO SUSPENDED should be seen as "vm running" case to me, just like
RUNNING.  Then, we should invoke vm_prepare_start(), just need some touch
ups.

> 
> > One example (of possible issue) is vfio_vmstate_change(), where iiuc if we
> > try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that
> > device; this kind of prove to me that SUSPEND is actually one of
> > running=true states.
> > 
> > If we postpone all notifiers here even after we switched over to dest qemu
> > to the next upcoming suspend wakeup, I think it means these devices will
> > not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps
> > VFIO_DEVICE_STATE_STOP.
> 
> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
> AFAIK it is OK to remain in that state until wakeup is called later.

So let me provide another clue of why I think we should call
vm_prepare_start()..

Firstly, I think RESUME event should always be there right after we
switched over, no matter suspeneded or not.  I just noticed that your test
case would work because you put "wakeup" to be before RESUME.  I'm not sure
whether that's correct.  I'd bet people could rely on that RESUME to
identify the switchover.

More importantly, I'm wondering whether RTC should still be running during
the suspended mode?  Sorry again if my knowledge over there is just
limited, so correct me otherwise - but my understanding is during suspend
mode (s1 or s3, frankly I can't tell which one this belongs..), rtc should
still be running along with the system clock.  It means we _should_ at
least call cpu_enable_ticks() to enable rtc:

/*
 * enable cpu_get_ticks()
 * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
 */
void cpu_enable_ticks(void)

I think that'll enable cpu_get_tsc() and make it start to work right.

> 
> > Ideally I think we should here call vm_state_notify() with running=true and
> > state=SUSPEND, but since I do see some hooks are not well prepared for
> > SUSPEND over running=true, I'd think we should on the safe side call
> > vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch
> > over phase.  With that IIUC it'll naturally work (e.g. when wakeup again
> > later we just need to call no notifiers).
> 
> Notifiers are just one piece, all the code in vm_prepare_start must be called.
> Is it correct to call all of that long before we actually resume the CPUs in
> wakeup?  I don't know, but what is the point?

The point is not only for cleaness (again, I really, really don't like that
new global.. sorry), but also now I think we should make the vm running.

> The wakeup code still needs
> modification to conditionally resume the vcpus.  The scheme would be roughly:
> 
>     loadvm_postcopy_handle_run_bh()
>         runstat = global_state_get_runstate();
>         if (runstate == RUN_STATE_RUNNING) {
>             vm_start()
>         } else if (runstate == RUN_STATE_SUSPENDED)
>             vm_prepare_start();   // the start of vm_start()
>         }
> 
>     qemu_system_wakeup_request()
>         if (some condition)
>             resume_all_vcpus();   // the remainder of vm_start()
>         else
>             runstate_set(RUN_STATE_RUNNING)

No it doesn't.  wakeup_reason is set there, main loop does the resuming.
See:

    if (qemu_wakeup_requested()) {
        pause_all_vcpus();
        qemu_system_wakeup();
        notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
        wakeup_reason = QEMU_WAKEUP_REASON_NONE;
        resume_all_vcpus();
        qapi_event_send_wakeup();
    }

> 
> How is that better than my patches
>     [PATCH V3 01/10] vl: start on wakeup request
>     [PATCH V3 02/10] migration: preserve suspended runstate
> 
>     loadvm_postcopy_handle_run_bh()
>         runstate = global_state_get_runstate();
>         if (runstate == RUN_STATE_RUNNING)
>             vm_start()
>         else
>             runstate_set(runstate);    // eg RUN_STATE_SUSPENDED
> 
>     qemu_system_wakeup_request()
>         if (!vm_started)
>             vm_start();
>         else
>             runstate_set(RUN_STATE_RUNNING);
> 
> Recall this thread started with your comment "It then can avoid touching the 
> system wakeup code which seems cleaner".  We still need to touch the wakeup
> code.

Let me provide some code and reply to your new patchset inlines.

Thanks.

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]