qemu-devel

Re: [PATCH] spapr: Migrate CAS reboot flag


From: David Gibson
Subject: Re: [PATCH] spapr: Migrate CAS reboot flag
Date: Tue, 21 Jan 2020 14:43:32 +1100

On Mon, Jan 20, 2020 at 09:04:38AM +0100, Greg Kurz wrote:
> On Fri, 17 Jan 2020 16:44:27 +0100
> Greg Kurz <address@hidden> wrote:
> 
> > On Fri, 17 Jan 2020 19:16:08 +1000
> > David Gibson <address@hidden> wrote:
> > 
> > > On Thu, Jan 16, 2020 at 07:29:02PM +0100, Greg Kurz wrote:
> > > > On Thu, 16 Jan 2020 13:14:35 +0100
> > > > Greg Kurz <address@hidden> wrote:
> > > > 
> > > > > On Thu, 16 Jan 2020 11:37:24 +0100
> > > > > Laurent Vivier <address@hidden> wrote:
> > > > > 
> > > > > > On 16/01/2020 09:48, Greg Kurz wrote:
> > > > > > > On Wed, 15 Jan 2020 19:10:37 +0100
> > > > > > > Laurent Vivier <address@hidden> wrote:
> > > > > > > 
> > > > > > >> Hi,
> > > > > > >>
> > > > > > >> On 15/01/2020 18:48, Greg Kurz wrote:
> > > > > > >>> Migration can potentially race with CAS reboot. If the migration
> > > > > > >>> thread completes migration after CAS has set spapr->cas_reboot but
> > > > > > >>> before the main loop could pick up the reset request and reset the
> > > > > > >>> machine, the guest is migrated unrebooted and the destination
> > > > > > >>> doesn't reboot it either because it isn't aware a CAS reboot was
> > > > > > >>> needed (e.g. because a device was added before CAS). This likely
> > > > > > >>> results in a broken or hung guest.
> > > > > > >>>
> > > > > > >>> Even though it is small, the window between CAS and CAS reboot is
> > > > > > >>> enough to re-qualify spapr->cas_reboot as state that we should
> > > > > > >>> migrate. Add a new subsection for that and always send it when a
> > > > > > >>> CAS reboot is pending. This may cause migration to older QEMUs to
> > > > > > >>> fail, but it is still better than ending up with a broken guest.
> > > > > > >>>
> > > > > > >>> The destination cannot honour the CAS reboot request from a
> > > > > > >>> post-load handler because this must be done after the guest is
> > > > > > >>> fully restored. It is thus done from a VM change state handler.
> > > > > > >>>
> > > > > > >>> Reported-by: Lukáš Doktor <address@hidden>
> > > > > > >>> Signed-off-by: Greg Kurz <address@hidden>
> > > > > > >>> ---
> > > > > > >>>
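
For reference, a minimal sketch of the subsection and change state handler
that the description above implies, written against QEMU's VMState API of
the time; the names here are illustrative, not necessarily those of the
actual patch:

    #include "qemu/osdep.h"
    #include "migration/vmstate.h"
    #include "sysemu/runstate.h"
    #include "hw/ppc/spapr.h"

    static bool spapr_cas_reboot_needed(void *opaque)
    {
        SpaprMachineState *spapr = opaque;

        /* Only put the subsection on the wire when a CAS reboot is
         * actually pending, so migration to older QEMUs keeps working
         * in the common case. */
        return spapr->cas_reboot;
    }

    static const VMStateDescription vmstate_spapr_cas_reboot = {
        .name = "spapr_cas_reboot",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = spapr_cas_reboot_needed,
        .fields = (VMStateField[]) {
            VMSTATE_BOOL(cas_reboot, SpaprMachineState),
            VMSTATE_END_OF_LIST()
        },
    };

    /* Issuing the reset from a post_load handler would be too early;
     * a VM change state handler fires once the guest is fully restored
     * and about to resume: */
    static void spapr_cas_reboot_state_change(void *opaque, int running,
                                              RunState state)
    {
        SpaprMachineState *spapr = opaque;

        if (running && spapr->cas_reboot) {
            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
        }
    }

The handler would be registered once at machine init with
qemu_add_vm_change_state_handler(spapr_cas_reboot_state_change, spapr).
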
> > > > > > >>
> > > > > > >> I'm wondering if the problem can be related to the fact that
> > > > > > >> main_loop_should_exit() could release qemu_global_mutex in
> > > > > > >> pause_all_vcpus() in the reset case?
> > > > > > >>
> > > > > > >> 1602 static bool main_loop_should_exit(void)
> > > > > > >> 1603 {
> > > > > > >> ...
> > > > > > >> 1633     request = qemu_reset_requested();
> > > > > > >> 1634     if (request) {
> > > > > > >> 1635         pause_all_vcpus();
> > > > > > >> 1636         qemu_system_reset(request);
> > > > > > >> 1637         resume_all_vcpus();
> > > > > > >> 1638         if (!runstate_check(RUN_STATE_RUNNING) &&
> > > > > > >> 1639                 !runstate_check(RUN_STATE_INMIGRATE)) {
> > > > > > >> 1640             runstate_set(RUN_STATE_PRELAUNCH);
> > > > > > >> 1641         }
> > > > > > >> 1642     }
> > > > > > >> ...
> > > > > > >>
> > > > > > >> I already sent a patch for this kind of problem (it's in Juan's
> > > > > > >> current pull request):
> > > > > > >>
> > > > > > >> "runstate: ignore finishmigrate -> prelaunch transition"
> > > > > > >>
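
For context, that fix makes the reset path quoted above tolerate the
migration thread winning the race while pause_all_vcpus() has dropped the
lock. Reconstructed from memory rather than quoted from the patch, it
amounts to something like:

    request = qemu_reset_requested();
    if (request) {
        pause_all_vcpus();
        qemu_system_reset(request);
        resume_all_vcpus();
        /* The migration thread may have switched the runstate to
         * 'finishmigrate' while the lock was released; don't force it
         * back to 'prelaunch' in that case. */
        if (!runstate_check(RUN_STATE_RUNNING) &&
                !runstate_check(RUN_STATE_INMIGRATE) &&
                !runstate_check(RUN_STATE_FINISH_MIGRATE)) {
            runstate_set(RUN_STATE_PRELAUNCH);
        }
    }
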
> > > > > > > 
> > > > > > > IIUC your patch avoids an invalid 'prelaunch' -> 'postmigrate'
> > > > > > > runstate transition that can happen if the migration thread sets
> > > > > > > the runstate to 'finishmigrate' when pause_all_vcpus() releases
> > > > > > > the main loop mutex.
> > > > > > > 
> > > > > > > i.e. the symptom of that problem is QEMU aborting, correct? The
> > > > > > > issue I'm trying to fix is a guest breakage caused by a
> > > > > > > discrepancy between QEMU and the guest after migration has
> > > > > > > succeeded.
> > > > > > > 
> > > > > > >> but I don't know if it could fix this one.
> > > > > > >>
> > > > > > > 
> > > > > > > I don't think so, and your patch kinda illustrates it. If the
> > > > > > > runstate is 'finishmigrate' when returning from pause_all_vcpus(),
> > > > > > > this means that state was sent to the destination before we could
> > > > > > > actually reset the machine.
> > > > > > 
> > > > > > Yes, you're right.
> > > > > > 
> > > > > > But the question behind my comment was: is it expected to have a
> > > > > > pending reset while we are migrating?
> > > > > > 
> > > > > 
> > > > > Nothing prevents qemu_system_reset_request() from being called when
> > > > > migration is active.
> > > > > 
> > > > > > Perhaps H_CAS could return H_BUSY, wait for the end of the
> > > > > > migration, and then be fully executed on the destination?
> > > > > > 
> > > > > 
> > > > > And so we would need to teach SLOF to try H_CAS again until it stops
> > > > > returning H_BUSY? It seems safer to migrate the CAS reboot flag, IMHO.
> > > > > 
> > > > 
> > > > OK, I've tried that with a patched SLOF that sleeps 500ms and tries CAS
> > > > again if H_BUSY was returned. It fixes the issue but it looks a bit
> > > > ugly because of the polling with an arbitrary timeout in SLOF... I'm
> > > > not very comfortable either with calling migration_is_active() from
> > > > the CAS code in QEMU.
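
For the record, the QEMU side of the experiment Greg describes would look
something like the sketch below; it assumes migration_is_active(), as
mentioned above, and is not an actual posted patch:

    static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
                                                      SpaprMachineState *spapr,
                                                      target_ulong opcode,
                                                      target_ulong *args)
    {
        /* Defer CAS while a migration is in flight; SLOF then has to
         * poll, retrying the hypercall until the (destination) QEMU
         * can process it. */
        if (migration_is_active(migrate_get_current())) {
            return H_BUSY;
        }

        /* ... existing CAS processing ... */
    }
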
> > > > 
> > > > David,
> > > > 
> > > > Any suggestion ?
> > > 
> > > Yeah, I think looping in SLOF is a worse idea than migrating the
> > > cas_reboot flag.
> > > 
> > > But.. a better solution still might be to just remove the remaining
> > > causes for CAS reboot entirely.  CAS reboots pretty much suck when
> > > they happen, anyway.
> > > 
> > 
> > I agree.
> > 
> > > With the irq changeover condition removed, I think the remaining
> > > causes are more theoretical than practical situations at this point.
> > > 
> > 
> > FWIW, hotplugging a PCI device before CAS results in a hung guest (I
> > haven't investigated the details yet).
> 
> commit 10f12e6450407b18b4d5a6b50d3852dcfd7fff75
> Author: Daniel Henrique Barboza <address@hidden>
> Date:   Wed Aug 30 15:21:41 2017 -0300
> 
>     hw/ppc: CAS reset on early device hotplug
> 
> I'll have a look to see what can be done here.

Ah.. yes, that one might be a bit tricky.

> But I agree the other check is more theoretical:
> 
>     /* capabilities that have been added since CAS-generated guest reset.
>      * if capabilities have since been removed, generate another reset
>      */
>     spapr->cas_reboot = !spapr_ovec_subset(ov5_cas_old, spapr->ov5_cas);
> 
> Unless changing kernels or tampering with the kernel command line, I don't
> see how some capabilities could change between the two CAS calls in practice.

Well, we want to be robust and it's at least theoretically possible
that the guest will request different things on subsequent reboots.
However, I believe that the original rationale for this check was that
while we could add things to the device tree for added capabilities,
we didn't have a way to roll back the changes for removed
capabilities.

Now that we fully rebuild the device tree at CAS, I think this test
can probably just go, although there's some double checking to do.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
