
Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
Date: Fri, 4 Aug 2017 09:30:01 +0100
User-agent: Mutt/1.8.3 (2017-05-23)

* Peter Xu (address@hidden) wrote:
> On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> > 
> > [...]
> > 
> > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > > +{
> > > > +    /*
> > > > +     * This means source VM is ready to resume the postcopy migration.
> > > > +     * It's time to switch state and release the fault thread to
> > > > +     * continue service page faults.
> > > > +     */
> > > > +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > > +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > +    qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > > 
> > > Is it worth sanity checking that you were in RECOVER at this point?
> > 
> > Yeah, it never hurts.  Will do.
> 
> Not sure whether this would be good (note: I returned 0 in the if):
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b7843c2..b34f59b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1709,6 +1709,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>  
>  static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
>  {
> +    if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> +        error_report("%s: illegal resume received", __func__);
> +        /* Don't fail the load, only for this. */
> +        return 0;
> +    }
> +
>      /*
>       * This means source VM is ready to resume the postcopy migration.
>       * It's time to switch state and release the fault thread to
> 
> Basically I just don't want to crash the dest VM (it holds hot dirty
> pages) even if it receives a faulty RESUME command.

Yes, so now that's a fun problem; effectively you then have 3 valid
failure modes:
    a) An IO failure so we need to go into POSTCOPY_PAUSE
    b) A fatal migration stream problem to quit
    c) A non-fatal migration stream problem to go .. back into PAUSE?
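
To make the three cases concrete, here is a rough sketch of how a load
result might be mapped to a next step. Every name in it (LoadResult,
NextAction, next_action) is made up for illustration and does not exist
in QEMU, and sending case (c) back to PAUSE is only the open question
above, not a decision.

```c
/* Hypothetical classification of load outcomes; not QEMU code. */
typedef enum {
    LOAD_OK,               /* command handled normally */
    LOAD_ERR_IO,           /* (a) the channel broke */
    LOAD_ERR_FATAL,        /* (b) the stream cannot be trusted any more */
    LOAD_ERR_RECOVERABLE   /* (c) bad command, destination state intact */
} LoadResult;

/* Hypothetical next step for the incoming side. */
typedef enum {
    NEXT_CONTINUE,         /* keep processing the stream */
    NEXT_PAUSE,            /* go into POSTCOPY_PAUSE and wait for recovery */
    NEXT_QUIT              /* give up on the migration */
} NextAction;

static NextAction next_action(LoadResult result)
{
    switch (result) {
    case LOAD_OK:
        return NEXT_CONTINUE;
    case LOAD_ERR_IO:
        return NEXT_PAUSE;
    case LOAD_ERR_FATAL:
        return NEXT_QUIT;
    case LOAD_ERR_RECOVERABLE:
        /* Assumption: treated like (a); this is the unresolved case. */
        return NEXT_PAUSE;
    }
    /* Unknown values are treated as fatal. */
    return NEXT_QUIT;
}
```

With a split like this, a stray RESUME received outside RECOVER would be
reported as the recoverable case rather than as a plain 0, so the caller
can still tell it apart from success.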

Dave

> -- 
> Peter Xu
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


