[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME |
Date: |
Fri, 4 Aug 2017 09:30:01 +0100 |
User-agent: |
Mutt/1.8.3 (2017-05-23) |
* Peter Xu (address@hidden) wrote:
> On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote:
> > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote:
> >
> > [...]
> >
> > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> > > > +{
> > > > + /*
> > > > + * This means source VM is ready to resume the postcopy migration.
> > > > + * It's time to switch state and release the fault thread to
> > > > + * continue service page faults.
> > > > + */
> > > > + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER,
> > > > + MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > + qemu_sem_post(&mis->postcopy_pause_sem_fault);
> > >
> > > Is it worth sanity checking that you were in RECOVER at this point?
> >
> > Yeah, it never hurts. Will do.
>
> Not sure whether this would be good (note: I returned 0 in the if):
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b7843c2..b34f59b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1709,6 +1709,12 @@ static int
> loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>
> static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> {
> + if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) {
> + error_report("%s: illegal resume received", __func__);
> + /* Don't fail the load, only for this. */
> + return 0;
> + }
> +
> /*
> * This means source VM is ready to resume the postcopy migration.
> * It's time to switch state and release the fault thread to
>
> Basically I just don't want to crash the dest VM (it holds hot dirty
> pages) even if it receives a faulty RESUME command.
Yes, so now that's a fun problem; effectively you then have 3 valid
failure modes:
a) An IO failure so we need to go into POSTCOPY_PAUSE
b) A fatal migration stream problem to quit
c) A non-fatal migration stream problem to go .. back into PAUSE?
Dave
> --
> Peter Xu
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK