qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC v7 5/9] migration: fix the multifd code when


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH RFC v7 5/9] migration: fix the multifd code when sending less channels
Date: Fri, 2 Nov 2018 16:33:35 +0000
User-agent: Mutt/1.10.1 (2018-07-13)

* Peter Xu (address@hidden) wrote:
> On Fri, Nov 02, 2018 at 11:00:24AM +0800, Fei Li wrote:
> > 
> > 
> > On 11/02/2018 10:37 AM, Peter Xu wrote:
> > > On Thu, Nov 01, 2018 at 06:17:11PM +0800, Fei Li wrote:
> > > > Set the migration state to "failed" instead of "setup" when failing
> > > > to send packet via some channel.
> > > Could you please provide more information in the commit message?
> > > E.g., what will happen if without this patch?  Will it crash the
> > > source or stall the source migration or others?  Otherwise it's a bit
> > > hard for me to understand what's this patch for.
> > Sorry for the inadequate description , I was intended to say that when
> > failing
> > to do the live migration using multifd, e.g. sending less channels, the src
> > status displays "setup" when running `info migrate`. I assume we should tell
> > users that the "Migration status" is "failed" now (and along with the
> > failure reason).
> > 
> > The current src status when failed inmultifd_new_send_channel_async():
> > 
> > 
> > (qemu) migrate_set_capability x-multifd on
> > (qemu) migrate_set_parameter x-multifd-channels 4
> > (qemu) migrate -d tcp:192.168.190.98:4444
> > (qemu) qemu-system-x86_64: failed in multifd_new_send_channel_async due to
> > ...
> > (qemu) info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > decompress-error-check: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks:
> > off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
> > block: off return-path: off pause-before-switchover: off x-multifd: on
> > dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
> > Migration status: setup
> > total time: 0 milliseconds
> 
> Thanks for the information.
> 
> I had a quick look.  For now we do this:
> 
>         multifd_save_setup (without waiting for channels to be ready)
>         create thread migration_thread
>                 (in thread)
>                 ram_save_setup
>                         multifd_send_sync_main (wait for the channels)
> 
> The thing is that we didn't get the notification when one of the
> multifd channel is failed.  IMHO instead of setting the global
> migration state in a per-channel function, we should just report the
> error upwards, then the main thread should decide how to change the
> state machine of the migration.

Best to wait for Juan on that; I've got vague memories that reporting
errors among the threads was a bit tricky.

Dave

> And we have set it in migrate_set_error() after all so the main thread
> should be able to know somehow (though IMHO I'll even prefer to have a
> per-channel variable to keep the state of the channel, then the
> per-channel functions won't touch any globals which offers better
> isolation).
> 
> I'm not sure how Juan thinks about it, but I'd prefer some work to
> provide such isolation and also some mechanism to allow the main
> thread to detect the per-channel errors not only during setup phase
> but also during the migration (e.g., when network is suddenly down).
> Then we don't touch any globals (e.g., we shouldn't call
> migrate_get_current in any per-channel function like
> multifd_new_send_channel_async).
> 
> > 
> > > 
> > > Normally I would prefer to not touch global states in feature specific
> > > code path, but I'd like to know the problem more first...
> > > 
> > > Thanks,
> > > 
> > > > Cc: Peter Xu <address@hidden>
> > > > Signed-off-by: Fei Li <address@hidden>
> > > > ---
> > > >   migration/ram.c | 2 ++
> > > >   1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index 4db3b3e8f4..c84d164fc8 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -1072,6 +1072,7 @@ out:
> > > >   static void multifd_new_send_channel_async(QIOTask *task, gpointer 
> > > > opaque)
> > > >   {
> > > >       MultiFDSendParams *p = opaque;
> > > > +    MigrationState *s = migrate_get_current();
> > > >       QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
> > > >       Error *local_err = NULL;
> > > > @@ -1083,6 +1084,7 @@ static void 
> > > > multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
> > > >           if (multifd_save_cleanup(&local_err) != 0) {
> > > >               migrate_set_error(migrate_get_current(), local_err);
> > > >           }
> > > > +        migrate_set_state(&s->state, s->state, 
> > > > MIGRATION_STATUS_FAILED);
> > > >       } else {
> > > >           p->c = QIO_CHANNEL(sioc);
> > > >           qio_channel_set_delay(p->c, false);
> > > > -- 
> > > > 2.13.7
> > > > 
> > > Regards,
> > > 
> > 
> 
> Regards,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]