qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH 0/2] migration/channel errors and cancelling
Date: Tue, 19 Dec 2017 19:21:31 +0800
User-agent: Mutt/1.9.1 (2017-09-22)

On Tue, Dec 19, 2017 at 10:14:08AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (address@hidden) wrote:
> > On Fri, Dec 15, 2017 at 05:16:53PM +0000, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > 
> > > Hi,
> > >   Where a channel fails asynchronously during connect, call
> > > back through the migration code so it can clean up.
> > >   In particular this causes the transition of a 'cancelling' state
> > > to 'cancelled' in the case of:
> > > 
> > >     migrate -d tcp:deadhost:port
> > >      <host tries to connect>
> > >     migrate_cancel
> > > 
> > > previously the status would get stuck in cancelling because
> > > the final cleanup didn't happen.
> > > 
> > >   This is the second part of the fix for:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1525899
> > 
> > IIUC this series tries to deliver the connection error a long way
> > until migrate_fd_connect() to handle it. But, haven't we already have
> > a function migrate_fd_error() to do that (which is faster, and
> > simpler)?
> > 
> > void migrate_fd_error(MigrationState *s, const Error *error)
> > {
> >     trace_migrate_fd_error(error_get_pretty(error));
> >     assert(s->to_dst_file == NULL);
> >     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> >                       MIGRATION_STATUS_FAILED);
> >     migrate_set_error(s, error);
> >     notifier_list_notify(&migration_state_notifiers, s);
> >     block_cleanup_parameters(s);
> > }
> > 
> > I think it's not handling the case when cancelling.  If we let it to
> > handle the cancelling case well, would it be a simpler fix?
> >
> > Moreover, I think this is another good example that migration is not
> > handling the cleanup "cleanly" in general... I really hope we can do
> > this better in 2.12.  I'll see whether I can give it a shot, but in
> > all cases it'll be after the merging of existing patches since there
> > are already quite a lot of dangling patches.
> 
> No, I think migrate_fd_error is the cause of the problem here, not the
> answer.

Could I ask why migrate_fd_error() is problematic?  Yeah I agree that
we should have a single point to clean things up, then can we call
migrate_fd_cleanup() somehow inside migrate_fd_error()?

The thing I don't really understand is: why we want the error be
delivered via those functions (migration_channel_connect,
migrate_fd_connect, etc.) to finally be cleaned up.

> 
> If we stick to the simple rule that a migration must always call
> migrate_fd_cleanup then the cancellation problems are fixed - I think
> that's how we make migration 'clean' - a single cleanup routine
> that always gets called.

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]