Re: [Qemu-devel] [PATCH] migration: regain control of images when migrat

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] migration: regain control of images when migrat

From:	Greg Kurz
Subject:	Re: [Qemu-devel] [PATCH] migration: regain control of images when migration fails to complete
Date:	Wed, 18 May 2016 14:18:38 +0200

On Wed, 18 May 2016 13:42:08 +0200
Kevin Wolf <address@hidden> wrote:

> Am 18.05.2016 um 13:13 hat Dr. David Alan Gilbert geschrieben:
> > * Greg Kurz (address@hidden) wrote:  
> > > We currently have an error path during migration that can cause
> > > the source QEMU to abort:  
> > 
> > Hmm, wasn't there something similar recently, sorry I can't remember the 
> > details, but
> > cc'ing some block people who might remember.  
> 
> Not sure, but in any case calling bdrv_invalidate_cache_all() in the
> error case sounds right to me.
> 
> I think I would have tried to write the patch a bit differently so that
> it uses the normal error handling patterns, but that doesn't make it
> wrong. Specifically, checking the error condition twice in two different
> places seems strange; normally you check it once and then goto fail,
> where the cleanup is done (which could include a call to
> bdrv_invalidate_cache_all()).
> 

I have coded it this way because it is obvious we're undoing the
call to bdrv_inactivate_all() a few lines above. And also, in the
postcopy case, migration_thread() doesn't call vm_start() when
migration fails to complete: bdrv_invalidate_cache_all() will be
called if the guest gets restarted via QMP.

This being said, I agree that the resulting code looks weird. I guess
it is possible to add a fail_invalidate: label to do the cleanup in
the non postcopy case.

I'll send a v2.

> Kevin
> 
> > Dave  
> > > 
> > > migration_thread()
> > >   migration_completion()
> > >     runstate_is_running() ----------------> true if guest is running
> > >     bdrv_inactivate_all() ----------------> inactivate images
> > >     qemu_savevm_state_complete_precopy()
> > >      ... qemu_fflush()
> > >            socket_writev_buffer() --------> error because destination 
> > > fails
> > >          qemu_fflush() -------------------> set error on migration stream
> > >   migration_completion() -----------------> set migrate state to FAILED
> > > migration_thread() -----------------------> break migration loop
> > >   vm_start() -----------------------------> restart guest with inactive
> > >                                             images
> > > 
> > > and you get:
> > > 
> > > qemu-system-ppc64: socket_writev_buffer: Got err=104 for 
> > > (32768/18446744073709551615)
> > > qemu-system-ppc64: 
> > > /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: 
> > > Assertion `!(bs->open_flags & 0x0800)' failed.
> > > Aborted (core dumped)
> > > 
> > > If we try postcopy with a similar scenario, we also get the writev error
> > > message but QEMU leaves the guest paused because entered_postcopy is true.
> > > 
> > > We could possibly do the same with precopy and leave the guest paused.
> > > But since the historical default for migration errors is to restart the
> > > source, this patch adds a call to bdrv_invalidate_cache_all() instead.
> > > 
> > > Signed-off-by: Greg Kurz <address@hidden>
> > > ---
> > >  migration/migration.c |    9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 991313a8629a..5726959ddfd9 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1568,8 +1568,17 @@ static void migration_completion(MigrationState 
> > > *s, int current_active_state,
> > >                  ret = bdrv_inactivate_all();
> > >              }
> > >              if (ret >= 0) {
> > > +                Error *local_err = NULL;
> > > +
> > >                  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> > >                  qemu_savevm_state_complete_precopy(s->to_dst_file, 
> > > false);
> > > +
> > > +                if (qemu_file_get_error(s->to_dst_file)) {
> > > +                    bdrv_invalidate_cache_all(&local_err);
> > > +                    if (local_err) {
> > > +                        error_report_err(local_err);
> > > +                    }
> > > +                }
> > >              }
> > >          }
> > >          qemu_mutex_unlock_iothread();
> > >   
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK  
>

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCH] migration: regain control of images when migration fails to complete, Greg Kurz, 2016/05/18
- Re: [Qemu-devel] [PATCH] migration: regain control of images when migration fails to complete, Dr. David Alan Gilbert, 2016/05/18
  - Re: [Qemu-devel] [PATCH] migration: regain control of images when migration fails to complete, Kevin Wolf, 2016/05/18
    - Re: [Qemu-devel] [PATCH] migration: regain control of images when migration fails to complete, Greg Kurz <=

Prev by Date: Re: [Qemu-devel] [PATCH] pci_register_bar: cleanup
Next by Date: Re: [Qemu-devel] [PATCH] pci_register_bar: cleanup
Previous by thread: Re: [Qemu-devel] [PATCH] migration: regain control of images when migration fails to complete
Next by thread: [Qemu-devel] [PATCH v3 0/2] trace: Add event for vCPU memory accesses
Index(es):
- Date
- Thread