qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain fo


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy
Date: Thu, 29 Nov 2018 13:47:22 +0800
User-agent: Mutt/1.10.1 (2018-07-13)

On Thu, Nov 29, 2018 at 01:10:14PM +0800, Peter Xu wrote:
> On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
> > On 11/28/2018 05:32 PM, Peter Xu wrote:
> > > 
> > > So what I am worrying here are corner cases where we might forget to
> > > stop the hinting.  I'm fabricating one example sequence of events:
> > > 
> > >    (start migration)
> > >    START_MIGRATION
> > >    BEFORE_SYNC
> > >    AFTER_SYNC
> > >    ...
> > >    BEFORE_SYNC
> > >    AFTER_SYNC
> > >    (some SaveStateEntry failed rather than RAM, then
> > >     migration_detect_error returned MIG_THR_ERR_FATAL so we need to
> > >     fail the migration, however when running the previous
> > >     ram_save_iterate for RAM's specific SaveStateEntry we didn't see
> > >     any error so no ERROR event detected)
> > > 
> > > Then it seems the hinting will last forever.  Considering that now I'm
> > > not sure whether this can be done ram-only, since even if you capture
> > > ram_save_complete() and at the same time you introduce PRECOPY_END you
> > > may still miss the PRECOPY_END event since AFAIU ram_save_complete()
> > > won't be called at all in this case.
> > > 
> > > Could this happen?
> > 
> > Thanks, indeed this case could happen if we add PRECOPY_END in
> > ram_save_complete.
> > 
> > How about putting PRECOPY_END in ram_save_cleanup?
> > I think it would be called in any case.
> 
> Sounds good.
> 
> > 
> > I'm also thinking probably we don't need PRECOPY_ERR when we have
> > PRECOPY_END,
> > and what do you think of the notifier names below:
> > 
> > +typedef enum PrecopyNotifyReason {
> > +    PRECOPY_NOTIFY_RAM_SAVE_END = 0,
> > +    PRECOPY_NOTIFY_RAM_SAVE_START = 1,
> > +    PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
> > +    PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
> > +    PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
> > +} PrecopyNotifyReason;
> 
> (please see below [1]...)
> 
> > 
> > 
> > > 
> > > > 
> > > > > [1]
> > > > > 
> > > > > > > Another thing to mention about the "reasons" (though I see it more
> > > > > > > like "events"): have you thought about adding a 
> > > > > > > PRECOPY_NOTIFY_END?
> > > > > > > It might help in some cases:
> > > > > > > 
> > > > > > >      - then you don't need to trickily export the 
> > > > > > > migrate_postcopy()
> > > > > > >        since you'll notify that before postcopy starts
> > > > > > I'm thinking probably we don't need to export migrate_postcopy even 
> > > > > > now.
> > > > > > It's more like a sanity check, and not needed because now we have 
> > > > > > the
> > > > > > notifier registered to the precopy specific callchain, which has 
> > > > > > ensured
> > > > > > that
> > > > > > it is invoked via precopy.
> > > > > But postcopy will always start with precopy, no?
> > > > Yes, but I think we could add the check in precopy_notify()
> > > I'm not sure that's good.  If the notifier could potentially have
> > > other user, they might still work with postcopy, and they might expect
> > > e.g. BEFORE_SYNC to be called for every sync, even if it's at the
> > > precopy stage of a postcopy.
> > 
> > I think this precopy notifier callchain is expected to be used only for
> > the precopy mode. Postcopy has its dedicated notifier callchain that
> > users could use.
> > 
> > How about changing the migrate_postcopy() check to "ms->start_postcopy":
> > 
> > bool migration_postcopy_start(void)
> > {
> >     MigrationState *s;
> > 
> >     s = migrate_get_current();
> > 
> >     return atomic_read(&s->start_postcopy);
> > }
> > 
> > 
> > static void precopy_notify(PrecopyNotifyReason reason)
> > {
> >     if (migration_postcopy_start())
> >         return;
> > 
> >     notifier_list_notify(&precopy_notifier_list, &reason);
> > }
> > 
> > If postcopy started with precopy, the precopy optimization feature
> > could still be used until it switches to the postcopy mode.
> 
> I'm not sure we can use start_postcopy.  It's a variable being set in
> the QMP handler but it does not mean postcopy has started.  I'm afraid
> there can be race where it's still precopy but the variable is set so
> event could be missed...
> 
> IMHO the problem is not that complicated.  How about this proposal:
> 
> [1]
> 
>   typedef enum PrecopyNotifyReason {
>     PRECOPY_NOTIFY_RAM_START,
>     PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
>     PRECOPY_NOTIFY_RAM_AFTER_SYNC,
>     PRECOPY_NOTIFY_COMPLETE,
>     PRECOPY_NOTIFY_RAM_CLEANUP,
>   };
> 
> The first three keep the same as your old ones.  Notify RAM_CLEANUP in
> ram_save_cleanup() to make sure it'll always be cleaned up (the same
> as PRECOPY_END, just another name).  Notify COMPLETE in
> qemu_savevm_state_complete_precopy() to show that precopy is
> completed.  Meanwhile on balloon side you should stop the hinting for
> either RAM_CLEANUP or COMPLETE event.  Then either:
> 
>   - precopy is switching to postcopy, or
>   - precopy completed, or
>   - precopy failed/cancelled
> 
> You should always get at least a notification to stop the balloon.
> Though you could also get one RAM_CLEANUP after one COMPLETE, but
> the balloon should easily handle it (stop the hinting twice).
> 
> Here maybe you can even remove the "RAM_" in both RAM_START and
> RAM_CLEANUP if we're going to have COMPLETE since after all it'll be
> not only limited to RAM.

Oh maybe we can remove all the RAM_ prefix to make it precopy
general...

  typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_SETUP,
    PRECOPY_NOTIFY_BEFORE_SYNC,
    PRECOPY_NOTIFY_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_CLEANUP,
  };

Then we can just hook everything with the corresponding names:

  SETUP:    hooks with qemu_savevm_state_setup
  COMPLETE: hooks with qemu_savevm_state_complete_precopy
  CLEANUP:  hooks with qemu_savevm_state_cleanup

I'm not sure whether you'll need another hook in ram_state_reset in
the future but for now I don't see it necessary since I don't thnk
ram_list.version would change during migration for now, so
ram_state_reset should only be called during setup.

> 
> Another suggestion is that you can add an Error into the notify hooks,
> please refer to the postcopy one:
> 
>   int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
> 
> So the hook functions have a way to even stop the migration (though
> for balloon hinting it'll be always optional so no error should be
> reported...), then the two interfaces are matched.
> 
> > 
> > 
> > 
> > > In that sense I still feel the
> > > PRECOPY_END is better (so contantly call it at the end of precopy, no
> > > matter whether there's another postcopy afterwards).  It sounds like a
> > > cleaner interface.
> > 
> > Probably I still haven't got the point how PRECOPY_END could help above yet.
> 
> Please have a look at above proposal.  Thanks,
> 
> -- 
> Peter Xu
> 

Regards,

-- 
Peter Xu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]