From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy
Date: Thu, 29 Nov 2018 13:10:14 +0800
User-agent: Mutt/1.10.1 (2018-07-13)

On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
> On 11/28/2018 05:32 PM, Peter Xu wrote:
> > 
> > So what I am worrying here are corner cases where we might forget to
> > stop the hinting.  I'm fabricating one example sequence of events:
> > 
> >    (start migration)
> >    START_MIGRATION
> >    BEFORE_SYNC
> >    AFTER_SYNC
> >    ...
> >    BEFORE_SYNC
> >    AFTER_SYNC
> >    (some SaveStateEntry failed rather than RAM, then
> >     migration_detect_error returned MIG_THR_ERR_FATAL so we need to
> >     fail the migration, however when running the previous
> >     ram_save_iterate for RAM's specific SaveStateEntry we didn't see
> >     any error so no ERROR event detected)
> > 
> > Then it seems the hinting will last forever.  Considering that now I'm
> > not sure whether this can be done ram-only, since even if you capture
> > ram_save_complete() and at the same time you introduce PRECOPY_END you
> > may still miss the PRECOPY_END event since AFAIU ram_save_complete()
> > won't be called at all in this case.
> > 
> > Could this happen?
> 
> Thanks, indeed this case could happen if we add PRECOPY_END in
> ram_save_complete.
> 
> How about putting PRECOPY_END in ram_save_cleanup?
> I think it would be called in any case.

Sounds good.

> 
> I'm also thinking probably we don't need PRECOPY_ERR when we have
> PRECOPY_END, and what do you think of the notifier names below:
> 
> +typedef enum PrecopyNotifyReason {
> +    PRECOPY_NOTIFY_RAM_SAVE_END = 0,
> +    PRECOPY_NOTIFY_RAM_SAVE_START = 1,
> +    PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
> +    PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
> +    PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
> +} PrecopyNotifyReason;

(please see below [1]...)

> 
> 
> > 
> > > 
> > > > [1]
> > > > 
> > > > > > Another thing to mention about the "reasons" (though I see it more
> > > > > > like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
> > > > > > It might help in some cases:
> > > > > > 
> > > > > >      - then you don't need to trickily export the migrate_postcopy()
> > > > > >        since you'll notify that before postcopy starts
> > > > > I'm thinking probably we don't need to export migrate_postcopy even
> > > > > now. It's more like a sanity check, and not needed because now we
> > > > > have the notifier registered to the precopy specific callchain,
> > > > > which has ensured that it is invoked via precopy.
> > > > But postcopy will always start with precopy, no?
> > > Yes, but I think we could add the check in precopy_notify()
> > I'm not sure that's good.  If the notifier could potentially have
> > other user, they might still work with postcopy, and they might expect
> > e.g. BEFORE_SYNC to be called for every sync, even if it's at the
> > precopy stage of a postcopy.
> 
> I think this precopy notifier callchain is expected to be used only for
> the precopy mode. Postcopy has its dedicated notifier callchain that
> users could use.
> 
> How about changing the migrate_postcopy() check to "ms->start_postcopy":
> 
> bool migration_postcopy_start(void)
> {
>     MigrationState *s;
> 
>     s = migrate_get_current();
> 
>     return atomic_read(&s->start_postcopy);
> }
> 
> 
> static void precopy_notify(PrecopyNotifyReason reason)
> {
>     if (migration_postcopy_start())
>         return;
> 
>     notifier_list_notify(&precopy_notifier_list, &reason);
> }
> 
> If postcopy started with precopy, the precopy optimization feature
> could still be used until it switches to the postcopy mode.

I'm not sure we can use start_postcopy.  It's a variable that gets set
in the QMP handler, but that does not mean postcopy has started.  I'm
afraid there can be a race where we're still in precopy but the variable
is already set, so an event could be missed...
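
To make the window concrete, here is a tiny standalone sketch (not QEMU
code: MigSketch, guarded_precopy_notify() and sketch_race_window() are
made-up names, and the two booleans only approximate the real state).
It assumes the guard you proposed and replays the ordering I'm worried
about in a single thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for MigrationState. */
typedef struct MigSketch {
    bool start_postcopy;  /* set as soon as the user requests postcopy */
    bool in_postcopy;     /* set only when the switch really happens */
    int delivered;        /* precopy events that reached the hooks */
    int dropped;          /* precopy events filtered out by the guard */
} MigSketch;

/* The guard under discussion: skip notification once the flag is set. */
static void guarded_precopy_notify(MigSketch *s)
{
    assert(!s->in_postcopy);      /* every caller here is still in precopy */
    if (s->start_postcopy) {
        s->dropped++;
        return;
    }
    s->delivered++;
}

/* Replays the problematic ordering; returns the number of lost events. */
static int sketch_race_window(void)
{
    MigSketch s = { false, false, 0, 0 };

    guarded_precopy_notify(&s);   /* BEFORE_SYNC: delivered */
    s.start_postcopy = true;      /* request arrives, still precopy */
    guarded_precopy_notify(&s);   /* AFTER_SYNC: dropped */
    guarded_precopy_notify(&s);   /* next BEFORE_SYNC: dropped too */
    return s.dropped;
}
```

In this replay two events are lost even though in_postcopy never became
true, which is the kind of gap a hook relying on paired BEFORE_SYNC and
AFTER_SYNC events would trip over.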

IMHO the problem is not that complicated.  How about this proposal:

[1]

  typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_RAM_START,
    PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
    PRECOPY_NOTIFY_RAM_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_RAM_CLEANUP,
  } PrecopyNotifyReason;

The first three stay the same as your old ones.  Notify RAM_CLEANUP in
ram_save_cleanup() to make sure it'll always be cleaned up (the same
as PRECOPY_END, just another name).  Notify COMPLETE in
qemu_savevm_state_complete_precopy() to show that precopy is
completed.  Meanwhile, on the balloon side you should stop the hinting
for either the RAM_CLEANUP or the COMPLETE event.  Then in any of these
cases:

  - precopy is switching to postcopy, or
  - precopy completed, or
  - precopy failed/cancelled

You should always get at least one notification to stop the balloon.
You could also get a RAM_CLEANUP after a COMPLETE, but the balloon
should easily handle that (it just stops the hinting twice).
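
As a rough illustration of what I mean on the balloon side, here is a
minimal standalone sketch (not the real virtio-balloon code:
BalloonSketch and the balloon_sketch_*() helpers are made-up names, and
the _MAX sentinel is added only for the bounds check).  It assumes the
event set proposed above and treats both stop events the same way:

```c
#include <assert.h>
#include <stdbool.h>

/* Event set from the proposal; _MAX is an extra sentinel for this sketch. */
typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_RAM_START,
    PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
    PRECOPY_NOTIFY_RAM_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_RAM_CLEANUP,
    PRECOPY_NOTIFY_MAX,
} PrecopyNotifyReason;

/* Hypothetical stand-in for the balloon device state. */
typedef struct BalloonSketch {
    bool hinting_active;
    int stop_requests;    /* how many stop events were seen */
} BalloonSketch;

/* Safe to call more than once: the second call changes nothing visible. */
static void balloon_sketch_stop_hinting(BalloonSketch *b)
{
    b->stop_requests++;
    b->hinting_active = false;
}

static void balloon_sketch_notify(BalloonSketch *b, PrecopyNotifyReason reason)
{
    assert(reason < PRECOPY_NOTIFY_MAX);

    switch (reason) {
    case PRECOPY_NOTIFY_RAM_START:
        b->hinting_active = true;
        break;
    case PRECOPY_NOTIFY_COMPLETE:
    case PRECOPY_NOTIFY_RAM_CLEANUP:
        /* Either event ends hinting, so a failed run is covered too. */
        balloon_sketch_stop_hinting(b);
        break;
    default:
        /* The sync events do not change whether hinting runs. */
        break;
    }
}
```

With this shape, the failure sequence discussed earlier (no COMPLETE,
only the cleanup path) still ends with hinting stopped, and the normal
COMPLETE followed by RAM_CLEANUP case just records two stop requests.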

Here maybe you can even drop the "RAM_" from both RAM_START and
RAM_CLEANUP if we're going to have COMPLETE, since after all it won't
be limited to RAM.

Another suggestion is that you can add an Error to the notify hooks;
please refer to the postcopy one:

  int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);

That way the hook functions even have a way to stop the migration
(though for balloon hinting it'll always be optional, so no error
should be reported...), and the two interfaces match.
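
Something along these lines is what I have in mind.  This is only a
standalone sketch that loosely mirrors the shape of the postcopy
signature above: SketchError stands in for QEMU's Error, and the hook
table plus all the *_sketch names are invented for illustration rather
than taken from the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for QEMU's Error type; the real one is opaque. */
typedef struct SketchError {
    const char *msg;
} SketchError;

typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_RAM_START,
    PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
    PRECOPY_NOTIFY_RAM_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_RAM_CLEANUP,
} PrecopyNotifyReason;

/* A hook returns 0 to continue or non-zero (with *errp set) to abort. */
typedef int (*PrecopyHookSketch)(PrecopyNotifyReason reason,
                                 SketchError **errp);

#define SKETCH_MAX_HOOKS 4
static PrecopyHookSketch sketch_hooks[SKETCH_MAX_HOOKS];
static size_t sketch_nr_hooks;

static void sketch_add_hook(PrecopyHookSketch hook)
{
    assert(sketch_nr_hooks < SKETCH_MAX_HOOKS);
    sketch_hooks[sketch_nr_hooks++] = hook;
}

/* Call hooks in registration order; the first failure stops the walk. */
static int precopy_notify_sketch(PrecopyNotifyReason reason,
                                 SketchError **errp)
{
    for (size_t i = 0; i < sketch_nr_hooks; i++) {
        int ret = sketch_hooks[i](reason, errp);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Balloon hinting is optional, so this hook never reports an error. */
static int balloon_hook_sketch(PrecopyNotifyReason reason, SketchError **errp)
{
    (void)reason;
    (void)errp;
    return 0;
}

/* A made-up mandatory user that refuses to let precopy start. */
static int strict_hook_sketch(PrecopyNotifyReason reason, SketchError **errp)
{
    static SketchError err = { "hook refused to continue" };

    if (reason == PRECOPY_NOTIFY_RAM_START) {
        if (errp) {
            *errp = &err;
        }
        return -1;
    }
    return 0;
}
```

The caller can then decide whether a non-zero return fails the
migration, while optional users such as the balloon simply always
return 0.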

> 
> 
> 
> > In that sense I still feel the
> > PRECOPY_END is better (so constantly call it at the end of precopy, no
> > matter whether there's another postcopy afterwards).  It sounds like a
> > cleaner interface.
> 
> Probably I still haven't got the point how PRECOPY_END could help above yet.

Please have a look at the above proposal.  Thanks,

-- 
Peter Xu


