[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain fo
From: |
Peter Xu |
Subject: |
Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy |
Date: |
Thu, 29 Nov 2018 13:47:22 +0800 |
User-agent: |
Mutt/1.10.1 (2018-07-13) |
On Thu, Nov 29, 2018 at 01:10:14PM +0800, Peter Xu wrote:
> On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
> > On 11/28/2018 05:32 PM, Peter Xu wrote:
> > >
> > > So what I am worrying here are corner cases where we might forget to
> > > stop the hinting. I'm fabricating one example sequence of events:
> > >
> > > (start migration)
> > > START_MIGRATION
> > > BEFORE_SYNC
> > > AFTER_SYNC
> > > ...
> > > BEFORE_SYNC
> > > AFTER_SYNC
> > > (some SaveStateEntry failed rather than RAM, then
> > > migration_detect_error returned MIG_THR_ERR_FATAL so we need to
> > > fail the migration, however when running the previous
> > > ram_save_iterate for RAM's specific SaveStateEntry we didn't see
> > > any error so no ERROR event detected)
> > >
> > > Then it seems the hinting will last forever. Considering that now I'm
> > > not sure whether this can be done ram-only, since even if you capture
> > > ram_save_complete() and at the same time you introduce PRECOPY_END you
> > > may still miss the PRECOPY_END event since AFAIU ram_save_complete()
> > > won't be called at all in this case.
> > >
> > > Could this happen?
> >
> > Thanks, indeed this case could happen if we add PRECOPY_END in
> > ram_save_complete.
> >
> > How about putting PRECOPY_END in ram_save_cleanup?
> > I think it would be called in any case.
>
> Sounds good.
>
> >
> > I'm also thinking probably we don't need PRECOPY_ERR when we have
> > PRECOPY_END,
> > and what do you think of the notifier names below:
> >
> > +typedef enum PrecopyNotifyReason {
> > + PRECOPY_NOTIFY_RAM_SAVE_END = 0,
> > + PRECOPY_NOTIFY_RAM_SAVE_START = 1,
> > + PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
> > + PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
> > + PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
> > +} PrecopyNotifyReason;
>
> (please see below [1]...)
>
> >
> >
> > >
> > > >
> > > > > [1]
> > > > >
> > > > > > > Another thing to mention about the "reasons" (though I see it more
> > > > > > > like "events"): have you thought about adding a
> > > > > > > PRECOPY_NOTIFY_END?
> > > > > > > It might help in some cases:
> > > > > > >
> > > > > > > - then you don't need to trickily export the
> > > > > > > migrate_postcopy()
> > > > > > > since you'll notify that before postcopy starts
> > > > > > I'm thinking probably we don't need to export migrate_postcopy even
> > > > > > now.
> > > > > > It's more like a sanity check, and not needed because now we have
> > > > > > the
> > > > > > notifier registered to the precopy specific callchain, which has
> > > > > > ensured
> > > > > > that
> > > > > > it is invoked via precopy.
> > > > > But postcopy will always start with precopy, no?
> > > > Yes, but I think we could add the check in precopy_notify()
> > > I'm not sure that's good. If the notifier could potentially have
> > > other user, they might still work with postcopy, and they might expect
> > > e.g. BEFORE_SYNC to be called for every sync, even if it's at the
> > > precopy stage of a postcopy.
> >
> > I think this precopy notifier callchain is expected to be used only for
> > the precopy mode. Postcopy has its dedicated notifier callchain that
> > users could use.
> >
> > How about changing the migrate_postcopy() check to "ms->start_postcopy":
> >
> > bool migration_postcopy_start(void)
> > {
> > MigrationState *s;
> >
> > s = migrate_get_current();
> >
> > return atomic_read(&s->start_postcopy);
> > }
> >
> >
> > static void precopy_notify(PrecopyNotifyReason reason)
> > {
> > if (migration_postcopy_start())
> > return;
> >
> > notifier_list_notify(&precopy_notifier_list, &reason);
> > }
> >
> > If postcopy started with precopy, the precopy optimization feature
> > could still be used until it switches to the postcopy mode.
>
> I'm not sure we can use start_postcopy. It's a variable being set in
> the QMP handler but it does not mean postcopy has started. I'm afraid
> there can be race where it's still precopy but the variable is set so
> event could be missed...
>
> IMHO the problem is not that complicated. How about this proposal:
>
> [1]
>
> typedef enum PrecopyNotifyReason {
> PRECOPY_NOTIFY_RAM_START,
> PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
> PRECOPY_NOTIFY_RAM_AFTER_SYNC,
> PRECOPY_NOTIFY_COMPLETE,
> PRECOPY_NOTIFY_RAM_CLEANUP,
> };
>
> The first three keep the same as your old ones. Notify RAM_CLEANUP in
> ram_save_cleanup() to make sure it'll always be cleaned up (the same
> as PRECOPY_END, just another name). Notify COMPLETE in
> qemu_savevm_state_complete_precopy() to show that precopy is
> completed. Meanwhile on balloon side you should stop the hinting for
> either RAM_CLEANUP or COMPLETE event. Then either:
>
> - precopy is switching to postcopy, or
> - precopy completed, or
> - precopy failed/cancelled
>
> You should always get at least a notification to stop the balloon.
> Though you could also get one RAM_CLEANUP after one COMPLETE, but
> the balloon should easily handle it (stop the hinting twice).
>
> Here maybe you can even remove the "RAM_" in both RAM_START and
> RAM_CLEANUP if we're going to have COMPLETE since after all it'll be
> not only limited to RAM.
Oh maybe we can remove all the RAM_ prefix to make it precopy
general...
typedef enum PrecopyNotifyReason {
PRECOPY_NOTIFY_SETUP,
PRECOPY_NOTIFY_BEFORE_SYNC,
PRECOPY_NOTIFY_AFTER_SYNC,
PRECOPY_NOTIFY_COMPLETE,
PRECOPY_NOTIFY_CLEANUP,
};
Then we can just hook everything with the corresponding names:
SETUP: hooks with qemu_savevm_state_setup
COMPLETE: hooks with qemu_savevm_state_complete_precopy
CLEANUP: hooks with qemu_savevm_state_cleanup
I'm not sure whether you'll need another hook in ram_state_reset in
the future but for now I don't see it necessary since I don't thnk
ram_list.version would change during migration for now, so
ram_state_reset should only be called during setup.
>
> Another suggestion is that you can add an Error into the notify hooks,
> please refer to the postcopy one:
>
> int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
>
> So the hook functions have a way to even stop the migration (though
> for balloon hinting it'll be always optional so no error should be
> reported...), then the two interfaces are matched.
>
> >
> >
> >
> > > In that sense I still feel the
> > > PRECOPY_END is better (so contantly call it at the end of precopy, no
> > > matter whether there's another postcopy afterwards). It sounds like a
> > > cleaner interface.
> >
> > Probably I still haven't got the point how PRECOPY_END could help above yet.
>
> Please have a look at above proposal. Thanks,
>
> --
> Peter Xu
>
Regards,
--
Peter Xu
- Re: [Qemu-devel] [virtio-dev] Re: [PATCH v9 3/8] migration: use bitmap_mutex in migration_bitmap_clear_dirty, (continued)
[Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Wei Wang, 2018/11/15
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Peter Xu, 2018/11/27
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Wei Wang, 2018/11/27
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Peter Xu, 2018/11/28
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Wei Wang, 2018/11/28
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Peter Xu, 2018/11/28
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Wei Wang, 2018/11/28
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Peter Xu, 2018/11/29
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy,
Peter Xu <=
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Wei Wang, 2018/11/29
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Wei Wang, 2018/11/30
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Peter Xu, 2018/11/30
- Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy, Wei Wang, 2018/11/30
[Qemu-devel] [PATCH v9 6/8] migration/ram.c: add a function to disable the bulk stage, Wei Wang, 2018/11/15
[Qemu-devel] [PATCH v9 7/8] migration: move migrate_postcopy() to include/migration/misc.h, Wei Wang, 2018/11/15
[Qemu-devel] [PATCH v9 8/8] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT, Wei Wang, 2018/11/15
Re: [Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support, no-reply, 2018/11/15
Re: [Qemu-devel] [PATCH v9 0/8] virtio-balloon: free page hint support, Wei Wang, 2018/11/26