[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
From: |
Peter Xu |
Subject: |
Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source |
Date: |
Wed, 2 Aug 2023 12:19:32 -0400 |
On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> This function currently has a straight-forward part which is waiting
> for the thread to join and a complicated part which is doing a
> qemu_file_shutdown() on the return path file.
>
> The shutdown is tricky because all calls to qemu_file_shutdown() set
> f->last_error to -EIO, which means we can never know if an error is an
> actual error or if we cleanly shutdown the file previously.
>
> This is particularly bothersome for postcopy because it would send the
> return path thread into the retry routine which would wait on the
> postcopy_pause_rp_sem and consequently block the main thread. We
> haven't had reports of this so I must presume we never reach here with
> postcopy.
>
> The shutdown call is also racy because since it doesn't take the
> qemu_file_lock, it could NULL-dereference if the return path thread
> happens to be in the middle of the critical region at
> migration_release_dst_files().
After you rework the thread model on resume, shall we move
migration_release_dst_files() into the migration thread to be after the
pthread_join()? I assume then we don't even need a mutex to protect it?
>
> Move this more complicated part of the code to a separate routine so
> we can wait on the thread without all of this baggage.
I think you mentioned "some nuance" on having mark_source_rp_bad() in
await_return_path_close_on_source(), I did remember I tried to look into
that "nuance" too a long time ago but I just forgot what was that. Great
if you can share some details.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/migration.c | 46 +++++++++++++++++++++++++------------------
> 1 file changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 91bba630a8..58f09275a8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2038,6 +2038,25 @@ static int open_return_path_on_source(MigrationState
> *ms,
> /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
> static int await_return_path_close_on_source(MigrationState *ms)
> {
> + if (!ms->rp_state.rp_thread_created) {
> + return 0;
> + }
> +
> + trace_await_return_path_close_on_source_joining();
> + qemu_thread_join(&ms->rp_state.rp_thread);
> + ms->rp_state.rp_thread_created = false;
> + trace_await_return_path_close_on_source_close();
> + return ms->rp_state.error;
> +}
> +
> +static int close_return_path_on_source(MigrationState *ms)
> +{
> + int ret;
> +
> + if (!ms->rp_state.rp_thread_created) {
> + return 0;
> + }
Can we still rely on the await_return_path_close_on_source() check, so as
to dedup this one?
> +
> /*
> * If this is a normal exit then the destination will send a SHUT and the
> * rp_thread will exit, however if there's an error we need to cause
> @@ -2051,11 +2070,12 @@ static int
> await_return_path_close_on_source(MigrationState *ms)
> qemu_file_shutdown(ms->rp_state.from_dst_file);
> mark_source_rp_bad(ms);
> }
> - trace_await_return_path_close_on_source_joining();
> - qemu_thread_join(&ms->rp_state.rp_thread);
> - ms->rp_state.rp_thread_created = false;
> - trace_await_return_path_close_on_source_close();
> - return ms->rp_state.error;
> +
> + trace_migration_return_path_end_before();
> + ret = await_return_path_close_on_source(ms);
> + trace_migration_return_path_end_after(ret);
> +
> + return ret;
> }
>
> static inline void
> @@ -2351,20 +2371,8 @@ static void migration_completion(MigrationState *s)
> goto fail;
> }
>
> - /*
> - * If rp was opened we must clean up the thread before
> - * cleaning everything else up (since if there are no failures
> - * it will wait for the destination to send it's status in
> - * a SHUT command).
> - */
> - if (s->rp_state.rp_thread_created) {
> - int rp_error;
> - trace_migration_return_path_end_before();
> - rp_error = await_return_path_close_on_source(s);
> - trace_migration_return_path_end_after(rp_error);
> - if (rp_error) {
> - goto fail;
> - }
> + if (close_return_path_on_source(s)) {
> + goto fail;
> }
>
> if (qemu_file_get_error(s->to_dst_file)) {
> --
> 2.35.3
>
--
Peter Xu
- [PATCH v2 0/2] Fix segfault on migration return path, Fabiano Rosas, 2023/08/02
- [PATCH v2 1/2] migration: Split await_return_path_close_on_source, Fabiano Rosas, 2023/08/02
- Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source,
Peter Xu <=
- Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source, Fabiano Rosas, 2023/08/02
- Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source, Peter Xu, 2023/08/02
- Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source, Fabiano Rosas, 2023/08/03
- Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source, Peter Xu, 2023/08/03
- Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source, Daniel P . Berrangé, 2023/08/03
- Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source, Peter Xu, 2023/08/03
[PATCH v2 2/2] migration: Replace the return path retry logic, Fabiano Rosas, 2023/08/02