qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source


From: Peter Xu
Subject: Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
Date: Wed, 2 Aug 2023 12:19:32 -0400

On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> This function currently has a straight-forward part which is waiting
> for the thread to join and a complicated part which is doing a
> qemu_file_shutdown() on the return path file.
> 
> The shutdown is tricky because all calls to qemu_file_shutdown() set
> f->last_error to -EIO, which means we can never know if an error is an
> actual error or if we cleanly shutdown the file previously.
> 
> This is particularly bothersome for postcopy because it would send the
> return path thread into the retry routine which would wait on the
> postcopy_pause_rp_sem and consequently block the main thread. We
> haven't had reports of this so I must presume we never reach here with
> postcopy.
> 
> The shutdown call is also racy because since it doesn't take the
> qemu_file_lock, it could NULL-dereference if the return path thread
> happens to be in the middle of the critical region at
> migration_release_dst_files().

After you rework the thread model on resume, shall we move
migration_release_dst_files() into the migration thread to be after the
pthread_join()?  I assume then we don't even need a mutex to protect it?

> 
> Move this more complicated part of the code to a separate routine so
> we can wait on the thread without all of this baggage.

I think you mentioned "some nuance" on having mark_source_rp_bad() in
await_return_path_close_on_source(), I did remember I tried to look into
that "nuance" too a long time ago but I just forgot what was that.  Great
if you can share some details.

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c | 46 +++++++++++++++++++++++++------------------
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 91bba630a8..58f09275a8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2038,6 +2038,25 @@ static int open_return_path_on_source(MigrationState 
> *ms,
>  /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
>  static int await_return_path_close_on_source(MigrationState *ms)
>  {
> +    if (!ms->rp_state.rp_thread_created) {
> +        return 0;
> +    }
> +
> +    trace_await_return_path_close_on_source_joining();
> +    qemu_thread_join(&ms->rp_state.rp_thread);
> +    ms->rp_state.rp_thread_created = false;
> +    trace_await_return_path_close_on_source_close();
> +    return ms->rp_state.error;
> +}
> +
> +static int close_return_path_on_source(MigrationState *ms)
> +{
> +    int ret;
> +
> +    if (!ms->rp_state.rp_thread_created) {
> +        return 0;
> +    }

Can we still rely on the await_return_path_close_on_source() check, so as
to dedup this one?

> +
>      /*
>       * If this is a normal exit then the destination will send a SHUT and the
>       * rp_thread will exit, however if there's an error we need to cause
> @@ -2051,11 +2070,12 @@ static int 
> await_return_path_close_on_source(MigrationState *ms)
>          qemu_file_shutdown(ms->rp_state.from_dst_file);
>          mark_source_rp_bad(ms);
>      }
> -    trace_await_return_path_close_on_source_joining();
> -    qemu_thread_join(&ms->rp_state.rp_thread);
> -    ms->rp_state.rp_thread_created = false;
> -    trace_await_return_path_close_on_source_close();
> -    return ms->rp_state.error;
> +
> +    trace_migration_return_path_end_before();
> +    ret = await_return_path_close_on_source(ms);
> +    trace_migration_return_path_end_after(ret);
> +
> +    return ret;
>  }
>  
>  static inline void
> @@ -2351,20 +2371,8 @@ static void migration_completion(MigrationState *s)
>          goto fail;
>      }
>  
> -    /*
> -     * If rp was opened we must clean up the thread before
> -     * cleaning everything else up (since if there are no failures
> -     * it will wait for the destination to send it's status in
> -     * a SHUT command).
> -     */
> -    if (s->rp_state.rp_thread_created) {
> -        int rp_error;
> -        trace_migration_return_path_end_before();
> -        rp_error = await_return_path_close_on_source(s);
> -        trace_migration_return_path_end_after(rp_error);
> -        if (rp_error) {
> -            goto fail;
> -        }
> +    if (close_return_path_on_source(s)) {
> +        goto fail;
>      }
>  
>      if (qemu_file_get_error(s->to_dst_file)) {
> -- 
> 2.35.3
> 

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]