Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source


From: Fabiano Rosas
Subject: Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
Date: Thu, 03 Aug 2023 11:45:38 -0300

Peter Xu <peterx@redhat.com> writes:

> On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
>> >> This function currently has a straightforward part, which is waiting
>> >> for the thread to join, and a complicated part, which is doing a
>> >> qemu_file_shutdown() on the return path file.
>> >> 
>> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
>> >> f->last_error to -EIO, which means we can never know whether an error is
>> >> an actual error or whether we previously shut down the file cleanly.
>> >> 
>> >> This is particularly bothersome for postcopy because it would send the
>> >> return path thread into the retry routine which would wait on the
>> >> postcopy_pause_rp_sem and consequently block the main thread. We
>> >> haven't had reports of this so I must presume we never reach here with
>> >> postcopy.
>> >> 
>> >> The shutdown call is also racy because it doesn't take the
>> >> qemu_file_lock, so it could NULL-dereference if the return path thread
>> >> happens to be in the middle of the critical region at
>> >> migration_release_dst_files().
>> >
>> > After you rework the thread model on resume, shall we move
>> > migration_release_dst_files() into the migration thread to be after the
>> > pthread_join()?  I assume then we don't even need a mutex to protect it?
>> >
>> 
>> I just need to figure out if it's ok to move the postcopy_qemufile_src
>> cleanup along. No idea why it is there in the first place. I see you
>> moved it from postcopy_pause and we're about to move it back to the
>> exact same place =D
>
> It was there because the old postcopy-preempt was sending data via
> postcopy_qemufile_src from the migration thread, while postcopy_pause is
> also the migration thread context.
>
> Then we had 9358982744 ("migration: Send requested page directly in
> rp-return thread") where we moved that "send page" operation into the
> return path thread to reduce latencies.  After moving it there, the file
> handle can be accessed by more than one thread, so I also moved the cleanup
> so that it always operates in the return path thread; then no race should
> happen.
>

Thanks for the context.

> With your change, the return path thread will vanish before the migration
> thread accesses the handle later (so, as mentioned above, it must happen
> after pthread_join() has succeeded), so I assume it will also be fine to
> have it back in the migration thread.
>
> Or perhaps just take the file lock?
>

There are also migrate_fd_cleanup() and migrate_fd_cancel(), which can touch
these files. We might need to lock anyway; let's see.

In general I'd like to drop all of these "ok not to lock, because..."
assumptions; it's too easy for the code to change and for the assumptions to
stop being true. IMHO it's not worth gaining performance by not taking a lock
when the data is still shared and nothing stops someone in the future from
accessing it concurrently.
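
To make the direction concrete, this is roughly the pattern I have in mind
(just a sketch, not the actual patch; the helper name is made up):

    /* Sketch only: detach the shared pointer under s->qemu_file_lock and
     * close the file outside the critical section. */
    static void release_shared_file(MigrationState *s, QEMUFile **fp)
    {
        QEMUFile *file;

        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
            file = *fp;        /* e.g. &s->rp_state.from_dst_file */
            *fp = NULL;        /* other threads now only ever see NULL */
        }

        if (file) {
            qemu_fclose(file);
        }
    }

Then the migration thread, migrate_fd_cleanup() and migrate_fd_cancel() could
all go through the same helper instead of each carrying its own "it's safe
because..." justification.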

>> 
>> >> 
>> >> Move this more complicated part of the code to a separate routine so
>> >> we can wait on the thread without all of this baggage.
>> >
>> > I think you mentioned "some nuance" on having mark_source_rp_bad() in
>> > await_return_path_close_on_source(), I did remember I tried to look into
>> > that "nuance" too a long time ago but I just forgot what was that.  Great
>> > if you can share some details.
>> >
>> 
>> Well, mark_source_rp_bad() at await_return_path_close_on_source() is
>> basically useless:
>> 
>> - We only call mark_source_rp_bad() if s->to_dst_file has an error, and
>>   migration_completion() already checks that condition and fails the
>>   migration anyway.
>> 
>> - If to_dst_file has an error, chances are the destination already did its
>>   cleanup by this point, so from_dst_file would already have an errno
>>   because of that. At qemu_fill_buffer(), the len == 0 case basically means
>>   "the other end finished cleanly". We still set -EIO in that case; I
>>   don't know why. Possibly because not all backends have the same
>>   semantics for len == 0.
>
> I don't know either, but I checked and it's from a555b8092a ("qemu-file:
> Don't do IO after shutdown").  Maybe there's a better way to do it so that
> we could identify the difference, but yes, that can be for later.
>

That is not the -EIO I'm talking about, but I'm glad you mentioned it,
because I just noticed we might need to revert ac7d25b816 ("qemu-file:
remove shutdown member").

It was a purely logical change, dropping the f->shutdown flag, but it has the
side effect that we can no longer differentiate an orderly shutdown from an
IO error.

I get that, ideally, all of the code would perhaps be resilient enough to
handle a shutdown the same way as an IO error, but I'm not sure we are
prepared for that.
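
In other words, with the flag back we would be able to write something like
this in the error paths (a sketch only; qemu_file_was_shutdown() is a
hypothetical accessor over the restored f->shutdown field):

    /* Sketch: with f->shutdown restored, retry/cleanup paths could tell a
     * deliberate shutdown apart from a real IO failure. */
    if (qemu_file_get_error(f)) {
        if (qemu_file_was_shutdown(f)) {
            /* orderly shutdown requested on our side: just clean up */
        } else {
            /* genuine IO error: go through the pause/retry machinery */
        }
    }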


Now, the actual -EIO I was mentioning is this one in qemu-file.c:

static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
{
    ...
    ...
    do {
        len = qio_channel_read(f->ioc,
                               (char *)f->buf + pending,
                               IO_BUF_SIZE - pending,
                               &local_error);
        if (len == QIO_CHANNEL_ERR_BLOCK) {
            if (qemu_in_coroutine()) {
                qio_channel_yield(f->ioc, G_IO_IN);
            } else {
                qio_channel_wait(f->ioc, G_IO_IN);
            }
        } else if (len < 0) {
            len = -EIO;
        }
    } while (len == QIO_CHANNEL_ERR_BLOCK);

    if (len > 0) {
        f->buf_size += len;
        f->total_transferred += len;
    } else if (len == 0) {
        qemu_file_set_error_obj(f, -EIO, local_error); <---- THIS
    } else {
        qemu_file_set_error_obj(f, len, local_error);
    }

    return len;
}

The recvmsg(2) manual says:

  "When a stream socket peer has performed an orderly shutdown, the return
  value will be 0 (the traditional "end-of-file" return)."

So both issues combined put us in a situation where there's never an
"orderly shutdown"; everything is an error. I wouldn't say that's wrong,
but it's not clear to me whether we consciously made that design
decision.
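
If we decide the clean-EOF case deserves to be distinguishable, the change
would be local to the hunk above; something like this (a sketch only, the
f->eof field is hypothetical):

    } else if (len == 0) {
        /*
         * Sketch: per recvmsg(2), a return of 0 on a stream socket means
         * the peer performed an orderly shutdown. Record that as EOF via
         * a hypothetical f->eof flag instead of folding it into -EIO and
         * let callers decide whether it is an error for them.
         */
        f->eof = true;

But that would ripple into every caller that currently treats any
non-positive return as fatal, so it's probably material for a separate
series.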


