qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migrati


From: Juan Quintela
Subject: Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
Date: Fri, 18 Feb 2022 18:36:03 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> Hello Peter, thanks for reviewing!
>
> On Mon, Feb 7, 2022 at 11:22 PM Peter Xu <peterx@redhat.com> wrote:
>>
>> On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
>> > -void multifd_send_sync_main(QEMUFile *f)
>> > +int multifd_send_sync_main(QEMUFile *f)
>> >  {
>> >      int i;
>> > +    bool flush_zero_copy;
>> >
>> >      if (!migrate_use_multifd()) {
>> > -        return;
>> > +        return 0;
>> >      }
>> >      if (multifd_send_state->pages->num) {
>> >          if (multifd_send_pages(f) < 0) {
>> >              error_report("%s: multifd_send_pages fail", __func__);
>> > -            return;
>> > +            return 0;
>>
>> I've not checked how it used to do if multifd_send_pages() failed, but.. 
>> should
>> it returns -1 rather than 0 when there will be a return code?
>
> Yeah, that makes sense.
> The point here is that I was trying not to modify much of the current 
> behavior.

    if (qatomic_read(&multifd_send_state->exiting)) {
        return -1;
    }

        if (p->quit) {
            error_report("%s: channel %d has already quit!", __func__, i);
            qemu_mutex_unlock(&p->mutex);
            return -1;
        }

This are the only two cases where the current code can return one
error.  In the 1st case we are exiting, we are already in the middle of
finishing, so we don't really care.
In the second one, we have already quit, and error as already quite big.

But I agree with both comments:
- we need to improve the error paths
- leonardo changes don't affect what is already there.


> I mean, multifd_send_sync_main() would previously return void, so any
> other errors would not matter to the caller of this function, which
> will continue to run as if nothing happened.
>
> Now, if it fails with flush_zero_copy, the operation needs to be aborted.
>
> Maybe, I should make it different:
> - In any error, return -1.
> - Create/use a specific error code in the case of a failing
> flush_zero_copy, so I can test the return value for it on the caller
> function and return early.

We need to add the check.  It don't matter if the problem is zero_copy
or the existing one, we are under a minor catastrophe and migration has
to be aborted.

> Or alternatively, the other errors could also return early, but since
> this will change how the code currently works, I would probably need
> another patch for that change. (so it can be easily reverted if
> needed)
>
> What do you think is better?
>
>
>> >          }
>> >      }
>> > +
>> > +    /*
>> > +     * When using zero-copy, it's necessary to flush after each iteration 
>> > to
>> > +     * make sure pages from earlier iterations don't end up replacing 
>> > newer
>> > +     * pages.
>> > +     */
>> > +    flush_zero_copy = migrate_use_zero_copy_send();
>> > +
>> >      for (i = 0; i < migrate_multifd_channels(); i++) {
>> >          MultiFDSendParams *p = &multifd_send_state->params[i];
>> >
>> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
>> >          if (p->quit) {
>> >              error_report("%s: channel %d has already quit", __func__, i);
>> >              qemu_mutex_unlock(&p->mutex);
>> > -            return;
>> > +            return 0;
>>
>> Same question here.
>
> Please see above,
>
>>
>> >          }
>>
>> The rest looks good.  Thanks,

Later, Juan.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]