[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migrati
From: |
Leonardo Bras Soares Passos |
Subject: |
Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) |
Date: |
Mon, 21 Feb 2022 16:41:55 -0300 |
Hello Juan, thanks for the feedback!
On Fri, Feb 18, 2022 at 1:57 PM Juan Quintela <quintela@redhat.com> wrote:
>
> Leonardo Bras <leobras@redhat.com> wrote:
> > Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
> > writev + flags & flush interface.
> >
> > Change multifd_send_sync_main() so flush_zero_copy() can be called
> > after each iteration in order to make sure all dirty pages are sent before
> > a new iteration is started. It will also flush at the beginning and at the
> > end of migration.
> >
> > Also make it return -1 if flush_zero_copy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually
> > freed,
> > and there is no problem on changing the pages content between
> > writev_zero_copy() and
> > the actual sending of the buffer, because this change will dirty the page
> > and
> > cause it to be re-sent on a next iteration anyway.
> >
> > A lot of locked memory may be needed in order to use multid migration
> ^^^^^^
> multifd.
>
> I can fix it on the commit.
No worries, fixed for v9.
>
>
> > @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters
> > *params, Error **errp)
> > error_prepend(errp, "Invalid mapping given for
> > block-bitmap-mapping: ");
> > return false;
> > }
> > -
> > +#ifdef CONFIG_LINUX
> > + if (params->zero_copy_send &&
> > + (!migrate_use_multifd() ||
> > + params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > + (params->tls_creds && *params->tls_creds))) {
> > + error_setg(errp,
> > + "Zero copy only available for non-compressed non-TLS
> > multifd migration");
> > + return false;
> > + }
> > +#endif
> > return true;
> > }
>
> Test is long, but it is exactly what we need. Good.
Thanks!
>
>
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 43998ad117..2d68b9cf4f 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
> > multifd_send_state = NULL;
> > }
> >
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f)
> > {
> > int i;
> > + bool flush_zero_copy;
> >
> > if (!migrate_use_multifd()) {
> > - return;
> > + return 0;
> > }
> > if (multifd_send_state->pages->num) {
> > if (multifd_send_pages(f) < 0) {
> > error_report("%s: multifd_send_pages fail", __func__);
> > - return;
> > + return 0;
> > }
> > }
> > +
> > + /*
> > + * When using zero-copy, it's necessary to flush after each iteration
> > to
> > + * make sure pages from earlier iterations don't end up replacing newer
> > + * pages.
> > + */
> > + flush_zero_copy = migrate_use_zero_copy_send();
> > +
> > for (i = 0; i < migrate_multifd_channels(); i++) {
> > MultiFDSendParams *p = &multifd_send_state->params[i];
> >
> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> > if (p->quit) {
> > error_report("%s: channel %d has already quit", __func__, i);
> > qemu_mutex_unlock(&p->mutex);
> > - return;
> > + return 0;
> > }
> >
> > p->packet_num = multifd_send_state->packet_num++;
> > @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
> > ram_counters.transferred += p->packet_len;
> > qemu_mutex_unlock(&p->mutex);
> > qemu_sem_post(&p->sem);
> > +
> > + if (flush_zero_copy) {
> > + int ret;
> > + Error *err = NULL;
> > +
> > + ret = qio_channel_flush(p->c, &err);
> > + if (ret < 0) {
> > + error_report_err(err);
> > + return -1;
> > + }
> > + }
> > }
> > for (i = 0; i < migrate_multifd_channels(); i++) {
> > MultiFDSendParams *p = &multifd_send_state->params[i];
> > @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
> > qemu_sem_wait(&p->sem_sync);
> > }
> > trace_multifd_send_sync_main(multifd_send_state->packet_num);
> > +
> > + return 0;
> > }
>
> We are leaving pages is flight for potentially a lot of time. I *think*
> that we can sync shorter than that.
>
> > static void *multifd_send_thread(void *opaque)
> > @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
> > p->iov[0].iov_len = p->packet_len;
> > p->iov[0].iov_base = p->packet;
> >
> > - ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > - &local_err);
> > + ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
> > NULL,
> > + 0, p->write_flags,
> > &local_err);
> > if (ret != 0) {
> > break;
> > }
>
> I still think that it would be better to have a sync here in each
> thread. I don't know the size, but once every couple megabytes of RAM
> or so.
This seems a good idea, since the first iteration may take a while,
and we may take a lot of time to fail if something goes wrong with
zerocopy at the start of iteration 1.
On the other hand, flushing takes some time, and doing it a lot may
take away some of the performance improvements.
Maybe we could move the flushing to a thread of it's own, if it
becomes a problem.
>
> I did a change on:
>
> commit d48c3a044537689866fe44e65d24c7d39a68868a
> Author: Juan Quintela <quintela@redhat.com>
> Date: Fri Nov 19 15:35:58 2021 +0100
>
> multifd: Use a single writev on the send side
>
> Until now, we wrote the packet header with write(), and the rest of the
> pages with writev(). Just increase the size of the iovec and do a
> single writev().
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
> And now we need to "perserve" this header until we do the sync,
> otherwise we are overwritting it with other things.
Yeah, that is a problem I faced on non-multifd migration, and a reason
why I choose to implement directly in multifd.
>
> What testing have you done after this commit?
Not much, but this will probably become an issue with bigger guests.
>
> Notice that it is not _complicated_ to fix it, I will try to come with
> some idea on monday, but basically is having an array of buffers for
> each thread, and store them until we call a sync().
That will probably work, we can use MultiFDRecvParams->packet as this
array, right?
We can start with a somehow small buffer size and grow if it ever gets full.
(Full means a sync + g_try_malloc0 + g_free)
By my calculations, it should take 1,8kB each element on a 4k PAGESIZE.
What do you think?
Best regards,
Leo
[PATCH v8 3/5] migration: Add zero-copy-send parameter for QMP/HMP for Linux, Leonardo Bras, 2022/02/01