From: Leonardo Bras Soares Passos
Subject: Re: [PATCH v5 3/6] QIOChannelSocket: Implement io_writev_zerocopy & io_flush_zerocopy for CONFIG_LINUX
Date: Thu, 9 Dec 2021 05:49:01 -0300

On Thu, Dec 9, 2021 at 5:38 AM Leonardo Bras Soares Passos
<leobras@redhat.com> wrote:
>
> Hello Daniel,
>
> On Fri, Dec 3, 2021 at 6:18 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Fri, Dec 03, 2021 at 02:42:19AM -0300, Leonardo Bras Soares Passos wrote:
> > > Hello Daniel,
> > >
> > > On Tue, Nov 23, 2021 at 6:56 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > >
> > > > On Tue, Nov 23, 2021 at 01:46:44AM -0300, Leonardo Bras Soares Passos wrote:
> > > > > Hello Daniel,
> > > > >
> > > > > On Fri, Nov 12, 2021 at 7:54 AM Daniel P. Berrangé 
> > > > > <berrange@redhat.com> wrote:
> > > > > > > +
> > > > > > > +#ifdef CONFIG_LINUX
> > > > > > > +
> > > > > > > > +static int qio_channel_socket_poll(QIOChannelSocket *sioc, bool zerocopy,
> > > > > > > +                                   Error **errp)
> > > > > >
> > > > > > There's only one caller and it always passes zerocopy=true,
> > > > > > > so this parameter looks pointless.
> > > > >
> > > > > I did that for possible reuse of this function in the future:
> > > > > - As of today, this is certainly compiled out, but if at some point
> > > > > > someone wants to use poll for something other than reading a
> > > > > > zerocopy errqueue, it could be reused.
> > > > >
> > > > > But sure, if that's not desirable, I can remove the parameter (and the
> > > > > if clause for !zerocopy).
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > +    struct pollfd pfd;
> > > > > > > +    int ret;
> > > > > > > +
> > > > > > > +    pfd.fd = sioc->fd;
> > > > > > > +    pfd.events = 0;
> > > > > > > +
> > > > > > > + retry:
> > > > > > > +    ret = poll(&pfd, 1, -1);
> > > > > > > +    if (ret < 0) {
> > > > > > > +        switch (errno) {
> > > > > > > +        case EAGAIN:
> > > > > > > +        case EINTR:
> > > > > > > +            goto retry;
> > > > > > > +        default:
> > > > > > > +            error_setg_errno(errp, errno,
> > > > > > > +                             "Poll error");
> > > > > > > +            return ret;
> > > > > >
> > > > > >        return -1;
> > > > > >
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (pfd.revents & (POLLHUP | POLLNVAL)) {
> > > > > > > > +        error_setg(errp, "Poll error: Invalid or disconnected fd");
> > > > > > > +        return -1;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (!zerocopy && (pfd.revents & POLLERR)) {
> > > > > > > > +        error_setg(errp, "Poll error: Errors present in errqueue");
> > > > > > > +        return -1;
> > > > > > > +    }
> > > > > >
> > > > > > > +
> > > > > > > +    return ret;
> > > > > >
> > > > > >   return 0;
> > > > >
> > > > > > With the future reuse I mentioned above in mind, returning zero here
> > > > > > would make this function always look like the poll timed out. Some
> > > > > > future users may want to repeat the wait if poll() timed out, or
> > > > > > stop polling if the return value is > 0.
> > > >
> > > > Now that I'm looking again, we should not really use poll() at all,
> > > > as GLib provides us higher level APIs. We in fact already have the
> > > > qio_channel_wait() method as a general purpose helper for waiting
> > > > for an I/O condition to occur.
> > > >
> > >
> > > So you suggest using
> > > qio_channel_wait(sioc, G_IO_IN);
> > > instead of creating the new qio_channel_socket_poll().
> > >
> > > Is the above correct? I mean, is it as simple as that?
> >
> > Yes, hopefully it is that simple.
>
> It seems not to be the case.
> After some testing, I found out that using this stalls the migration.
>
> This happens when multifd_send_sync_main() calls flush_zerocopy(), but
> the migration threads are in multifd_send_thread(), calling
> qemu_sem_wait(&p->sem);
>
> I don't understand GLib well enough to know how much this differs
> from a poll(), but it seems to make a difference.

Oh, never mind.
A few minutes reading the GLib docs was enough for me to understand my mistake.
We will need to use G_IO_ERR instead of G_IO_IN, because we are waiting
for messages in the ERRQUEUE.
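
For illustration only, here is a minimal sketch of the same idea in plain POSIX terms rather than GLib or QEMU code. It assumes Linux behaviour, where MSG_ZEROCOPY completion notifications arrive on the socket error queue and are signalled as POLLERR, which poll() reports even when `events` is 0. As far as I know, G_IO_ERR is GLib's name for that condition on Unix. The helper names `wait_for_errqueue()` and `demo_wait()` are hypothetical and do not come from the patch.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): wait until the socket reports an
 * error-queue event.
 * Returns 1 if POLLERR was reported, 0 on timeout, and -1 on a poll()
 * failure or when the fd is invalid or hung up.
 */
static int wait_for_errqueue(int fd, int timeout_ms)
{
    /* events = 0: POLLERR, POLLHUP and POLLNVAL are reported regardless. */
    struct pollfd pfd = { .fd = fd, .events = 0 };
    int ret;

    do {
        /* Note: a retry after EINTR restarts the full timeout. */
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && (errno == EINTR || errno == EAGAIN));

    if (ret < 0) {
        return -1;              /* unrecoverable poll() error */
    }
    if (ret == 0) {
        return 0;               /* timed out, nothing queued */
    }
    if (pfd.revents & (POLLHUP | POLLNVAL)) {
        return -1;              /* invalid or disconnected fd */
    }
    return (pfd.revents & POLLERR) ? 1 : 0;
}

/* Usage sketch on a local socket pair, where nothing is ever queued. */
static int demo_wait(void)
{
    int sv[2];
    int idle, hup;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    idle = wait_for_errqueue(sv[0], 10);  /* nothing pending: times out */
    close(sv[1]);
    hup = wait_for_errqueue(sv[0], 10);   /* peer closed: hang-up */
    close(sv[0]);
    return (idle == 0 && hup == -1) ? 0 : 1;
}
```

Waiting for readability (POLLIN, or G_IO_IN in GLib) would not be woken by these notifications, which would be consistent with the stall described above.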

>
> >
> > > > > I understand the idea of testing SO_EE_CODE_ZEROCOPY_COPIED to be able
> > > > > to tell whenever zerocopy fell back to copying for some reason, but I
> > > > > don't see how this can be helpful here.
> > > > >
> > > > > Other than that I would do rv++ instead of rv=1 here, if I want to
> > > > > keep track of how many buffers were sent with zerocopy and how many
> > > > > ended up being copied.
> > > >
> > > > Sure, we could do   "ret > 0 == number of buffers that were copied"
> > > > as the API contract, rather than just treating it as a boolean.
> > >
> > > Ok, so you suggest that the responsibility for checking the number of
> > > writes with SO_EE_CODE_ZEROCOPY_COPIED, comparing it with the total
> > > number of writes, and deciding whether or not to disable zerocopy
> > > should be on the caller.
> >
> > Yep, it's a usage policy, so it is nicer to allow the caller to decide the
> > policy.
> >
> > Regards,
> > Daniel
> > --
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> >
