Re: [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC 22/29] vhost+postcopy: Call wakeups
Date: Fri, 14 Jul 2017 17:18:49 +0300

On Wed, Jul 12, 2017 at 05:00:04PM +0200, Andrea Arcangeli wrote:
> What we could do is add a UFFDIO_BIND ioctl that takes an "fd"
> parameter to bind the two uffds together. Then we could push logical
> offsets in addition to the virtual address ranges when calling
> UFFDIO_REGISTER_LOGICAL (the logical offsets would then match the
> guest physical addresses), so that UFFDIO_COPY_LOGICAL could take a
> logical range to wake up, which the kernel would translate into
> virtual addresses for all the uffds bound together. Pushing offsets
> into UFFDIO_REGISTER was David's idea.

I think it was mine originally, from an off-list discussion.
To me it seems cleaner to do a UFFDIO_DUP that gives you
a new fd bound to the current one, though.
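
Neither variant exists in the userfaultfd UAPI; purely to make the
shape of the two proposals concrete, a hypothetical sketch (every
ioctl and structure below is invented for illustration):

    /*
     * Hypothetical only: none of these ioctls or structures exist in
     * the real userfaultfd UAPI; they merely sketch the two ideas
     * discussed above.
     */

    /* Variant 1: bind two uffds, then register ranges with a logical
     * offset (e.g. the guest physical address). */
    struct uffdio_bind {
        __s32 fd;                      /* the other uffd to bind to */
    };

    struct uffdio_register_logical {
        struct uffdio_range range;     /* virtual address range */
        __u64 logical_offset;          /* e.g. guest physical address */
        __u64 mode;
    };

    /* A single copy keyed by logical offset would then wake the
     * waiters on every bound uffd, with no separate UFFDIO_WAKE: */
    struct uffdio_copy_logical {
        __u64 logical_dst;
        __u64 src;
        __u64 len;
        __u64 mode;
        __s64 copy;
    };

    /* Variant 2: UFFDIO_DUP hands back a new uffd already bound to
     * the current one: */
    int bound_uffd = ioctl(uffd, UFFDIO_DUP, 0);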

> That would eliminate the kernel enter/exit for the explicit
> UFFDIO_WAKE: calling a single UFFDIO_COPY would be enough.
> 
> Alternatively, we could make uffd work based on file offsets instead
> of virtual addresses, but that would involve changes to filesystems,
> and it would only move the needle on top of tmpfs (shared=on/off
> makes no difference) and hugetlbfs. It would be enough for
> vhost-bridge.
> 
> Usually the uffd fault lives at the higher levels of the virtual
> memory subsystem and never deals with file offsets, so if we can get
> away with logical ranges per uffd for UFFDIO_REGISTER and
> UFFDIO_COPY, it may be simpler and easier to extend automatically to
> all memory types supported by uffd (including anon, which has no
> file offset).
> 
> No major improvement is to be expected from such an enhancement,
> though, so it's not very high priority to implement. It's not even
> clear the complexity is worth it. I think one more syscall per page
> might be measurable only on very fast networks. The current way of
> operating, where the uffds are independent of each other and the
> translation table is transferred by userland means, is already quite
> optimal and much simpler. Furthermore, for hugetlbfs the performance
> difference almost certainly wouldn't be measurable, as the kernel
> enter/exit cost would be diluted by a factor of 512 (one 2MB
> hugepage covers 512 4k pages) compared to 4k userfaults.
> 
> Thanks,
> Andrea
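
For contrast with the proposed merged ioctl, the current scheme Andrea
describes takes two kernel round trips per page using the existing
UAPI: a UFFDIO_COPY on the uffd that faulted, then an explicit
UFFDIO_WAKE on the peer uffd mapping the same guest page. A minimal
sketch (the fd variables, addresses, and helper name are illustrative;
peer_dst would come from the translation table exchanged in userspace):

    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    /*
     * Resolve one 4k userfault the way postcopy does today:
     * atomically place the page through the local uffd (which wakes
     * waiters on that uffd), then issue a second syscall to wake the
     * peer uffd mapping the same guest page at a different virtual
     * address.
     */
    static int resolve_fault(int uffd, int peer_uffd,
                             __u64 dst, __u64 peer_dst,
                             void *page, __u64 pagesize)
    {
        struct uffdio_copy copy = {
            .dst = dst,
            .src = (__u64)(unsigned long)page,
            .len = pagesize,
            .mode = 0,
        };
        if (ioctl(uffd, UFFDIO_COPY, &copy) < 0)
            return -1;

        /* The extra enter/exit of the kernel that the proposal
         * would eliminate: */
        struct uffdio_range range = {
            .start = peer_dst,
            .len = pagesize,
        };
        return ioctl(peer_uffd, UFFDIO_WAKE, &range);
    }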


