Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
From: Peter Xu
Subject: Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Date: Tue, 31 Jan 2023 16:01:58 -0500

On Tue, Jan 31, 2023 at 08:06:55PM +0000, Daniel P. Berrangé wrote:
> On Tue, Jan 31, 2023 at 02:48:54PM -0500, Peter Xu wrote:
> > On Thu, Jan 26, 2023 at 12:26:45PM -0500, Peter Xu wrote:
> > > On Thu, Jan 26, 2023 at 03:59:33PM +0000, Daniel P. Berrangé wrote:
> > > > On Thu, Jan 26, 2023 at 10:25:05AM -0500, Peter Xu wrote:
> > > > > > On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Michal Prívozník (mprivozn@redhat.com) wrote:
> > > > > > > On 1/25/23 23:40, Peter Xu wrote:
> > > > > > > > > The new /dev/userfaultfd handle is superior to the system call
> > > > > > > > > with a better permission control and also works for a
> > > > > > > > > restricted seccomp environment.
> > > > > > > >
> > > > > > > > The new device was only introduced in v6.1 so we need a header
> > > > > > > > update.
> > > > > > > >
> > > > > > > > Please have a look, thanks.
> > > > > > >
> > > > > > > > I was wondering whether it would make sense/be possible for mgmt
> > > > > > > > app (libvirt) to pass FD for /dev/userfaultfd instead of QEMU
> > > > > > > > opening it itself. But looking into the code, libvirt would need
> > > > > > > > to do that when spawning QEMU because that's when QEMU itself
> > > > > > > > initializes internal state and queries userfaultfd caps.
> > > > > >
> > > > > > > You also have to be careful about what the userfaultfd semantics
> > > > > > > are; I can't remember them - but if you open it in one process and
> > > > > > > pass it to another process, which processes address space are you
> > > > > > > trying to monitor?
> > > > >
> > > > > > Yes it's a problem. The kernel always fetches the current mm_struct*
> > > > > > which represents the current context of virtual address space when
> > > > > > creating the uffd handle (for either the syscall or the ioctl()
> > > > > > approach).
> > > >
> > > > At what point does the process address space get associated ? When
> > > > the /dev/userfaultfd is opened, or only when ioctl(USERFAULTFD_IOC_NEW)
> > > > is called ? If it is the former, then we have no choice, QEMU must open
> > > > it. if it is the latter, then libvirt can open /dev/userfaultfd, pass
> > > > it to QEMU which can then do the ioctl(USERFAULTFD_IOC_NEW).
> > >
> > > Good point.. It should be the latter, so should be doable.
> > >
> > > > What should be the best interface for QEMU to detect the fd passing over
> > > > to it? IIUC qemu_open() requires the name to be /dev/fdset/*, but there's
> > > > no existing cmdline that QEMU can know which fd number to fetch from
> > > > fdset to be used as the /dev/userfaultfd descriptor.
> > >
> > > > monitor_get_fd() seems more proper, where we can define an unique string
> > > > so Libvirt can preset the descriptor with the same string attached to it,
> > > > then I can opt-in monitor_get_fd() before trying to open() or doing the
> > > > syscall.
> >
> > Daniel/Michal, any input here from Libvirt side?
> >
> > I just noticed that monitor_get_fd() is bound to a specific monitor, then
> > it seems not clear which one is from Libvirt. If to use qemu_open() and
> > add-fd I think we need another QEMU cmdline to set the fd path, iiuc.
> >
> > I can also leave that for later if opening /dev/userfaultfd is already
> > resolving the immediate problem in containers.
>
> I don't have any great ideas really. If we assume the /dev/userfaultfd
> is accessible to QEMU we can ignore it.

It's my understanding that the QEMU process will be invoked by a user or
group that has access to /dev/userfaultfd, probably in the same context that
Libvirt specified. So hopefully everything will already work out naturally.
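
For reference, the flow discussed in this thread (open the device, create
the uffd via ioctl(USERFAULTFD_IOC_NEW), fall back to the syscall if the
device is unavailable) can be sketched roughly as below. This is only a
minimal sketch and not the code from the series: the name uffd_open_sketch()
is made up for illustration, and the fallback #define assumes the ioctl
encoding from the v6.1 linux/userfaultfd.h (with new enough headers, include
that file instead). The address space is bound when the ioctl is issued, so
the ioctl has to run in the process whose memory is to be monitored.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: mirrors the v6.1 uapi definition, for pre-6.1 headers. */
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC_NEW _IO(0xAA, 0x00)
#endif

/*
 * Hypothetical helper: returns a userfaultfd descriptor, or -1 with errno
 * set. "flags" takes the same values as the userfaultfd() syscall, e.g.
 * O_CLOEXEC | O_NONBLOCK.
 */
static int uffd_open_sketch(int flags)
{
    /* Preferred path: access is governed by the device node permissions. */
    int dev_fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);

    if (dev_fd >= 0) {
        /* The current mm is captured here, not at open() time. */
        int uffd = ioctl(dev_fd, USERFAULTFD_IOC_NEW, (unsigned long)flags);
        int saved_errno = errno;

        close(dev_fd);          /* the device fd is no longer needed */
        errno = saved_errno;
        if (uffd >= 0) {
            return uffd;
        }
    }

    /* Fallback: the original system call (older kernels, no device node). */
#ifdef __NR_userfaultfd
    return (int)syscall(__NR_userfaultfd, flags);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would still need to do the usual UFFDIO_API handshake on the
returned descriptor before registering any ranges.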

One thing I'm unsure about is introducing a new QEMU cmdline option. I can't
remember where I picked this up, but IIRC Paolo suggested at some point that
we reduce or forbid introducing new options to QEMU.

To work around that, we could instead add a migration parameter that points
to /dev/userfaultfd (which Libvirt can set to "/dev/fdset/N" via QMP early in
QEMU's startup). Considering that so far most of the uffd features are used
by the migration submodule, IMHO it's fine to do so.
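
To illustrate what such a parameter value would carry, here is a small
sketch of telling a "/dev/fdset/N" style path apart from a plain device path
and extracting N. The helper name fdset_id_from_path() is hypothetical and
this is not QEMU's own parsing code; inside QEMU, qemu_open() already handles
the /dev/fdset/ prefix itself.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns N for a path of the form "/dev/fdset/N",
 * or -1 if the path is not an fdset reference (e.g. "/dev/userfaultfd").
 */
static long fdset_id_from_path(const char *path)
{
    static const char prefix[] = "/dev/fdset/";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *num;
    char *end;
    long id;

    if (strncmp(path, prefix, prefix_len) != 0) {
        return -1;              /* ordinary path, open it directly */
    }

    num = path + prefix_len;
    if (*num < '0' || *num > '9') {
        return -1;              /* no digits after the prefix */
    }

    errno = 0;
    id = strtol(num, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;              /* overflow or trailing garbage */
    }
    return id;
}
```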

That said, I think we can always build on top of this series if that turns
out to be useful to libvirt some day; the change should be trivial. So I can
keep this series simple.

I'll wait 1-2 more days to see whether Michal has any comments.

Thanks,

--
Peter Xu