qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] 9pfs: local: ignore O_NOATIME if we don't have permissions


From: Omar Sandoval
Subject: Re: [PATCH] 9pfs: local: ignore O_NOATIME if we don't have permissions
Date: Thu, 16 Apr 2020 11:47:11 -0700

On Thu, Apr 16, 2020 at 04:58:31PM +0200, Christian Schoenebeck wrote:
> On Donnerstag, 16. April 2020 02:44:33 CEST Omar Sandoval wrote:
> > From: Omar Sandoval <address@hidden>
> > 
> > QEMU's local 9pfs server passes through O_NOATIME from the client. If
> > the QEMU process doesn't have permissions to use O_NOATIME (namely, it
> > does not own the file nor have the CAP_FOWNER capability), the open will
> > fail. This causes issues when from the client's point of view, it
> > believes it has permissions to use O_NOATIME (e.g., a process running as
> > root in the virtual machine). Additionally, overlayfs on Linux opens
> > files on the lower layer using O_NOATIME, so in this case a 9pfs mount
> > can't be used as a lower layer for overlayfs (cf.
> > https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad965
> > 0/vmtest/onoatimehack.c and https://github.com/NixOS/nixpkgs/issues/54509).
> > 
> > Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g.,
> > network filesystems. open(2) notes that O_NOATIME "may not be effective
> > on all filesystems. One example is NFS, where the server maintains the
> > access time." This means that we can honor it when possible but fall
> > back to ignoring it.
> 
> I am not sure whether NFS would simply silently ignore O_NOATIME i.e. without 
> returning EPERM. I don't read it that way.

As far as I can tell, the NFS protocol has nothing equivalent to
O_NOATIME and thus can't honor it. Feel free to test it:

  # mount -t nfs -o vers=4,rw 10.0.2.2:/ /mnt
  # echo foo > /mnt/foo
  # touch -d "1 hour ago" /mnt/foo
  # stat /mnt/foo | grep 'Access: [0-9]'
  Access: 2020-04-16 10:43:36.838952593 -0700
  # # Drop caches so we have to go to the NFS server.
  # echo 3 > /proc/sys/vm/drop_caches
  # strace -e openat dd if=/mnt/foo of=/dev/null iflag=noatime |& grep /mnt/foo
  openat(AT_FDCWD, "/mnt/foo", O_RDONLY|O_NOATIME) = 3
  # stat /mnt/foo | grep 'Access: [0-9]'
  Access: 2020-04-16 11:43:36.906462928 -0700

> Fact is on Linux the expected 
> behaviour is returning EPERM if O_NOATIME cannot be satisfied, consistent 
> since its introduction 22 years ago:
> http://lkml.iu.edu/hypermail/linux/kernel/9811.2/0118.html

The exact phrasing in the man-page is: "EPERM  The O_NOATIME flag was
specified, but the effective user ID of the caller did not match the
owner of the file and the caller was not privileged." IMO, it's about
whether the (guest) process has permission from the (guest) kernel's
point of view, not whether the filesystem could satisfy it.

> > Signed-off-by: Omar Sandoval <address@hidden>
> > ---
> >  hw/9pfs/9p-util.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
> > index 79ed6b233e..50842d540f 100644
> > --- a/hw/9pfs/9p-util.h
> > +++ b/hw/9pfs/9p-util.h
> > @@ -37,9 +37,14 @@ static inline int openat_file(int dirfd, const char
> > *name, int flags, {
> >      int fd, serrno, ret;
> > 
> > +again:
> >      fd = openat(dirfd, name, flags | O_NOFOLLOW | O_NOCTTY | O_NONBLOCK,
> >                  mode);
> >      if (fd == -1) {
> > +        if (errno == EPERM && (flags & O_NOATIME)) {
> > +            flags &= ~O_NOATIME;
> > +            goto again;
> > +        }
> >          return -1;
> >      }
> 
> It would certainly fix the problem in your use case. But it would also unmask 
> O_NOATIME for all other ones (i.e. regular users on guest).

The guest kernel will still check whether processes on the guest have
permission to use O_NOATIME. This only changes the behavior when the
guest kernel believes that the process has permission even though the
host QEMU process doesn't.

> I mean I understand your point, but I also have to take other use cases into 
> account which might expect to receive EPERM if O_NOATIME cannot be granted.

If you'd still like to preserve this behavior, would it be acceptable to
make this a QEMU option? Maybe something like "-virtfs
honor_noatime=no": the default would be "yes", which is the current
behavior, and "no" would always mask out NOATIME.

> May I ask how come that file/dir in question does not share the same uid in 
> your specific use case? Are the file(s) created outside of QEMU, i.e. 
> directly 
> by some app on host?

My use case is running tests on different versions of the Linux kernel
while reusing the host's userspace environment. I export the host's root
filesystem read-only to the guest via 9pfs, and the guest sets up
overlayfs on top of it (to allow certain modifications) and chroots into
that. Without a workaround like the LD_PRELOAD one I mentioned in the
commit message, any (read) accesses to files owned by root on the host
(like /bin/sh) will fail.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]