qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Virtio-fs] [PATCH v2 2/2] viriofsd: Add support for FUSE_HANDLE_KIL


From: Dr. David Alan Gilbert
Subject: Re: [Virtio-fs] [PATCH v2 2/2] viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2
Date: Mon, 8 Feb 2021 17:35:39 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

* Vivek Goyal (vgoyal@redhat.com) wrote:
> This patch adds basic support for FUSE_HANDLE_KILLPRIV_V2. virtiofsd
> can enable/disable this by specifying option "-o killpriv_v2/no_killpriv_v2".
> By default this is enabled as long as client supports it
> 
> Enabling this option helps with performance in write path. Without this
> option, currently every write is first preceeded with a getxattr() operation
> to find out if security.capability is set. (Write is supposed to clear
> security.capability). With this option enabled, server is signing up for
> clearing security.capability on every WRITE and also clearing suid/sgid
> subject to certain rules. This gets rid of extra getxattr() call for every
> WRITE and improves performance. This is true when virtiofsd is run with
> option -o xattr.
> 
> What does enabling FUSE_HANDLE_KILLPRIV_V2 mean for file server 
> implementation.
> It needs to adhere to following rules. Thanks to Miklos for this summary.
> 
> - clear "security.capability" on write, truncate and chown unconditionally
> - clear suid/sgid in case of following. Note, sgid is cleared only if
>   group executable bit is set.
>     o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
>     o setattr has FATTR_UID or FATTR_GID
>     o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
>     o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
>     o write has FUSE_WRITE_KILL_SUIDGID
> 
> >From Linux VFS client perspective, here are the requirements.
> 
> - caps are always cleared on chown/write/truncate
> - suid is always cleared on chown, while for truncate/write it is cleared
>   only if caller does not have CAP_FSETID.
> - sgid is always cleared on chown, while for truncate/write it is cleared
>   only if caller does not have CAP_FSETID as well as file has group execute
>   permission.
> 
> virtiofsd implementation has not changed much to adhere to above ruls. And
> reason being that current assumption is that we are running on Linux
> and on top of filesystems like ext4/xfs which already follow above rules.
> On write, truncate, chown, seucurity.capability is cleared. And virtiofsd
> drops CAP_FSETID if need be and that will lead to clearing of suid/sgid.
> 
> But if virtiofsd is running on top a filesystem which breaks above 
> assumptions,
> then it will have to take extra actions to emulate above. That's a TODO
> for later when need arises.
> 
> Note: create normally is supposed to be called only when file does not
>       exist. So generally there should not be any question of clearing
>       setuid/setgid. But it is possible that after client checks that
>       file is not present, some other client creates file on server
>       and this race can trigger sending FUSE_CREATE. In that case, if
>       O_TRUNC is set, we should clear suid/sgid if FUSE_OPEN_KILL_SUIDGID
>       is also set.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  include/standard-headers/linux/fuse.h |  28 ++++++-
>  tools/virtiofsd/fuse_common.h         |  15 ++++
>  tools/virtiofsd/fuse_lowlevel.c       |  11 ++-
>  tools/virtiofsd/fuse_lowlevel.h       |   1 +
>  tools/virtiofsd/passthrough_ll.c      | 108 +++++++++++++++++++++++---
>  5 files changed, 148 insertions(+), 15 deletions(-)
> 
> diff --git a/include/standard-headers/linux/fuse.h 
> b/include/standard-headers/linux/fuse.h
> index 82c0a38b59..a233f3ac02 100644
> --- a/include/standard-headers/linux/fuse.h
> +++ b/include/standard-headers/linux/fuse.h

Qemu already has these changes because the whole kernel header set was
resync'd at rc2 a few weeks back.

<snip>

other than that, I think it's OK, so:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> index 5aee5193eb..6f4a1ff3a9 100644
> --- a/tools/virtiofsd/fuse_common.h
> +++ b/tools/virtiofsd/fuse_common.h
> @@ -359,6 +359,21 @@ struct fuse_file_info {
>   */
>  #define FUSE_CAP_SUBMOUNTS (1 << 27)
>  
> +/**
> + * Indicates that the filesystem is responsible for clearing
> + * security.capability xattr and clearing setuid and setgid bits. Following
> + * are the rules.
> + * - clear "security.capability" on write, truncate and chown unconditionally
> + * - clear suid/sgid if following is true. Note, sgid is cleared only if
> + *   group executable bit is set.
> + *    o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
> + *    o setattr has FATTR_UID or FATTR_GID
> + *    o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
> + *    o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
> + *    o write has FUSE_WRITE_KILL_SUIDGID
> + */
> +#define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
> +
>  /**
>   * Ioctl flags
>   *
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index c70fb16a9a..65f01a3fe3 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -865,7 +865,7 @@ static void do_setattr(fuse_req_t req, fuse_ino_t nodeid,
>                        FUSE_SET_ATTR_GID | FUSE_SET_ATTR_SIZE |
>                        FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME |
>                        FUSE_SET_ATTR_ATIME_NOW | FUSE_SET_ATTR_MTIME_NOW |
> -                      FUSE_SET_ATTR_CTIME;
> +                      FUSE_SET_ATTR_CTIME | FUSE_SET_ATTR_KILL_SUIDGID;
>  
>          req->se->op.setattr(req, nodeid, &stbuf, arg->valid, fi);
>      } else {
> @@ -1079,6 +1079,7 @@ static void do_create(fuse_req_t req, fuse_ino_t nodeid,
>  
>          memset(&fi, 0, sizeof(fi));
>          fi.flags = arg->flags;
> +        fi.kill_priv = arg->open_flags & FUSE_OPEN_KILL_SUIDGID;
>  
>          req->ctx.umask = arg->umask;
>  
> @@ -1102,6 +1103,7 @@ static void do_open(fuse_req_t req, fuse_ino_t nodeid,
>  
>      memset(&fi, 0, sizeof(fi));
>      fi.flags = arg->flags;
> +    fi.kill_priv = arg->open_flags & FUSE_OPEN_KILL_SUIDGID;
>  
>      if (req->se->op.open) {
>          req->se->op.open(req, nodeid, &fi);
> @@ -1993,6 +1995,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>      if (arg->flags & FUSE_SUBMOUNTS) {
>          se->conn.capable |= FUSE_CAP_SUBMOUNTS;
>      }
> +    if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
> +        se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> +    }
>  #ifdef HAVE_SPLICE
>  #ifdef HAVE_VMSPLICE
>      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> @@ -2124,6 +2129,10 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>      outarg.congestion_threshold = se->conn.congestion_threshold;
>      outarg.time_gran = se->conn.time_gran;
>  
> +    if (se->conn.want & FUSE_CAP_HANDLE_KILLPRIV_V2) {
> +        outarg.flags |= FUSE_HANDLE_KILLPRIV_V2;
> +    }
> +
>      fuse_log(FUSE_LOG_DEBUG, "   INIT: %u.%u\n", outarg.major, outarg.minor);
>      fuse_log(FUSE_LOG_DEBUG, "   flags=0x%08x\n", outarg.flags);
>      fuse_log(FUSE_LOG_DEBUG, "   max_readahead=0x%08x\n", 
> outarg.max_readahead);
> diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> index 9c06240f9e..96d10defc8 100644
> --- a/tools/virtiofsd/fuse_lowlevel.h
> +++ b/tools/virtiofsd/fuse_lowlevel.h
> @@ -146,6 +146,7 @@ struct fuse_forget_data {
>  #define FUSE_SET_ATTR_ATIME_NOW (1 << 7)
>  #define FUSE_SET_ATTR_MTIME_NOW (1 << 8)
>  #define FUSE_SET_ATTR_CTIME (1 << 10)
> +#define FUSE_SET_ATTR_KILL_SUIDGID (1 << 11)
>  
>  /*
>   * Request methods and replies
> diff --git a/tools/virtiofsd/passthrough_ll.c 
> b/tools/virtiofsd/passthrough_ll.c
> index 6407b1a2e1..36bdca449a 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -180,6 +180,7 @@ struct lo_data {
>  
>      /* An O_PATH file descriptor to /proc/self/fd/ */
>      int proc_self_fd;
> +    int user_killpriv_v2, killpriv_v2;
>  };
>  
>  static const struct fuse_opt lo_opts[] = {
> @@ -210,6 +211,8 @@ static const struct fuse_opt lo_opts[] = {
>      { "allow_direct_io", offsetof(struct lo_data, allow_direct_io), 1 },
>      { "no_allow_direct_io", offsetof(struct lo_data, allow_direct_io), 0 },
>      { "announce_submounts", offsetof(struct lo_data, announce_submounts), 1 
> },
> +    { "killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 1 },
> +    { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
>      FUSE_OPT_END
>  };
>  static bool use_syslog = false;
> @@ -618,6 +621,34 @@ static void lo_init(void *userdata, struct 
> fuse_conn_info *conn)
>          lo->announce_submounts = false;
>      }
>  #endif
> +
> +    if (lo->user_killpriv_v2 == 1) {
> +        /*
> +         * User explicitly asked for this option. Enable it unconditionally.
> +         * If connection does not have this capability, it should fail
> +         * in fuse_lowlevel.c
> +         */

So that would fail with an old guest?

> +        fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
> +        conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> +        lo->killpriv_v2 = 1;
> +    } else if (lo->user_killpriv_v2 == -1 &&
> +               conn->capable & FUSE_CAP_HANDLE_KILLPRIV_V2) {
> +        /*
> +         * User did not specify a value for killpriv_v2. By default enable it
> +         * if connection offers this capability
> +         */
> +        fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
> +        conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> +        lo->killpriv_v2 = 1;
> +    } else {
> +        /*
> +         * Either user specified to disable killpriv_v2, or connection does
> +         * not offer this capability. Disable killpriv_v2 in both the cases
> +         */
> +        fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling killpriv_v2\n");
> +        conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> +        lo->killpriv_v2 = 0;
> +    }
>  }
>  
>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> @@ -702,7 +733,10 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
> struct stat *attr,
>      }
>      if (valid & FUSE_SET_ATTR_SIZE) {
>          int truncfd;
> +        bool kill_suidgid;
> +        bool cap_fsetid_dropped = false;
>  
> +        kill_suidgid = lo->killpriv_v2 && (valid & 
> FUSE_SET_ATTR_KILL_SUIDGID);
>          if (fi) {
>              truncfd = fd;
>          } else {
> @@ -714,8 +748,25 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
> struct stat *attr,
>              }
>          }
>  
> +        if (kill_suidgid) {
> +            res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> +            if (res != 0) {
> +                saverr = res;
> +                if (!fi) {
> +                    close(truncfd);
> +                }
> +                goto out_err;
> +            }
> +        }
> +
>          res = ftruncate(truncfd, attr->st_size);
>          saverr = res == -1 ? errno : 0;
> +
> +        if (cap_fsetid_dropped) {
> +            if (gain_effective_cap("FSETID")) {
> +                fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> +            }
> +        }
>          if (!fi) {
>              close(truncfd);
>          }
> @@ -1680,9 +1731,11 @@ static void lo_create(fuse_req_t req, fuse_ino_t 
> parent, const char *name,
>      struct fuse_entry_param e;
>      int err;
>      struct lo_cred old = {};
> +    bool cap_fsetid_dropped = false;
> +    bool kill_suidgid = lo->killpriv_v2 && fi->kill_priv;
>  
> -    fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n", 
> parent,
> -             name);
> +    fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)"
> +             " kill_priv=%d\n", parent, name, fi->kill_priv);
>  
>      if (!is_safe_path_component(name)) {
>          fuse_reply_err(req, EINVAL);
> @@ -1702,9 +1755,23 @@ static void lo_create(fuse_req_t req, fuse_ino_t 
> parent, const char *name,
>  
>      update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>  
> +    if (kill_suidgid) {
> +        err = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> +        if (err != 0) {
> +            lo_restore_cred(&old);
> +            goto out;
> +        }
> +    }
> +
>      fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
>                  mode);
>      err = fd == -1 ? errno : 0;
> +
> +    if (cap_fsetid_dropped) {
> +        if (gain_effective_cap("FSETID")) {
> +            fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> +        }
> +    }
>      lo_restore_cred(&old);
>  
>      if (!err) {
> @@ -1902,20 +1969,38 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t 
> ino, int datasync,
>  
>  static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info 
> *fi)
>  {
> -    int fd;
> +    int fd, ret;
>      ssize_t fh;
>      char buf[64];
>      struct lo_data *lo = lo_data(req);
> +    bool cap_fsetid_dropped = false;
> +    bool kill_suidgid = lo->killpriv_v2 && fi->kill_priv;
>  
> -    fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
> -             fi->flags);
> +    fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d, 
> kill_priv=%d)"
> +             "\n", ino, fi->flags, fi->kill_priv);
>  
>      update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>  
>      sprintf(buf, "%i", lo_fd(req, ino));
> +
> +    if (kill_suidgid) {
> +        ret = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> +        if (ret != 0) {
> +            fuse_reply_err(req, ret);
> +            return;
> +        }
> +    }
> +
>      fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
> -    if (fd == -1) {
> -        return (void)fuse_reply_err(req, errno);
> +    ret = fd == -1 ? errno : 0;
> +    if (cap_fsetid_dropped) {
> +        if (gain_effective_cap("FSETID")) {
> +            fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> +        }
> +    }
> +
> +    if (ret) {
> +        return (void)fuse_reply_err(req, ret);
>      }
>  
>      pthread_mutex_lock(&lo->mutex);
> @@ -2049,12 +2134,14 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t 
> ino,
>      out_buf.buf[0].pos = off;
>  
>      fuse_log(FUSE_LOG_DEBUG,
> -             "lo_write_buf(ino=%" PRIu64 ", size=%zd, off=%lu)\n", ino,
> -             out_buf.buf[0].size, (unsigned long)off);
> +             "lo_write_buf(ino=%" PRIu64 ", size=%zd, off=%lu 
> kill_priv=%d)\n",
> +             ino, out_buf.buf[0].size, (unsigned long)off, fi->kill_priv);
>  
>      /*
>       * If kill_priv is set, drop CAP_FSETID which should lead to kernel
> -     * clearing setuid/setgid on file.
> +     * clearing setuid/setgid on file. Note, for WRITE, we need to do
> +     * this even if killpriv_v2 is not enabled. fuse direct write path
> +     * relies on this.
>       */
>      if (fi->kill_priv) {
>          res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> @@ -3433,6 +3520,7 @@ int main(int argc, char *argv[])
>          .posix_lock = 0,
>          .allow_direct_io = 0,
>          .proc_self_fd = -1,
> +        .user_killpriv_v2 = -1,
>      };
>      struct lo_map_elem *root_elem;
>      int ret = -1;
> -- 
> 2.25.4
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




reply via email to

[Prev in Thread] Current Thread [Next in Thread]