[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Virtio-fs] [PATCH v2 2/2] viriofsd: Add support for FUSE_HANDLE_KIL
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [Virtio-fs] [PATCH v2 2/2] viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2 |
Date: |
Mon, 8 Feb 2021 17:35:39 +0000 |
User-agent: |
Mutt/1.14.6 (2020-07-11) |
* Vivek Goyal (vgoyal@redhat.com) wrote:
> This patch adds basic support for FUSE_HANDLE_KILLPRIV_V2. virtiofsd
> can enable/disable this by specifying option "-o killpriv_v2/no_killpriv_v2".
> By default this is enabled as long as client supports it
>
> Enabling this option helps with performance in write path. Without this
> option, currently every write is first preceeded with a getxattr() operation
> to find out if security.capability is set. (Write is supposed to clear
> security.capability). With this option enabled, server is signing up for
> clearing security.capability on every WRITE and also clearing suid/sgid
> subject to certain rules. This gets rid of extra getxattr() call for every
> WRITE and improves performance. This is true when virtiofsd is run with
> option -o xattr.
>
> What does enabling FUSE_HANDLE_KILLPRIV_V2 mean for file server
> implementation.
> It needs to adhere to following rules. Thanks to Miklos for this summary.
>
> - clear "security.capability" on write, truncate and chown unconditionally
> - clear suid/sgid in case of following. Note, sgid is cleared only if
> group executable bit is set.
> o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
> o setattr has FATTR_UID or FATTR_GID
> o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
> o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
> o write has FUSE_WRITE_KILL_SUIDGID
>
> >From Linux VFS client perspective, here are the requirements.
>
> - caps are always cleared on chown/write/truncate
> - suid is always cleared on chown, while for truncate/write it is cleared
> only if caller does not have CAP_FSETID.
> - sgid is always cleared on chown, while for truncate/write it is cleared
> only if caller does not have CAP_FSETID as well as file has group execute
> permission.
>
> virtiofsd implementation has not changed much to adhere to above ruls. And
> reason being that current assumption is that we are running on Linux
> and on top of filesystems like ext4/xfs which already follow above rules.
> On write, truncate, chown, seucurity.capability is cleared. And virtiofsd
> drops CAP_FSETID if need be and that will lead to clearing of suid/sgid.
>
> But if virtiofsd is running on top a filesystem which breaks above
> assumptions,
> then it will have to take extra actions to emulate above. That's a TODO
> for later when need arises.
>
> Note: create normally is supposed to be called only when file does not
> exist. So generally there should not be any question of clearing
> setuid/setgid. But it is possible that after client checks that
> file is not present, some other client creates file on server
> and this race can trigger sending FUSE_CREATE. In that case, if
> O_TRUNC is set, we should clear suid/sgid if FUSE_OPEN_KILL_SUIDGID
> is also set.
>
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
> include/standard-headers/linux/fuse.h | 28 ++++++-
> tools/virtiofsd/fuse_common.h | 15 ++++
> tools/virtiofsd/fuse_lowlevel.c | 11 ++-
> tools/virtiofsd/fuse_lowlevel.h | 1 +
> tools/virtiofsd/passthrough_ll.c | 108 +++++++++++++++++++++++---
> 5 files changed, 148 insertions(+), 15 deletions(-)
>
> diff --git a/include/standard-headers/linux/fuse.h
> b/include/standard-headers/linux/fuse.h
> index 82c0a38b59..a233f3ac02 100644
> --- a/include/standard-headers/linux/fuse.h
> +++ b/include/standard-headers/linux/fuse.h
Qemu already has these changes because the whole kernel header set was
resync'd at rc2 a few weeks back.
<snip>
other than that, I think it's OK, so:
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> index 5aee5193eb..6f4a1ff3a9 100644
> --- a/tools/virtiofsd/fuse_common.h
> +++ b/tools/virtiofsd/fuse_common.h
> @@ -359,6 +359,21 @@ struct fuse_file_info {
> */
> #define FUSE_CAP_SUBMOUNTS (1 << 27)
>
> +/**
> + * Indicates that the filesystem is responsible for clearing
> + * security.capability xattr and clearing setuid and setgid bits. Following
> + * are the rules.
> + * - clear "security.capability" on write, truncate and chown unconditionally
> + * - clear suid/sgid if following is true. Note, sgid is cleared only if
> + * group executable bit is set.
> + * o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
> + * o setattr has FATTR_UID or FATTR_GID
> + * o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
> + * o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
> + * o write has FUSE_WRITE_KILL_SUIDGID
> + */
> +#define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
> +
> /**
> * Ioctl flags
> *
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index c70fb16a9a..65f01a3fe3 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -865,7 +865,7 @@ static void do_setattr(fuse_req_t req, fuse_ino_t nodeid,
> FUSE_SET_ATTR_GID | FUSE_SET_ATTR_SIZE |
> FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME |
> FUSE_SET_ATTR_ATIME_NOW | FUSE_SET_ATTR_MTIME_NOW |
> - FUSE_SET_ATTR_CTIME;
> + FUSE_SET_ATTR_CTIME | FUSE_SET_ATTR_KILL_SUIDGID;
>
> req->se->op.setattr(req, nodeid, &stbuf, arg->valid, fi);
> } else {
> @@ -1079,6 +1079,7 @@ static void do_create(fuse_req_t req, fuse_ino_t nodeid,
>
> memset(&fi, 0, sizeof(fi));
> fi.flags = arg->flags;
> + fi.kill_priv = arg->open_flags & FUSE_OPEN_KILL_SUIDGID;
>
> req->ctx.umask = arg->umask;
>
> @@ -1102,6 +1103,7 @@ static void do_open(fuse_req_t req, fuse_ino_t nodeid,
>
> memset(&fi, 0, sizeof(fi));
> fi.flags = arg->flags;
> + fi.kill_priv = arg->open_flags & FUSE_OPEN_KILL_SUIDGID;
>
> if (req->se->op.open) {
> req->se->op.open(req, nodeid, &fi);
> @@ -1993,6 +1995,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> if (arg->flags & FUSE_SUBMOUNTS) {
> se->conn.capable |= FUSE_CAP_SUBMOUNTS;
> }
> + if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
> + se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> + }
> #ifdef HAVE_SPLICE
> #ifdef HAVE_VMSPLICE
> se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> @@ -2124,6 +2129,10 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> outarg.congestion_threshold = se->conn.congestion_threshold;
> outarg.time_gran = se->conn.time_gran;
>
> + if (se->conn.want & FUSE_CAP_HANDLE_KILLPRIV_V2) {
> + outarg.flags |= FUSE_HANDLE_KILLPRIV_V2;
> + }
> +
> fuse_log(FUSE_LOG_DEBUG, " INIT: %u.%u\n", outarg.major, outarg.minor);
> fuse_log(FUSE_LOG_DEBUG, " flags=0x%08x\n", outarg.flags);
> fuse_log(FUSE_LOG_DEBUG, " max_readahead=0x%08x\n",
> outarg.max_readahead);
> diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
> index 9c06240f9e..96d10defc8 100644
> --- a/tools/virtiofsd/fuse_lowlevel.h
> +++ b/tools/virtiofsd/fuse_lowlevel.h
> @@ -146,6 +146,7 @@ struct fuse_forget_data {
> #define FUSE_SET_ATTR_ATIME_NOW (1 << 7)
> #define FUSE_SET_ATTR_MTIME_NOW (1 << 8)
> #define FUSE_SET_ATTR_CTIME (1 << 10)
> +#define FUSE_SET_ATTR_KILL_SUIDGID (1 << 11)
>
> /*
> * Request methods and replies
> diff --git a/tools/virtiofsd/passthrough_ll.c
> b/tools/virtiofsd/passthrough_ll.c
> index 6407b1a2e1..36bdca449a 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -180,6 +180,7 @@ struct lo_data {
>
> /* An O_PATH file descriptor to /proc/self/fd/ */
> int proc_self_fd;
> + int user_killpriv_v2, killpriv_v2;
> };
>
> static const struct fuse_opt lo_opts[] = {
> @@ -210,6 +211,8 @@ static const struct fuse_opt lo_opts[] = {
> { "allow_direct_io", offsetof(struct lo_data, allow_direct_io), 1 },
> { "no_allow_direct_io", offsetof(struct lo_data, allow_direct_io), 0 },
> { "announce_submounts", offsetof(struct lo_data, announce_submounts), 1
> },
> + { "killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 1 },
> + { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
> FUSE_OPT_END
> };
> static bool use_syslog = false;
> @@ -618,6 +621,34 @@ static void lo_init(void *userdata, struct
> fuse_conn_info *conn)
> lo->announce_submounts = false;
> }
> #endif
> +
> + if (lo->user_killpriv_v2 == 1) {
> + /*
> + * User explicitly asked for this option. Enable it unconditionally.
> + * If connection does not have this capability, it should fail
> + * in fuse_lowlevel.c
> + */
So that would fail with an old guest?
> + fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
> + conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> + lo->killpriv_v2 = 1;
> + } else if (lo->user_killpriv_v2 == -1 &&
> + conn->capable & FUSE_CAP_HANDLE_KILLPRIV_V2) {
> + /*
> + * User did not specify a value for killpriv_v2. By default enable it
> + * if connection offers this capability
> + */
> + fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
> + conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> + lo->killpriv_v2 = 1;
> + } else {
> + /*
> + * Either user specified to disable killpriv_v2, or connection does
> + * not offer this capability. Disable killpriv_v2 in both the cases
> + */
> + fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling killpriv_v2\n");
> + conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> + lo->killpriv_v2 = 0;
> + }
> }
>
> static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> @@ -702,7 +733,10 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino,
> struct stat *attr,
> }
> if (valid & FUSE_SET_ATTR_SIZE) {
> int truncfd;
> + bool kill_suidgid;
> + bool cap_fsetid_dropped = false;
>
> + kill_suidgid = lo->killpriv_v2 && (valid &
> FUSE_SET_ATTR_KILL_SUIDGID);
> if (fi) {
> truncfd = fd;
> } else {
> @@ -714,8 +748,25 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino,
> struct stat *attr,
> }
> }
>
> + if (kill_suidgid) {
> + res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> + if (res != 0) {
> + saverr = res;
> + if (!fi) {
> + close(truncfd);
> + }
> + goto out_err;
> + }
> + }
> +
> res = ftruncate(truncfd, attr->st_size);
> saverr = res == -1 ? errno : 0;
> +
> + if (cap_fsetid_dropped) {
> + if (gain_effective_cap("FSETID")) {
> + fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> + }
> + }
> if (!fi) {
> close(truncfd);
> }
> @@ -1680,9 +1731,11 @@ static void lo_create(fuse_req_t req, fuse_ino_t
> parent, const char *name,
> struct fuse_entry_param e;
> int err;
> struct lo_cred old = {};
> + bool cap_fsetid_dropped = false;
> + bool kill_suidgid = lo->killpriv_v2 && fi->kill_priv;
>
> - fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)\n",
> parent,
> - name);
> + fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)"
> + " kill_priv=%d\n", parent, name, fi->kill_priv);
>
> if (!is_safe_path_component(name)) {
> fuse_reply_err(req, EINVAL);
> @@ -1702,9 +1755,23 @@ static void lo_create(fuse_req_t req, fuse_ino_t
> parent, const char *name,
>
> update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>
> + if (kill_suidgid) {
> + err = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> + if (err != 0) {
> + lo_restore_cred(&old);
> + goto out;
> + }
> + }
> +
> fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
> mode);
> err = fd == -1 ? errno : 0;
> +
> + if (cap_fsetid_dropped) {
> + if (gain_effective_cap("FSETID")) {
> + fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> + }
> + }
> lo_restore_cred(&old);
>
> if (!err) {
> @@ -1902,20 +1969,38 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t
> ino, int datasync,
>
> static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info
> *fi)
> {
> - int fd;
> + int fd, ret;
> ssize_t fh;
> char buf[64];
> struct lo_data *lo = lo_data(req);
> + bool cap_fsetid_dropped = false;
> + bool kill_suidgid = lo->killpriv_v2 && fi->kill_priv;
>
> - fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
> - fi->flags);
> + fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d,
> kill_priv=%d)"
> + "\n", ino, fi->flags, fi->kill_priv);
>
> update_open_flags(lo->writeback, lo->allow_direct_io, fi);
>
> sprintf(buf, "%i", lo_fd(req, ino));
> +
> + if (kill_suidgid) {
> + ret = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> + if (ret != 0) {
> + fuse_reply_err(req, ret);
> + return;
> + }
> + }
> +
> fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
> - if (fd == -1) {
> - return (void)fuse_reply_err(req, errno);
> + ret = fd == -1 ? errno : 0;
> + if (cap_fsetid_dropped) {
> + if (gain_effective_cap("FSETID")) {
> + fuse_log(FUSE_LOG_ERR, "Failed to gain CAP_FSETID\n");
> + }
> + }
> +
> + if (ret) {
> + return (void)fuse_reply_err(req, ret);
> }
>
> pthread_mutex_lock(&lo->mutex);
> @@ -2049,12 +2134,14 @@ static void lo_write_buf(fuse_req_t req, fuse_ino_t
> ino,
> out_buf.buf[0].pos = off;
>
> fuse_log(FUSE_LOG_DEBUG,
> - "lo_write_buf(ino=%" PRIu64 ", size=%zd, off=%lu)\n", ino,
> - out_buf.buf[0].size, (unsigned long)off);
> + "lo_write_buf(ino=%" PRIu64 ", size=%zd, off=%lu
> kill_priv=%d)\n",
> + ino, out_buf.buf[0].size, (unsigned long)off, fi->kill_priv);
>
> /*
> * If kill_priv is set, drop CAP_FSETID which should lead to kernel
> - * clearing setuid/setgid on file.
> + * clearing setuid/setgid on file. Note, for WRITE, we need to do
> + * this even if killpriv_v2 is not enabled. fuse direct write path
> + * relies on this.
> */
> if (fi->kill_priv) {
> res = drop_effective_cap("FSETID", &cap_fsetid_dropped);
> @@ -3433,6 +3520,7 @@ int main(int argc, char *argv[])
> .posix_lock = 0,
> .allow_direct_io = 0,
> .proc_self_fd = -1,
> + .user_killpriv_v2 = -1,
> };
> struct lo_map_elem *root_elem;
> int ret = -1;
> --
> 2.25.4
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [Virtio-fs] [PATCH v2 2/2] viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2,
Dr. David Alan Gilbert <=