From: Akihiko Odaki
Subject: Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
Date: Tue, 26 Apr 2022 12:57:37 +0900
User-agent: Mozilla/5.0 (X11; Linux aarch64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0

On 2022/04/25 3:45, Christian Schoenebeck wrote:
+    }
+    err = chmod(addr.sun_path, mode);

I'm not sure if it is fine to have a time window between bind() and
chmod(). Do you have some rationale?
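(For context, the sequence in question is roughly the following. This is only a
simplified sketch of the macOS fallback being discussed, not the literal patch
code; 'addr' and 'mode' are taken from the snippet above, everything else is
assumed:)

    /* Sketch: emulating mknod(S_IFSOCK) on macOS, which cannot create
     * socket files via mknod(), by creating the file with bind() and
     * fixing up its permission bits afterwards. */
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock < 0) {
        return -1;
    }
    /* bind() creates the socket file with 0777 & ~umask (typically
     * srwxr-xr-x with the usual umask of 022) ...                   */
    err = bind(sock, (struct sockaddr *)&addr, sizeof(addr));
    if (err == 0) {
        /* ... and only here is it restricted to the requested mode;
         * in between, another process may open() or connect() to it. */
        err = chmod(addr.sun_path, mode);
    }
    close(sock);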

Good question. QEMU's 9p server is multi-threaded; all 9p requests come in
serialized and the 9p server controller portion (9p.c) runs only on the
QEMU main thread, but the actual filesystem driver calls are then
dispatched to QEMU worker threads and are therefore running concurrently at
this point:

https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines

The situation is similar on the Linux 9p client side: it handles access to a
mounted 9p filesystem concurrently; the requests are then serialized by the
9p driver on Linux and sent over the wire to the 9p server (host).

So yes, there might be implications from that short time window. But could
that be exploited on macOS hosts in practice?

The socket file would have mode srwxr-xr-x for a short moment.

For security_model=mapped* this should not be a problem.

For security_model=none|passthrough, in theory, maybe? But how likely is
that? If you are using a Linux client for instance, trying to brute-force
opening the socket file, the client would send several 9p commands
(Twalk, Tgetattr, Topen, probably more). The time window between the two
calls above should be much smaller than that, and I would expect one of
the 9p commands to error out in between.

What would be a viable approach to avoid this issue on macOS?
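(One direction that is sometimes used for this kind of problem, sketched here
purely as an illustration and not as something proposed in this thread: create
the socket inside a freshly created mode-0700 scratch directory on the same
filesystem, chmod() it there, and only then rename() it to its final path, so
the file is never reachable with permissive bits. The names below, such as
'dirpath', are hypothetical:)

    /* Illustrative sketch only, not patch code; assumes 'dirpath' is the
     * directory that will contain the final socket, 'addr' and 'mode' as
     * in the snippet above. */
    char tmpdir[PATH_MAX];
    snprintf(tmpdir, sizeof(tmpdir), "%s/.sockbuild-XXXXXX", dirpath);
    if (mkdtemp(tmpdir) == NULL) {          /* scratch dir is created 0700 */
        return -1;
    }
    struct sockaddr_un tmp = { .sun_family = AF_UNIX };
    snprintf(tmp.sun_path, sizeof(tmp.sun_path), "%s/sock", tmpdir);
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    err = bind(sock, (struct sockaddr *)&tmp, sizeof(tmp));
    if (err == 0) {
        err = chmod(tmp.sun_path, mode);    /* not reachable by others yet */
    }
    if (err == 0) {
        err = rename(tmp.sun_path, addr.sun_path);  /* atomic publish */
    }
    close(sock);
    unlink(tmp.sun_path);                   /* best-effort cleanup */
    rmdir(tmpdir);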

It is unlikely that a naive brute-force approach will succeed in
exploiting this. The more concerning scenario is an attacker who uses
knowledge of the underlying macOS implementation to cause resource
contention and widen the window. Whether an exploit is viable depends on
how much time you spend digging into XNU.

However, I'm also not sure if it really *has* a race condition. Looking
at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and
s->ops->lstat(). It also results in an entity called "path name based
fid" in the code, which inherently cannot identify a file when it is
renamed or recreated.

If there is some rationale why that is safe, it may also be applied to the
sequence of bind() and chmod(). Can anyone explain the sequence of
s->ops->mknod() and s->ops->lstat(), or path name based fids in general?

You are talking about the 9p server's controller level: unfortunately I don't
see anything that would prevent a concurrent open() during this bind() ...
chmod() time window.

Argument 'fidp' passed to function v9fs_co_mknod() reflects the directory in
which the new device file shall be created. So 'fidp' is not the device file
here, nor is 'fidp' modified during this function.

Function v9fs_co_mknod() is entered by the 9p server on the QEMU main thread.
At the beginning of the function it first acquires a read lock on a (per 9p
export) global coroutine mutex:

     v9fs_path_read_lock(s);

and holds this lock until returning from function v9fs_co_mknod(). But that's
just a read lock. Function v9fs_co_open() also just gains a read lock. So they
can happen concurrently.

Then v9fs_co_run_in_worker({...}) is called to dispatch and execute the whole
code block (think of it as an Obj-C "block") inside this macro on a QEMU
worker thread. So an arbitrary background thread then calls the fs driver
functions:

     s->ops->mknod()
     v9fs_name_to_path()
     s->ops->lstat()

and then, at the end of the code block, the background thread dispatches back
to the QEMU main thread. So when we reach:

     v9fs_path_unlock(s);

we are already back on the QEMU main thread, hence we unlock on the main
thread and finally leave function v9fs_co_mknod().
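
(Putting the pieces above together, v9fs_co_mknod() therefore roughly looks
like this; a condensed paraphrase based on the description above, not the
literal QEMU source:)

    int coroutine_fn v9fs_co_mknod(V9fsPDU *pdu, V9fsFidState *fidp,
                                   V9fsString *name, uid_t uid, gid_t gid,
                                   dev_t dev, mode_t mode, struct stat *stbuf)
    {
        int err;
        V9fsPath path;
        V9fsState *s = pdu->s;
        /* ... credential setup ('cred') elided ... */
        v9fs_path_read_lock(s);        /* read lock, taken on the main thread */
        v9fs_co_run_in_worker(
            {
                /* this block runs on a QEMU worker thread */
                err = s->ops->mknod(&s->ctx, &fidp->path, name->data, &cred);
                if (err == 0) {
                    err = v9fs_name_to_path(s, &fidp->path, name->data, &path);
                    if (err == 0) {
                        err = s->ops->lstat(&s->ctx, &path, stbuf);
                    }
                }
            });
        v9fs_path_unlock(s);           /* back on the main thread again */
        return err;
    }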

The important thing to understand is that, while the

     v9fs_co_run_in_worker({...})

code block is executed on a QEMU worker thread, the QEMU main thread (the 9p
server controller portion, i.e. 9p.c) is *not* sleeping; rather it continues
to process other (if any) client requests in the meantime. In other words,
v9fs_co_run_in_worker() behaves neither exactly like Apple's GCD
dispatch_async() nor like dispatch_sync(), as GCD is not coroutine based.

So the 9p server might pull a pending 'Topen' client request from the input
FIFO in the meantime and likewise dispatch it to a worker thread, etc. Hence a
concurrent open() might in theory be possible, but I find it quite unlikely to
succeed in practice: the open() call on the guest is translated by the Linux
client into a bunch of synchronous 9p requests on the path passed to open(),
and a round trip for each 9p message takes what, ~0.3 ms or something in this
order. That's quite large compared to the time window I would expect between
bind() ... chmod().
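
(Back-of-the-envelope, with assumed numbers: even if the client needed only,
say, three synchronous 9p round trips at ~0.3 ms each before its Topen could
hit the server, that is already about 1 ms of client-side latency, whereas the
host-side bind() ... chmod() window under normal load should be on the order
of microseconds, i.e. two to three orders of magnitude shorter.)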

Does this answer your questions?

The time window may be widened by a malicious actor who knows XNU well, so a
window length inferred from experience is not really enough to claim it safe,
particularly when considering security.

On the other hand, I'm wondering if there is the same kind of time window
between s->ops->mknodat() and s->ops->lstat(). Also, there should be similar
time windows among operations with "path name based fids", as they also use
path names as identifiers. If there is a rationale that those are considered
secure, we may be able to apply the same logic to the time window between
bind() and chmod() and claim it secure. Therefore, I need a review from
someone who understands that part of the code.

Regards,
Akihiko Odaki


