qemu-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-commits] [qemu/qemu] c71957: nbd: Minor style and typo fixes


From: Richard Henderson
Subject: [Qemu-commits] [qemu/qemu] c71957: nbd: Minor style and typo fixes
Date: Thu, 08 Aug 2024 17:57:07 -0700

  Branch: refs/heads/master
  Home:   https://github.com/qemu/qemu
  Commit: c719573d71afd38e3ac774e5a331fbaa0fc9f3da
      
https://github.com/qemu/qemu/commit/c719573d71afd38e3ac774e5a331fbaa0fc9f3da
  Author: Eric Blake <eblake@redhat.com>
  Date:   2024-08-08 (Thu, 08 Aug 2024)

  Changed paths:
    M nbd/server.c
    M qemu-nbd.c

  Log Message:
  -----------
  nbd: Minor style and typo fixes

Touch up a comment with the wrong type name, and an over-long line,
both noticed while working on upcoming patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-10-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


  Commit: fb1c2aaa981e0a2fa6362c9985f1296b74f055ac
      
https://github.com/qemu/qemu/commit/fb1c2aaa981e0a2fa6362c9985f1296b74f055ac
  Author: Eric Blake <eblake@redhat.com>
  Date:   2024-08-08 (Thu, 08 Aug 2024)

  Changed paths:
    M blockdev-nbd.c
    M include/block/nbd.h
    M nbd/server.c
    M qemu-nbd.c

  Log Message:
  -----------
  nbd/server: Plumb in new args to nbd_client_add()

Upcoming patches to fix a CVE need to track an opaque pointer passed
in by the owner of a client object, as well as request for a time
limit on how fast negotiation must complete.  Prepare for that by
changing the signature of nbd_client_new() and adding an accessor to
get at the opaque pointer, although for now the two servers
(qemu-nbd.c and blockdev-nbd.c) do not change behavior even though
they pass in a new default timeout value.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-11-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: s/LIMIT/MAX_SECS/ as suggested by Dan]
Signed-off-by: Eric Blake <eblake@redhat.com>


  Commit: c8a76dbd90c2f48df89b75bef74917f90a59b623
      
https://github.com/qemu/qemu/commit/c8a76dbd90c2f48df89b75bef74917f90a59b623
  Author: Eric Blake <eblake@redhat.com>
  Date:   2024-08-08 (Thu, 08 Aug 2024)

  Changed paths:
    M block/monitor/block-hmp-cmds.c
    M blockdev-nbd.c
    M include/block/nbd.h
    M qapi/block-export.json

  Log Message:
  -----------
  nbd/server: CVE-2024-7409: Cap default max-connections to 100

Allowing an unlimited number of clients to any web service is a recipe
for a rudimentary denial of service attack: the client merely needs to
open lots of sockets without closing them, until qemu no longer has
any more fds available to allocate.

For qemu-nbd, we default to allowing only 1 connection unless more are
explicitly asked for (-e or --shared); this was historically picked as
a nice default (without an explicit -t, a non-persistent qemu-nbd goes
away after a client disconnects, without needing any additional
follow-up commands), and we are not going to change that interface now
(besides, someday we want to point people towards qemu-storage-daemon
instead of qemu-nbd).

But for qemu proper, and the newer qemu-storage-daemon, the QMP
nbd-server-start command has historically had a default of unlimited
number of connections, in part because unlike qemu-nbd it is
inherently persistent until nbd-server-stop.  Allowing multiple client
sockets is particularly useful for clients that can take advantage of
MULTI_CONN (creating parallel sockets to increase throughput),
although known clients that do so (such as libnbd's nbdcopy) typically
use only 8 or 16 connections (the benefits of scaling diminish once
more sockets are competing for kernel attention).  Picking a number
large enough for typical use cases, but not unlimited, makes it
slightly harder for a malicious client to perform a denial of service
merely by opening lots of connections withot progressing through the
handshake.

This change does not eliminate CVE-2024-7409 on its own, but reduces
the chance for fd exhaustion or unlimited memory usage as an attack
surface.  On the other hand, by itself, it makes it more obvious that
with a finite limit, we have the problem of an unauthenticated client
holding 100 fds opened as a way to block out a legitimate client from
being able to connect; thus, later patches will further add timeouts
to reject clients that are not making progress.

This is an INTENTIONAL change in behavior, and will break any client
of nbd-server-start that was not passing an explicit max-connections
parameter, yet expects more than 100 simultaneous connections.  We are
not aware of any such client (as stated above, most clients aware of
MULTI_CONN get by just fine on 8 or 16 connections, and probably cope
with later connections failing by relying on the earlier connections;
libvirt has not yet been passing max-connections, but generally
creates NBD servers with the intent for a single client for the sake
of live storage migration; meanwhile, the KubeSAN project anticipates
a large cluster sharing multiple clients [up to 8 per node, and up to
100 nodes in a cluster], but it currently uses qemu-nbd with an
explicit --shared=0 rather than qemu-storage-daemon with
nbd-server-start).

We considered using a deprecation period (declare that omitting
max-parameters is deprecated, and make it mandatory in 3 releases -
then we don't need to pick an arbitrary default); that has zero risk
of breaking any apps that accidentally depended on more than 100
connections, and where such breakage might not be noticed under unit
testing but only under the larger loads of production usage.  But it
does not close the denial-of-service hole until far into the future,
and requires all apps to change to add the parameter even if 100 was
good enough.  It also has a drawback that any app (like libvirt) that
is accidentally relying on an unlimited default should seriously
consider their own CVE now, at which point they are going to change to
pass explicit max-connections sooner than waiting for 3 qemu releases.
Finally, if our changed default breaks an app, that app can always
pass in an explicit max-parameters with a larger value.

It is also intentional that the HMP interface to nbd-server-start is
not changed to expose max-connections (any client needing to fine-tune
things should be using QMP).

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-12-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[ericb: Expand commit message to summarize Dan's argument for why we
break corner-case back-compat behavior without a deprecation period]
Signed-off-by: Eric Blake <eblake@redhat.com>


  Commit: b9b72cb3ce15b693148bd09cef7e50110566d8a0
      
https://github.com/qemu/qemu/commit/b9b72cb3ce15b693148bd09cef7e50110566d8a0
  Author: Eric Blake <eblake@redhat.com>
  Date:   2024-08-08 (Thu, 08 Aug 2024)

  Changed paths:
    M nbd/server.c
    M nbd/trace-events

  Log Message:
  -----------
  nbd/server: CVE-2024-7409: Drop non-negotiating clients

A client that opens a socket but does not negotiate is merely hogging
qemu's resources (an open fd and a small amount of memory); and a
malicious client that can access the port where NBD is listening can
attempt a denial of service attack by intentionally opening and
abandoning lots of unfinished connections.  The previous patch put a
default bound on the number of such ongoing connections, but once that
limit is hit, no more clients can connect (including legitimate ones).
The solution is to insist that clients complete handshake within a
reasonable time limit, defaulting to 10 seconds.  A client that has
not successfully completed NBD_OPT_GO by then (including the case of
where the client didn't know TLS credentials to even reach the point
of NBD_OPT_GO) is wasting our time and does not deserve to stay
connected.  Later patches will allow fine-tuning the limit away from
the default value (including disabling it for doing integration
testing of the handshake process itself).

Note that this patch in isolation actually makes it more likely to see
qemu SEGV after nbd-server-stop, as any client socket still connected
when the server shuts down will now be closed after 10 seconds rather
than at the client's whims.  That will be addressed in the next patch.

For a demo of this patch in action:
$ qemu-nbd -f raw -r -t -e 10 file &
$ nbdsh --opt-mode -c '
H = list()
for i in range(20):
  print(i)
  H.insert(i, nbd.NBD())
  H[i].set_opt_mode(True)
  H[i].connect_uri("nbd://localhost")
'
$ kill $!

where later connections get to start progressing once earlier ones are
forcefully dropped for taking too long, rather than hanging.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-13-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: rebase to changes earlier in series, reduce scope of timer]
Signed-off-by: Eric Blake <eblake@redhat.com>


  Commit: 3e7ef738c8462c45043a1d39f702a0990406a3b3
      
https://github.com/qemu/qemu/commit/3e7ef738c8462c45043a1d39f702a0990406a3b3
  Author: Eric Blake <eblake@redhat.com>
  Date:   2024-08-08 (Thu, 08 Aug 2024)

  Changed paths:
    M blockdev-nbd.c

  Log Message:
  -----------
  nbd/server: CVE-2024-7409: Close stray clients at server-stop

A malicious client can attempt to connect to an NBD server, and then
intentionally delay progress in the handshake, including if it does
not know the TLS secrets.  Although the previous two patches reduce
this behavior by capping the default max-connections parameter and
killing slow clients, they did not eliminate the possibility of a
client waiting to close the socket until after the QMP nbd-server-stop
command is executed, at which point qemu would SEGV when trying to
dereference the NULL nbd_server global which is no longer present.
This amounts to a denial of service attack.  Worse, if another NBD
server is started before the malicious client disconnects, I cannot
rule out additional adverse effects when the old client interferes
with the connection count of the new server (although the most likely
is a crash due to an assertion failure when checking
nbd_server->connections > 0).

For environments without this patch, the CVE can be mitigated by
ensuring (such as via a firewall) that only trusted clients can
connect to an NBD server.  Note that using frameworks like libvirt
that ensure that TLS is used and that nbd-server-stop is not executed
while any trusted clients are still connected will only help if there
is also no possibility for an untrusted client to open a connection
but then stall on the NBD handshake.

Given the previous patches, it would be possible to guarantee that no
clients remain connected by having nbd-server-stop sleep for longer
than the default handshake deadline before finally freeing the global
nbd_server object, but that could make QMP non-responsive for a long
time.  So intead, this patch fixes the problem by tracking all client
sockets opened while the server is running, and forcefully closing any
such sockets remaining without a completed handshake at the time of
nbd-server-stop, then waiting until the coroutines servicing those
sockets notice the state change.  nbd-server-stop now has a second
AIO_WAIT_WHILE_UNLOCKED (the first is indirectly through the
blk_exp_close_all_type() that disconnects all clients that completed
handshakes), but forced socket shutdown is enough to progress the
coroutines and quickly tear down all clients before the server is
freed, thus finally fixing the CVE.

This patch relies heavily on the fact that nbd/server.c guarantees
that it only calls nbd_blockdev_client_closed() from the main loop
(see the assertion in nbd_client_put() and the hoops used in
nbd_client_put_nonzero() to achieve that); if we did not have that
guarantee, we would also need a mutex protecting our accesses of the
list of connections to survive re-entrancy from independent iothreads.

Although I did not actually try to test old builds, it looks like this
problem has existed since at least commit 862172f45c (v2.12.0, 2017) -
even back when that patch started using a QIONetListener to handle
listening on multiple sockets, nbd_server_free() was already unaware
that the nbd_blockdev_client_closed callback can be reached later by a
client thread that has not completed handshakes (and therefore the
client's socket never got added to the list closed in
nbd_export_close_all), despite that patch intentionally tearing down
the QIONetListener to prevent new clients.

Reported-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Fixes: CVE-2024-7409
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20240807174943.771624-14-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


  Commit: 0f397dcfecc9211d12c2c720c01eb32f0eaa7d23
      
https://github.com/qemu/qemu/commit/0f397dcfecc9211d12c2c720c01eb32f0eaa7d23
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   2024-08-09 (Fri, 09 Aug 2024)

  Changed paths:
    M block/monitor/block-hmp-cmds.c
    M blockdev-nbd.c
    M include/block/nbd.h
    M nbd/server.c
    M nbd/trace-events
    M qapi/block-export.json
    M qemu-nbd.c

  Log Message:
  -----------
  Merge tag 'pull-nbd-2024-08-08' of https://repo.or.cz/qemu/ericb into staging

NBD patches for 2024-08-08

- plug CVE-2024-7409, a DoS attack exploiting nbd-server-stop

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAma1PVEACgkQp6FrSiUn
# Q2qdHQf/dMydqNcPYnwEI238APyljpNvHNq6p9TYb0l5aVWisXHRlhFWM117hH7T
# Aq2KUgS5ppiEpw8mxa6/OaDa74VpMGyEPgn9w6o7T1xjVBVzpMxOKp5wFa8uICLj
# mFMYXtj9i0Rb+z0iZ+X+CqIV2Wy/FyV00Wr9T4HW94IV/9EK1sWvZvfyGWyxYyBZ
# XKTQV1Co3HYX8gfq7E88SgS064DnHjtRy2no4lwNFkBbVQCSbqwbK63TRPi7kEyC
# DmSLdHCdsD7Ev9kMZ6uNJS5T/9t7hjO5mWJckLt/cXOjHgL7GkoisLH8/nGjVkyc
# 3SUGjMn4TlzqMU99STRP+a48TLCVhA==
# =kDut
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 09 Aug 2024 07:49:05 AM AEST
# gpg:                using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) 
<ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]

* tag 'pull-nbd-2024-08-08' of https://repo.or.cz/qemu/ericb:
  nbd/server: CVE-2024-7409: Close stray clients at server-stop
  nbd/server: CVE-2024-7409: Drop non-negotiating clients
  nbd/server: CVE-2024-7409: Cap default max-connections to 100
  nbd/server: Plumb in new args to nbd_client_add()
  nbd: Minor style and typo fixes

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


Compare: https://github.com/qemu/qemu/compare/0173b97a219c...0f397dcfecc9

To unsubscribe from these emails, change your notification settings at 
https://github.com/qemu/qemu/settings/notifications



reply via email to

[Prev in Thread] Current Thread [Next in Thread]