Re: [Qemu-devel] [PATCH v1] os-posix: Add -unshare option
From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [PATCH v1] os-posix: Add -unshare option
Date: Thu, 19 Oct 2017 17:24:20 +0100
User-agent: Mutt/1.9.1 (2017-09-22)
On Thu, Oct 19, 2017 at 05:04:19PM +0100, Ross Lagerwall wrote:
> Add an option to allow calling unshare() just before starting guest
> execution. The option allows unsharing one or more of the mount
> namespace, the network namespace, and the IPC namespace. This is useful
> to restrict the ability of QEMU to cause damage to the system should it
> be compromised.
>
> An example of using this would be to have QEMU open a QMP socket at
> startup and unshare the network namespace. The instance of QEMU could
> still be controlled by the QMP socket since that belongs in the original
> namespace, but if QEMU were compromised it wouldn't be able to open any
> new connections, even to other processes on the same machine.
Unless I'm misunderstanding you, what's described here is already possible
by just using the 'unshare' command to spawn QEMU:
# unshare --ipc --mount --net qemu-system-x86_64 -qmp unix:/tmp/foo,server -vnc :1
qemu-system-x86_64: -qmp unix:/tmp/foo,server: QEMU waiting for connection on: disconnected:unix:/tmp/foo,server
And in another shell I can still access the QMP socket from the original host
namespace:
# ./scripts/qmp/qmp-shell /tmp/foo
Welcome to the QMP low-level shell!
Connected to QEMU 2.9.1
(QEMU) query-kvm
{"return": {"enabled": false, "present": true}}
FWIW, even if that were not possible, you could still do it by wrapping the
qmp-shell in an 'nsenter' call, e.g.:
nsenter --target $QEMUPID --net ./scripts/qmp/qmp-shell /tmp/foo
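For reference, what 'nsenter --net' does can be approximated in C with
setns(2). This is a minimal sketch, not nsenter's actual implementation; the
helper names netns_path() and enter_netns_of() are made up for illustration,
and the setns() call needs CAP_SYS_ADMIN to succeed:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write "/proc/<pid>/ns/net" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int netns_path(pid_t pid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/%ld/ns/net", (long)pid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: join the network namespace of process pid.
 * Returns 0 on success, -1 on error with errno set. */
static int enter_netns_of(pid_t pid)
{
    char path[64];
    int fd, ret, saved_errno;

    if (netns_path(pid, path, sizeof(path)) < 0) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        return -1;
    }

    /* Move the calling thread into the namespace the fd refers to. */
    ret = setns(fd, CLONE_NEWNET);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

A client would call enter_netns_of() with the QEMU PID before connecting to
the socket; sockets it creates after that call live in QEMU's network
namespace.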
> Signed-off-by: Ross Lagerwall <address@hidden>
> ---
> os-posix.c | 34 ++++++++++++++++++++++++++++++++++
> qemu-options.hx | 14 ++++++++++++++
> 2 files changed, 48 insertions(+)
>
> diff --git a/os-posix.c b/os-posix.c
> index b9c2343..cfc5c38 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -45,6 +45,7 @@ static struct passwd *user_pwd;
> static const char *chroot_dir;
> static int daemonize;
> static int daemon_pipe;
> +static int unshare_flags;
>
> void os_setup_early_signal_handling(void)
> {
> @@ -160,6 +161,28 @@ void os_parse_cmd_args(int index, const char *optarg)
> fips_set_state(true);
> break;
> #endif
> +#ifdef CONFIG_SETNS
> + case QEMU_OPTION_unshare:
> + {
> + char *flag;
> + char *opts = g_strdup(optarg);
> +
> + while ((flag = qemu_strsep(&opts, ",")) != NULL) {
> + if (!strcmp(flag, "mount")) {
> + unshare_flags |= CLONE_NEWNS;
> + } else if (!strcmp(flag, "net")) {
> + unshare_flags |= CLONE_NEWNET;
> + } else if (!strcmp(flag, "ipc")) {
> + unshare_flags |= CLONE_NEWIPC;
> + } else {
> + fprintf(stderr, "Unknown unshare option: %s\n", flag);
> + exit(1);
> + }
> + }
> + g_free(opts);
> + }
> + break;
> +#endif
> }
> }
>
> @@ -201,6 +224,16 @@ static void change_root(void)
>
> }
>
> +static void unshare_namespaces(void)
> +{
> + if (unshare_flags) {
> + if (unshare(unshare_flags) < 0) {
> + perror("could not unshare");
> + exit(1);
> + }
> + }
> +}
> +
> void os_daemonize(void)
> {
> if (daemonize) {
> @@ -266,6 +299,7 @@ void os_setup_post(void)
> }
>
> change_root();
> + unshare_namespaces();
> change_process_uid();
This has some really bad implications. All the command line options that are
given are processed *before* os_setup_post() is called. IOW, -chardev, -vnc,
-migrate, -net, etc will all be configured in the context of the host namespace.
If you then use the QMP monitor to run chardev_add, device_add, migrate,
hostnet_add, etc this will all take place in the new namespace.
So the exact same args given as ARGV now have completely different semantics
when given via QMP.
I think this is really very undesirable.
If you wrap QEMU execution in 'unshare' as I illustrate above, then the
semantics of ARGV & QMP remain consistent.
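The split can be made visible with a small sketch (illustrative only, not QEMU
code; the function names are hypothetical). It reads the inode of
/proc/self/ns/net, which identifies the current network namespace, before and
after unshare(CLONE_NEWNET). Sockets created before the call, as ARGV options
would create them under this patch, stay tied to the first namespace; sockets
created afterwards, as QMP commands would create them, are tied to the second:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: inode number identifying the caller's network
 * namespace, or 0 if /proc/self/ns/net cannot be read. */
static unsigned long current_netns_id(void)
{
    struct stat st;

    if (stat("/proc/self/ns/net", &st) < 0) {
        return 0;
    }
    return (unsigned long)st.st_ino;
}

/* Hypothetical helper: returns 1 if unshare() moved the process to a
 * different network namespace, 0 if the id is unchanged, -1 if
 * unshare() failed (typically EPERM without CAP_SYS_ADMIN). */
static int netns_changed_by_unshare(void)
{
    unsigned long before, after;

    before = current_netns_id();    /* namespace seen by ARGV-time setup */

    if (unshare(CLONE_NEWNET) < 0) {
        perror("could not unshare");
        return -1;
    }

    after = current_netns_id();     /* namespace seen by later QMP commands */
    return before != after;
}
```

With sufficient privileges netns_changed_by_unshare() is expected to return 1;
without them unshare() fails and it returns -1. When the whole process is
started under 'unshare' instead, both phases observe the same namespace id.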
FWIW, as a further point that might be of interest, libvirt will now spawn
a new private mount namespace for QEMU by default. We do this so that we can
give QEMU a private /dev filesystem with only the devices it is permitted to
use present as device nodes. The ability to do such setup tasks in between
namespace creation and QEMU launching is broadly useful. For example, if
using a private network namespace, you might want to create a veth pair and
put one end in the namespace, so that QEMU's network services have some
level of outside network connectivity, e.g. to enable QEMU to connect to a
remote QEMU for live migration.
So overall, I absolutely encourage the use of namespaces to confine QEMU,
but I tend to think namespace creation/setup is better done outside QEMU
before launching it.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|