qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 2/5] guest agent: qemu-ga daemon


From: Michael Roth
Subject: Re: [Qemu-devel] [PATCH v5 2/5] guest agent: qemu-ga daemon
Date: Mon, 20 Jun 2011 18:40:38 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10

On 06/20/2011 09:16 AM, Luiz Capitulino wrote:
On Sun, 19 Jun 2011 14:00:30 -0500
Michael Roth<address@hidden>  wrote:

On 06/17/2011 10:25 PM, Luiz Capitulino wrote:
On Fri, 17 Jun 2011 16:25:32 -0500
Michael Roth<address@hidden>   wrote:

On 06/17/2011 03:13 PM, Luiz Capitulino wrote:
On Fri, 17 Jun 2011 14:21:31 -0500
Michael Roth<address@hidden>    wrote:

On 06/16/2011 01:42 PM, Luiz Capitulino wrote:
On Tue, 14 Jun 2011 15:06:22 -0500
Michael Roth<address@hidden>     wrote:

This is the actual guest daemon, it listens for requests over a
virtio-serial/isa-serial/unix socket channel and routes them through
to dispatch routines, and writes the results back to the channel in
a manner similar to QMP.

A shorthand invocation:

      qemu-ga -d

Is equivalent to:

      qemu-ga -c virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent \
              -p /var/run/qemu-guest-agent.pid -d

Signed-off-by: Michael Roth<address@hidden>

Would be nice to have a more complete description, like explaining how to
do a simple test.

And this can't be built...

---
     qemu-ga.c              |  631 
++++++++++++++++++++++++++++++++++++++++++++++++
     qga/guest-agent-core.h |    4 +
     2 files changed, 635 insertions(+), 0 deletions(-)
     create mode 100644 qemu-ga.c

diff --git a/qemu-ga.c b/qemu-ga.c
new file mode 100644
index 0000000..df08d8c
--- /dev/null
+++ b/qemu-ga.c
@@ -0,0 +1,631 @@
+/*
+ * QEMU Guest Agent
+ *
+ * Copyright IBM Corp. 2011
+ *
+ * Authors:
+ *  Adam Litke<address@hidden>
+ *  Michael Roth<address@hidden>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include<stdlib.h>
+#include<stdio.h>
+#include<stdbool.h>
+#include<glib.h>
+#include<gio/gio.h>
+#include<getopt.h>
+#include<termios.h>
+#include<syslog.h>
+#include "qemu_socket.h"
+#include "json-streamer.h"
+#include "json-parser.h"
+#include "qint.h"
+#include "qjson.h"
+#include "qga/guest-agent-core.h"
+#include "qga-qmp-commands.h"
+#include "module.h"
+
+#define QGA_VIRTIO_PATH_DEFAULT "/dev/virtio-ports/org.qemu.guest_agent"
+#define QGA_PIDFILE_DEFAULT "/var/run/qemu-va.pid"
+#define QGA_BAUDRATE_DEFAULT B38400 /* for isa-serial channels */
+#define QGA_TIMEOUT_DEFAULT 30*1000 /* ms */
+
+struct GAState {
+    const char *proxy_path;

Where is this used?


Nowhere actually. Will remove.

+    JSONMessageParser parser;
+    GMainLoop *main_loop;
+    guint conn_id;
+    GSocket *conn_sock;
+    GIOChannel *conn_channel;
+    guint listen_id;
+    GSocket *listen_sock;
+    GIOChannel *listen_channel;
+    const char *path;
+    const char *method;
+    bool virtio; /* fastpath to check for virtio to deal with poll() quirks */
+    GACommandState *command_state;
+    GLogLevelFlags log_level;
+    FILE *log_file;
+    bool logging_enabled;
+};
+
+static void usage(const char *cmd)
+{
+    printf(
+"Usage: %s -c<channel_opts>\n"
+"QEMU Guest Agent %s\n"
+"\n"
+"  -c, --channel     channel method: one of unix-connect, virtio-serial, or\n"
+"                    isa-serial (virtio-serial is the default)\n"
+"  -p, --path        channel path (%s is the default for virtio-serial)\n"
+"  -l, --logfile     set logfile path, logs to stderr by default\n"
+"  -f, --pidfile     specify pidfile (default is %s)\n"
+"  -v, --verbose     log extra debugging information\n"
+"  -V, --version     print version information and exit\n"
+"  -d, --daemonize   become a daemon\n"
+"  -h, --help        display this help and exit\n"
+"\n"
+"Report bugs to<address@hidden>\n"
+    , cmd, QGA_VERSION, QGA_VIRTIO_PATH_DEFAULT, QGA_PIDFILE_DEFAULT);
+}
+
+static void conn_channel_close(GAState *s);
+
+static const char *ga_log_level_str(GLogLevelFlags level)
+{
+    switch (level&     G_LOG_LEVEL_MASK) {
+        case G_LOG_LEVEL_ERROR:
+            return "error";
+        case G_LOG_LEVEL_CRITICAL:
+            return "critical";
+        case G_LOG_LEVEL_WARNING:
+            return "warning";
+        case G_LOG_LEVEL_MESSAGE:
+            return "message";
+        case G_LOG_LEVEL_INFO:
+            return "info";
+        case G_LOG_LEVEL_DEBUG:
+            return "debug";
+        default:
+            return "user";
+    }
+}
+
+bool ga_logging_enabled(GAState *s)
+{
+    return s->logging_enabled;
+}
+
+void ga_disable_logging(GAState *s)
+{
+    s->logging_enabled = false;
+}
+
+void ga_enable_logging(GAState *s)
+{
+    s->logging_enabled = true;
+}

Just to check I got this right, this is needed because of the fsfreeze
command, correct? Isn't it better to have a more descriptive name, like
fsfrozen?

First I thought this was about a log file. Then I realized this was
probably about letting the user log in, but we don't seem to have this
functionality so I got confused.


Yup, this is currently due to fsfreeze support. From the perspective of
the fsfreeze command the explicit "is_frozen" check makes more sense,
but the reason it affects other RPCs is because because we can't log
stuff in that state. If an RPC shoots itself in the foot by writing to a
frozen filesystem we don't really care so much, and up until recently
that case was handled with a pthread_cancel timeout mechanism (was
removed for the time being, will re-implement using a child process most
likely).

What we don't want to do is give a host a way to bypass the expectation
we set for guest owners by allowing RPCs to be logged. So that's what
the check is based on, rather than lower level details like *why*
logging is disabled. Also, I'd really like to avoid a precedence where a
single RPC can place restrictions on all the others, so the logging
aspect seemed general enough that it doesn't necessarily provide that
precedence.

This has come up a few times without any real consensus. I can probably
go either way, but I think the check for logging is easier to set
expectations with: "if logging is important from an auditing
perspective, don't execute this if logging is disabled". beyond that,
same behavior/symantics you'd get with anything that causes a write() on
a filesystem in a frozen state: block.

While I understand your arguments, I still find the current behavior
confusing: you issue a guest-open-file command and get a QgaLoggingFailed
error. The first question is: what does a log write fail have to do with me
opening a file? The second question is: what should I do? Try again? Give up?


I agree it's a better solution for the client there, but at the same time:

guest_privileged_ping():
     if fsfreeze.status == FROZEN:
       syslog("privileged ping, thought you should know")
     else:
       return "error, filesystem frozen"

Seems silly to the client as well. "Why does a ping command care about
the filesystem state?"

Inability to log is the true error. That's also the case for the
guest-file-open command.

There's a difference. In the guest ping the inability to log is an
internal agent error, it's not interesting to the client. We could just
ignore the error and reply back (unless we define that guest-privileged-ping
requires writing to disk).


To clarify, these's 2 different types of errors here and their relation
to fsfreeze is causing the separation to get munged a bit:

1) Logging/accounting errors: even though we try to make it clear
wherever possible this solution is based on a "trusted" hypervisor that
already has substantial privileges over guests (direct access to images,
etc), one of our internal use cases is the ability for customers to
properly account for actions initiated by the guest agent. Shutdowns,
whats files were read/written, etc.

This is fine.

For some commands, this accounting is not important. guest-ping, or
guest-info, for instance, is uninteresting from a security accounting
perspective. So logging is in fact optional and logging failures are
silently ignored.

guest-shutdown, guest-file-open, guest-fsfreeze, however, are important.
So the QgaLoggingError we currently report really means "this command
requires logging, but logging has been disabled". You're right that this
is because of fsfreeze, but it's not because of the filesystem state.
guest-file-open on a network mount that wasn't frozen would *still*, by
design, report an error due to inability to log.

Completely ignoring the file-system state and considering only what you
say about security, this doesn't make sense to me. Actually, I think it's
a flaw, because a client could open the daemon's log file and write garbage
on it.

But, how does the ability to log protects you from anything? I would understand
it's importance if the user had to provide credentials to use the agent,
but s/he doesn't. It's like you were unable to use a linux box because syslogd
is not running.

This looks like a misfeature to me. If you really want to provide it, then
you could provide a --enable-log-error which, when enabled, would make the
daemon not to execute *any* command if the it's able to log its actions.
Otherwise log errors are just ignored.


Hmm, you're right. I'd thought that if you clobber the log file at least the RPC that clobbered the log file would get logged, but since we only log the guest-file-open, and we do that before the guest-file-write, they could overwrite and fabricate the log file history and nothing would be obviously suspicious.

Since we can't build any accounting around something like that I'm fine with having the logging silently fail in a freeze state for now.

A better thought out accounting mechanism can be added as an optional feature later if it comes to that.

For this case I do see your point about the error not being very helpful
to users, which is why I think something like:

"Guest agent failed to log RPC due to frozen filesystems"

Is the best way to report it. If we need to fix up the error id to
something like QgaLoggingToFrozenFilesystem I'd be fine with that.

2) EWOULDBLOCK or I/O errors due to writing to frozen filesystems: this
is currently treated as a PEBKAC and not enforced/reported at all. I
wouldn't be against disabling guest-file-write as a side-effect of
freezing, but that's not totally correct. Writes to non-frozen
filesystems like networked or sysfs mounts is technically still okay,
for instance. We store path information in guest-file-open and use it to
scan against fsfreeze's state info about frozen mounts to handle this a
little better.

Yes, I agree with you here.


Treating I/O errors/deadlocks due to fsfreeze as a user issue (as things are currently), or working around it? If the latter, a), b), or c)?


Even then you still have RPCs like guest-shutdown that may do a syslog()
that would cause the command to freeze, or in the future we may add the
ability for the guest agent to execute user/distro-installed scripts
that may/may not need to write to the filesystem. Some these might even
be intended to run when the filesystem is frozen...cleanup scripts for
databases and whatnot for snapshotting is something I've seen come up.

We can

a) be very restrictive, which I'm not totally against...my initial
inclination was to disable everything except
guest-fsfreeze-thaw/guest-fsfreeze-status while frozen, but this has
some limitations that aren't as obscure/unlikely as one would hope.

b) we can be completely unrestrictive about it and give users the same
experiences they'd get at a command-line if they, or another user on the
system, was messing with fsfreeze.

c) try to capture some common cases, but make it clear that you can
still shoot yourself in the foot using the guest agent on a frozen
filesystem, evaluating when to enforce this in code on a case-by-case basis.

Personally I like b), mostly because it's the least work, but also
because it's actually the easiest way to set expectations about fsfreeze.

The guest-file-write command, on the other hand, clearly requires to write
to disk, so a client would expect a EWOULDBLOCK error.

EWOULDBLOCK looks good to me. We could define as general rule that
commands don't block, so clients should always expect a EWOULDBLOCK.


This falls under 2)

There's nothing wrong with opening a read-only
file on a frozen filesystem, it's the fact that we can't log the open
that causes us to bail out..

But why do we bail out? Why can't we just ignore the fact we can't log?


This falls under 1)

So, what if we just munge the 2 to give the user the proper clues to fix
things, and instead return an error like:

"Guest agent failed to log RPC (is the filesystem frozen?)"?

The 'desc' part of an error is a human error descriptor, clients shouldn't
parse it. The real error is the error class.


As mentioned above, we could change the error id itself to capture both
the immediate error and the underlying error.






reply via email to

[Prev in Thread] Current Thread [Next in Thread]