[Qemu-devel] [PATCH v2] blkdebug: make the fault injection functionality


From: Hitoshi Mitake
Subject: [Qemu-devel] [PATCH v2] blkdebug: make the fault injection functionality callable from QMP
Date: Wed, 27 Aug 2014 10:59:40 +0900

This patch makes the fault injection functionality of blkdebug
callable from QMP. The motivation for this change is testing and
debugging distributed systems. Ordinary distributed systems must
handle hardware faults, since tolerating them is their reason for
existence, but testing whether a system can handle such faults and
recover correctly is really hard.

Typically, developers of distributed systems check such recovery paths
with unit tests or an artificial environment that can be built on a
single box. But such tests can miss important attributes of real-world
hardware faults. Examples for disk drives:
 - write(2) doesn't return -1 immediately on a disk error, even if
   the target file is opened with O_SYNC, when the file system
   holding the file is not mounted with the barrier option
 - some disks suddenly become silent without causing errors, so
   applications must handle such a case with a fine-tuned timeout for
   disk I/O
 - some disks can cause performance degradation instead of stopping
   and causing errors [1]
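
To make the first example concrete, here is a minimal sketch of the
kind of application-side write path that injected disk faults would
exercise. The helper name durable_write() is hypothetical and is not
part of this patch; it only shows the places (write(2), fsync(2),
close(2)) where an application can observe an I/O error, under the
assumption that the guest reports the error through one of them.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not part of the patch):
 * write len bytes to path and flush them to the device.
 * Returns 0 on success or a negative errno value on failure.
 */
int durable_write(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int fd, err;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd < 0) {
        return -errno;
    }

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, retry */
            }
            err = errno;
            close(fd);
            return -err;            /* error reported by write(2) */
        }
        p += n;
        len -= (size_t)n;
    }

    /* An error may only become visible when the data is flushed */
    if (fsync(fd) < 0) {
        err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) < 0) {
        return -errno;
    }
    return 0;
}
```

A recovery-path test would call such a helper in the guest while a
blkdebug rule is active and check that the error is detected and
handled, rather than silently ignored.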

For testing recovery paths and configuration of distributed systems,
mocking faults like the above examples in virtual devices is
effective, because ordinary testing techniques that target errors of
library APIs and system calls cannot mock the above faults. In
addition, injecting faults at the level of virtual devices can test
the whole stack of the target system (from device drivers to
applications). As a first step toward implementing this testing
technique, this patch implements a new QMP command which updates the
error injection rules of blkdebug. I think it is more useful for
testing distributed systems than the existing config-file-based fault
injection of blkdebug, because users can inject faults at any time.

With this feature, I could find a potential problem in the deployment
guide of OpenStack Swift [2]. In the guide, the nobarrier option of xfs
is suggested without any caution. The option degrades the durability
of a Swift cluster because it delays detection of disk errors. In
addition, the option is not suggested in a book about Swift [3]. So I
concluded that the guide [2] can lead to a misconfiguration of Swift.
I believe this sort of problem can be found in other systems, so the
feature is useful for developers and admins of distributed systems.

Example of launching QEMU with this feature:

sudo x86_64-softmmu/qemu-system-x86_64 -qmp \
tcp:localhost:4444,server,nowait -enable-kvm -hda \
blkdebug:/dev/null:/tmp/debian.qcow2

(/dev/null is needed because blkdebug requires a configuration file,
but for QMP purposes an empty file is enough)

Example of QMP sequence (via telnet localhost 4444):

{ "execute": "qmp_capabilities" }
{"return": {}}

{"execute": "blkdebug-set-rules", "arguments": {"device": "ide0-hd0",
"rules":[{"event": "write_aio", "type": "inject-error", "immediately":
1, "once": 0, "state": 1}]}} # <- inject error to /dev/sda

{"return": {}}

Now the guest OS on the VM finds the disk is broken.
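
For scripting the sequence above instead of typing it into telnet, a
small helper can format the command. The following is only a sketch:
build_set_rules_cmd() and its parameters are hypothetical names that
do not exist in QEMU or in this patch, and the device and event
strings are assumed to need no JSON escaping.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (illustration only): format a blkdebug-set-rules
 * command carrying a single inject-error rule, mirroring the example
 * in this commit message.
 * Returns the length of the command, or -1 if buf is too small.
 */
int build_set_rules_cmd(char *buf, size_t len, const char *device,
                        const char *event, int state)
{
    int n = snprintf(buf, len,
                     "{\"execute\": \"blkdebug-set-rules\", "
                     "\"arguments\": {\"device\": \"%s\", \"rules\": "
                     "[{\"event\": \"%s\", \"type\": \"inject-error\", "
                     "\"immediately\": 1, \"once\": 0, \"state\": %d}]}}",
                     device, event, state);

    if (n < 0 || (size_t)n >= len) {
        return -1;                  /* output error or truncated */
    }
    return n;
}
```

The resulting string would be written to the QMP socket
(tcp:localhost:4444 in the launch example) after qmp_capabilities has
been negotiated.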

Of course, using QMP directly is painful for users (developers and
admins of distributed systems). I'm implementing a user-friendly
interface in vagrant-kvm [4] for black-box testing. In addition, a
testing framework for injecting faults at critical timings (which
requires a solid understanding of the target systems) is in progress.

[1] http://ucare.cs.uchicago.edu/pdf/socc13-limplock.pdf
[2] http://docs.openstack.org/developer/swift/howto_installmultinode.html
[3] http://www.amazon.com/dp/B00C93QFHI
[4] https://github.com/adrahon/vagrant-kvm

Cc: Eric Blake <address@hidden>
Cc: Kevin Wolf <address@hidden>
Cc: Stefan Hajnoczi <address@hidden>
Signed-off-by: Hitoshi Mitake <address@hidden>
---
 block/blkdebug.c      | 199 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |   2 +
 qapi-schema.json      |  14 ++++
 qmp-commands.hx       |  18 +++++
 4 files changed, 233 insertions(+)

v2:
 - don't prepare a new mechanism for fault injection
 -- implement the feature by updating the fault rules of blkdebug
 - add an example of QMP command

diff --git a/block/blkdebug.c b/block/blkdebug.c
index f51407d..2b9d616 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -687,6 +687,205 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
     return bdrv_getlength(bs->file);
 }
 
+struct qmp_rules_list_iter {
+    bool failed;
+    QemuOpts *set_state, *inject_error;
+
+    Error *err;
+};
+
+static void rules_list_iter(QObject *obj, void *opaque)
+{
+    struct qmp_rules_list_iter *iter = (struct qmp_rules_list_iter *)opaque;
+    QemuOpts *new_opts;
+    QDict *dict;
+    Error *err = NULL;
+    const char *type;
+
+    const char *event_name;
+    int state;
+
+    if (iter->failed) {
+        /* do nothing anymore */
+        return;
+    }
+
+    dict = qobject_to_qdict(obj);
+    if (!dict) {
+        error_set(&iter->err, QERR_INVALID_PARAMETER_TYPE,
+                  "member of rules", "dict");
+        goto fail;
+    }
+
+    event_name = qdict_get_str(dict, "event");
+    if (!event_name) {
+        error_set(&iter->err, QERR_MISSING_PARAMETER, "event");
+        goto fail;
+    }
+
+    state = qdict_get_try_int(dict, "state", 0);
+
+    type = qdict_get_str(dict, "type");
+    if (!strcmp(type, "set-state")) {
+        int new_state;
+
+        if (iter->set_state) {
+            error_setg(&iter->err, "duplicate entry for set-state");
+            goto fail;
+        }
+
+        new_opts = qemu_opts_create(&set_state_opts, NULL, 0, &err);
+        if (!new_opts) {
+            iter->err = err;
+            goto fail;
+        }
+
+        iter->set_state = new_opts;
+
+        new_state = qdict_get_try_int(dict, "new_state", 0);
+        if (qemu_opt_set_number(new_opts, "new_state", new_state) < 0) {
+            error_setg(&iter->err, "failed to set new_state");
+            goto fail;
+        }
+    } else if (!strcmp(type, "inject-error")) {
+        int _errno, sector;
+        bool once, immediately;
+
+        if (iter->inject_error) {
+            error_setg(&iter->err, "duplicate entry for inject-error");
+            goto fail;
+        }
+
+        new_opts = qemu_opts_create(&inject_error_opts, NULL, 0, &err);
+        if (!new_opts) {
+            iter->err = err;
+            goto fail;
+        }
+
+        iter->inject_error = new_opts;
+
+        _errno = qdict_get_try_int(dict, "errno", EIO);
+        if (qemu_opt_set_number(new_opts, "errno", _errno) < 0) {
+            error_setg(&iter->err, "failed to set errno");
+            goto fail;
+        }
+
+        sector = qdict_get_try_int(dict, "sector", -1);
+        if (qemu_opt_set_number(new_opts, "sector", sector) < 0) {
+            error_setg(&iter->err, "failed to set sector");
+            goto fail;
+        }
+
+        once = qdict_get_try_bool(dict, "once", 0);
+        if (qemu_opt_set_bool(new_opts, "once", once) < 0) {
+            error_setg(&iter->err, "failed to set once");
+            goto fail;
+        }
+
+        immediately = qdict_get_try_bool(dict, "immediately", 0);
+        if (qemu_opt_set_bool(new_opts, "immediately", immediately) < 0) {
+            error_setg(&iter->err, "failed to set immediately");
+            goto fail;
+        }
+    } else {
+        error_setg(&iter->err, "unknown type of rule: %s", type);
+        goto fail;
+    }
+
+    if (qemu_opt_set_number(new_opts, "state", state) < 0) {
+        error_setg(&iter->err, "failed to set state");
+        goto fail;
+    }
+
+    if (qemu_opt_set(new_opts, "event", event_name) < 0) {
+        error_setg(&iter->err, "failed to set event");
+        goto fail;
+    }
+
+    return;
+
+fail:
+    iter->failed = true;
+}
+
+int qmp_blkdebug_set_rules(Monitor *mon, const QDict *qdict, QObject **ret)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    QObject *rules = qdict_get(qdict, "rules");
+    const QList *rules_list = NULL;
+    Error *local_err = NULL;
+    BlockDriverState *bs;
+    BDRVBlkdebugState *s;
+    struct qmp_rules_list_iter iter = { 0 };
+    struct add_rule_data d;
+
+    if (!device) {
+        error_set(&local_err, QERR_MISSING_PARAMETER, "device");
+        goto out;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(&local_err, QERR_DEVICE_NOT_FOUND, device);
+        goto out;
+    }
+
+    bs = bs->file;
+    if (strcmp(bs->drv->format_name, "blkdebug")) {
+        error_setg(&local_err, "BlockDriver (%s) isn't blkdebug",
+                   bs->drv->format_name);
+        goto out;
+    }
+    s = bs->opaque;
+
+    if (!rules) {
+        error_set(&local_err, QERR_MISSING_PARAMETER, "rules");
+        goto out;
+    }
+
+    rules_list = qobject_to_qlist(rules);
+    if (!rules_list) {
+        error_set(&local_err, QERR_INVALID_PARAMETER_TYPE, "rules", "list");
+        goto out;
+    }
+
+    memset(&iter, 0, sizeof(iter));
+    qlist_iter(rules_list, rules_list_iter, &iter);
+    if (iter.failed) {
+        local_err = iter.err;
+        goto out;
+    }
+
+    d.s = s;
+    s->state = 1;
+    if (iter.inject_error) {
+        d.action = ACTION_INJECT_ERROR;
+        add_rule(iter.inject_error, &d);
+    }
+
+    if (iter.set_state) {
+        d.action = ACTION_SET_STATE;
+        add_rule(iter.set_state, &d);
+    }
+
+out:
+    if (iter.inject_error) {
+        qemu_opts_del(iter.inject_error);
+    }
+
+    if (iter.set_state) {
+        qemu_opts_del(iter.set_state);
+    }
+
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    return 0;
+}
+
 static BlockDriver bdrv_blkdebug = {
     .format_name            = "blkdebug",
     .protocol_name          = "blkdebug",
diff --git a/include/block/block.h b/include/block/block.h
index f08471d..421a1b5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -588,4 +588,6 @@ void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
 void bdrv_flush_io_queue(BlockDriverState *bs);
 
+int qmp_blkdebug_set_rules(Monitor *mon, const QDict *qdict, QObject **ret);
+
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 341f417..13bab1d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3481,3 +3481,17 @@
 # Since: 2.1
 ##
 { 'command': 'rtc-reset-reinjection' }
+
+##
+# @blkdebug-set-rules
+#
+# Set rules of blkdebug for the given block device.
+#
+# @device: device ID of target block device
+# @rules: rules for setting, list of dictionary
+#
+# Since: 2.2
+##
+{ 'command': 'blkdebug-set-rules',
+  'data': { 'device': 'str', 'rules': [ 'dict' ] },
+  'gen': 'no'}
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4be4765..ef42cf0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3755,3 +3755,21 @@ Example:
 <- { "return": {} }
 
 EQMP
+    {
+        .name       = "blkdebug-set-rules",
+        .args_type  = "device:s,rules:q",
+        .mhandler.cmd_new = qmp_blkdebug_set_rules,
+    },
+SQMP
+blkdebug-set-rules
+------------------
+
+Set blkdebug rules
+
+Example:
+-> {"execute": "blkdebug-set-rules", "arguments": {"device":
+   "ide0-hd0", "rules":[{"event": "write_aio", "type": "inject-error",
+   "immediately": 1, "once": 0, "state": 1}]}}
+<- { "return": {} }
+
+EQMP
-- 
1.8.3.2



