
From: Eric Blake
Subject: Re: [Qemu-block] [Qemu-devel] [PULL 21/35] block: fix QEMU crash with scsi-hd and drive_del
Date: Mon, 6 Aug 2018 17:04:28 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 06/18/2018 11:44 AM, Kevin Wolf wrote:
From: Greg Kurz <address@hidden>

Removing a drive with drive_del while it is being used to run an I/O
intensive workload can cause QEMU to crash.

The problem is that we should avoid making block driver graph
changes while we have in-flight requests. Let's drain all I/O
for this BB before calling bdrv_root_unref_child().

Signed-off-by: Greg Kurz <address@hidden>
Signed-off-by: Kevin Wolf <address@hidden>
  block/block-backend.c | 5 +++++
  1 file changed, 5 insertions(+)

Interestingly enough, git bisect is reliably pointing to this commit as the culprit in the changed output of iotests './check -nbd 83':

083 8s ... - output mismatch (see 083.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/083.out       2018-02-26 11:40:31.605792220 -0600
+++ /home/eblake/qemu/tests/qemu-iotests/083.out.bad   2018-08-06 16:57:03.411861660 -0500
@@ -41,7 +41,6 @@

 === Check disconnect after neg2 ===

-Connection closed
 read failed: Input/output error

 === Check disconnect 8 neg2 ===
@@ -54,7 +53,6 @@

 === Check disconnect before request ===

-Connection closed
 read failed: Input/output error

 === Check disconnect after request ===
@@ -116,7 +114,6 @@

 === Check disconnect after neg-classic ===

-Connection closed
 read failed: Input/output error

 === Check disconnect before neg1 ===
Failures: 083
Failed 1 of 1 tests

diff --git a/block/block-backend.c b/block/block-backend.c
index 2d1a3463e8..6b75bca317 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -767,6 +767,11 @@ void blk_remove_bs(BlockBackend *blk)
 
     blk_update_root_state(blk);
 
+    /* bdrv_root_unref_child() will cause blk->root to become stale and may
+     * switch to a completion coroutine later on. Let's drain all I/O here
+     * to avoid that and a potential QEMU crash.
+     */
+    blk_drain(blk);
     blk->root = NULL;

Test 083 sets up a client that intentionally disconnects at critical points in the NBD protocol exchange, to ensure that the server reacts sanely. I suspect that somewhere in the NBD code, the server detects the disconnect and somehow calls into blk_remove_bs() (although I could not quickly find the backtrace). Prior to this patch, the 'Connection closed' message resulted from other NBD coroutines getting a shot at the (now-closed) connection; after this patch, the additional blk_drain() somehow tweaks things in a way that prevents those coroutines from printing the message. If so, then the change in the 083 reference output is probably intentional, and we should update it.

But I'm having a hard time convincing myself that this is the case, particularly since I'm not even sure how to easily debug the assumptions I made above.

I'm very weak on the whole notion of what blk_drain() vs. blk_remove_bs() is really supposed to be doing, so I could easily be persuaded that the change in output is a regression instead of a fix.


Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
