[Qemu-devel] [PATCH v3 00/19] Fix some jobs/drain/aio_poll related hangs
From: Kevin Wolf
Subject: [Qemu-devel] [PATCH v3 00/19] Fix some jobs/drain/aio_poll related hangs
Date: Thu, 20 Sep 2018 18:19:39 +0200

The combination of iothreads, block jobs and drain in particular
currently tends to lead to hangs. This series fixes a few of these bugs;
more of them remain and will be addressed in separate patches.

The primary goal of this series is to fix the scenario from:
https://bugzilla.redhat.com/show_bug.cgi?id=1601212
A simplified reproducer of the reported problem looks like this (two concurrent
commit block jobs for disks in an iothread):
$qemu -qmp stdio \
    -object iothread,id=iothread1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6,iothread=iothread1 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd0 \
    -device scsi-hd,drive=drive_image1,id=image1,bootindex=1 \
    -drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd1 \
    -device scsi-hd,drive=drive_image2,id=image2,bootindex=2

{"execute":"qmp_capabilities"}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn1"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn11"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn111"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn2"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn22"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn222"}}
{ "execute": "block-commit", "arguments": { "device": "drive_image2","base":"sn2","backing-file":"sn2","top":"sn22"}}
{ "execute": "block-commit", "arguments": { "device": "drive_image1","base":"sn1","backing-file":"sn1","top":"sn11"}}
{"execute":"quit"}
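
The QMP command sequence above can also be generated rather than typed by hand, for example when piping it into the -qmp stdio monitor repeatedly. The following is only a minimal sketch: the helper names (snapshot_cmd, commit_cmd, build_reproducer) are hypothetical and not part of QEMU, and the script merely prints the same commands as listed above, one JSON object per line. It does not start QEMU or wait for replies or job events.

```python
import json


def snapshot_cmd(device, snapshot_file):
    # Hypothetical helper: builds one blockdev-snapshot-sync command.
    return {"execute": "blockdev-snapshot-sync",
            "arguments": {"device": device, "snapshot-file": snapshot_file}}


def commit_cmd(device, base, top):
    # Hypothetical helper: builds one block-commit command. As in the
    # reproducer, "backing-file" is set to the same name as "base".
    return {"execute": "block-commit",
            "arguments": {"device": device, "base": base,
                          "backing-file": base, "top": top}}


def build_reproducer():
    # Returns the QMP commands of the reproducer, in the same order.
    cmds = [{"execute": "qmp_capabilities"}]
    # Three snapshots per drive: sn1, sn11, sn111 and sn2, sn22, sn222.
    for device, prefix in (("drive_image1", "sn1"), ("drive_image2", "sn2")):
        for depth in range(3):
            cmds.append(snapshot_cmd(device, prefix + prefix[-1] * depth))
    # Two commit jobs started back to back, drive_image2 first.
    cmds.append(commit_cmd("drive_image2", base="sn2", top="sn22"))
    cmds.append(commit_cmd("drive_image1", base="sn1", top="sn11"))
    cmds.append({"execute": "quit"})
    return cmds


if __name__ == "__main__":
    for cmd in build_reproducer():
        print(json.dumps(cmd))
```

The output can be redirected to the standard input of the command line shown above. Whether the hang triggers may depend on timing, so this is a convenience for re-running the scenario, not a deterministic test.
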
v3:
- Patch 3 ('aio-wait: Increase num_waiters even in home thread'):
  Hoist atomic_inc/dec outside the if [Fam, Paolo]
- Patch 10 ('block-backend: Fix potential double blk_delete()'):
  Assert in blk_unref() that drain doesn't resurrect the BB [Paolo]
- Patch 11 ('block-backend: Decrease in_flight only after callback'):
  Removed bdrv_ref/unref pair [Paolo]
- v2 Patch 12 ('mirror: Fix potential use-after-free in active'):
  Dropped. It just papered over another bug that is fixed later.
- v3 Patch 17 ('test-bdrv-drain: Fix outdated comments'):
  New patch with comment improvements [Max]
- v3 Patch 18 ('block: Use a single global AioWait'):
  v3 Patch 19 ('test-bdrv-drain: Test draining job source child and
  parent'):
  New patches to fix an additional hang that was caused by notifying the
  wrong AioWait

v2:
- Rebased on top of mreitz/block (including fixes for new bugs: patch 1 and 16)
- Patch 12: Added missing bdrv_unref() calls in error path [Fam]

Kevin Wolf (19):
  job: Fix missing locking due to mismerge
  blockjob: Wake up BDS when job becomes idle
  aio-wait: Increase num_waiters even in home thread
  test-bdrv-drain: Drain with block jobs in an I/O thread
  test-blockjob: Acquire AioContext around job_cancel_sync()
  job: Use AIO_WAIT_WHILE() in job_finish_sync()
  test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  block: Add missing locking in bdrv_co_drain_bh_cb()
  block-backend: Add .drained_poll callback
  block-backend: Fix potential double blk_delete()
  block-backend: Decrease in_flight only after callback
  blockjob: Lie better in child_job_drained_poll()
  block: Remove aio_poll() in bdrv_drain_poll variants
  test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
  job: Avoid deadlocks in job_completed_txn_abort()
  test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
  test-bdrv-drain: Fix outdated comments
  block: Use a single global AioWait
  test-bdrv-drain: Test draining job source child and parent

 include/block/aio-wait.h  |  17 ++-
 include/block/block.h     |   6 +-
 include/block/block_int.h |   3 -
 include/block/blockjob.h  |   3 +
 include/qemu/coroutine.h  |   5 +
 include/qemu/job.h        |  12 ++
 block.c                   |   5 -
 block/block-backend.c     |  31 +++--
 block/io.c                |  30 ++---
 blockjob.c                |   9 +-
 job.c                     |  49 +++++---
 tests/test-bdrv-drain.c   | 292 +++++++++++++++++++++++++++++++++++++++++++---
 tests/test-blockjob.c     |   6 +
 util/aio-wait.c           |  11 +-
 util/qemu-coroutine.c     |   5 +
 15 files changed, 402 insertions(+), 82 deletions(-)

--
2.13.6