[Qemu-block] [PULL 35/42] blockjob: Lie better in child_job_drained_poll
From: Max Reitz
Subject: [Qemu-block] [PULL 35/42] blockjob: Lie better in child_job_drained_poll()
Date: Tue, 25 Sep 2018 17:15:34 +0200

From: Kevin Wolf <address@hidden>

Block jobs claim in .drained_poll() that they are in a quiescent state
as soon as job->deferred_to_main_loop is true. This is obviously wrong:
they still have a completion BH to run. We only get away with this
because commit 91af091f923 added an unconditional aio_poll(false) to the
drain functions, but this bypasses the regular drain mechanisms.

However, just removing that aio_poll() call and reporting that the job
is still active doesn't work either: the completion callbacks themselves
call drain functions (directly, or indirectly with bdrv_reopen), so they
would deadlock.

As a better lie, report that the job is active as long as the BH is
pending, but falsely call it quiescent from the point in the BH when the
completion callback is called. At this point, nested drain calls won't
deadlock because they ignore the job, and outer drains will wait for the
job to really reach a quiescent state because the callback is already
running.

Signed-off-by: Kevin Wolf <address@hidden>
Reviewed-by: Max Reitz <address@hidden>
---
 include/qemu/job.h |  3 +++
 blockjob.c         |  2 +-
 job.c              | 11 ++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 63c60ef1ba..9e7cd1e4a0 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -76,6 +76,9 @@ typedef struct Job {
      * Set to false by the job while the coroutine has yielded and may be
      * re-entered by job_enter(). There may still be I/O or event loop activity
      * pending. Accessed under block_job_mutex (in blockjob.c).
+     *
+     * When the job is deferred to the main loop, busy is true as long as the
+     * bottom half is still pending.
      */
     bool busy;
diff --git a/blockjob.c b/blockjob.c
index 58dbd87a51..4d5342259c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -164,7 +164,7 @@ static bool child_job_drained_poll(BdrvChild *c)
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
-    if (!job->busy || job_is_completed(job) || job->deferred_to_main_loop) {
+    if (!job->busy || job_is_completed(job)) {
         return false;
     }
diff --git a/job.c b/job.c
index 7ec8c3b969..518f603314 100644
--- a/job.c
+++ b/job.c
@@ -857,7 +857,16 @@ static void job_exit(void *opaque)
     AioContext *ctx = job->aio_context;
     aio_context_acquire(ctx);
+
+    /* This is a lie, we're not quiescent, but still doing the completion
+     * callbacks. However, completion callbacks tend to involve operations that
+     * drain block nodes, and if .drained_poll still returned true, we would
+     * deadlock. */
+    job->busy = false;
+    job_event_idle(job);
+
     job_completed(job);
+
     aio_context_release(ctx);
 }
@@ -872,8 +881,8 @@ static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
-    job_event_idle(job);
     job->deferred_to_main_loop = true;
+    job->busy = true;
     aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
--
2.17.1
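
The ordering that the commit message describes can be illustrated with a
small stand-alone model. This is only a sketch under simplifying
assumptions, not QEMU code: ToyJob and every toy_*() function are
hypothetical names invented for this illustration, there is no event loop
or AioContext, and the optional driver-specific poll hook that the real
child_job_drained_poll() may consult is left out. The model only tracks
what a drain would see at each step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for QEMU's Job. Only the
 * flags discussed in the patch are modelled. */
typedef struct ToyJob {
    bool busy;
    bool deferred_to_main_loop;
    bool completed;
    bool nested_drain_saw_active; /* recorded by the completion callback */
} ToyJob;

/* Mirrors the patched condition in child_job_drained_poll(): returns true
 * while a drain still has to wait for the job. */
static bool toy_drained_poll(const ToyJob *job)
{
    if (!job->busy || job->completed) {
        return false;
    }
    return true;
}

/* End of the job coroutine, modelled on job_co_entry(): run() has
 * returned and the completion BH is about to be scheduled. */
static void toy_defer_to_main_loop(ToyJob *job)
{
    job->deferred_to_main_loop = true;
    job->busy = true; /* BH pending, so outer drains keep waiting */
}

/* Stand-in for a completion callback that drains a node. The nested
 * drain polls the job and must not be told to wait for it. */
static void toy_completion_callback(ToyJob *job)
{
    job->nested_drain_saw_active = toy_drained_poll(job);
    job->completed = true;
}

/* The completion BH, modelled on job_exit(). */
static void toy_job_exit(ToyJob *job)
{
    job->busy = false; /* the "lie": the callbacks have yet to run */
    toy_completion_callback(job);
}

/* Walks through the whole sequence and checks what a drain sees. */
static void toy_walkthrough(ToyJob *job)
{
    job->busy = true;                      /* coroutine is running */
    assert(toy_drained_poll(job));         /* drain waits */

    toy_defer_to_main_loop(job);
    assert(toy_drained_poll(job));         /* BH pending, drain still waits */

    toy_job_exit(job);
    assert(!job->nested_drain_saw_active); /* nested drain ignored the job */
    assert(!toy_drained_poll(job));        /* job is done */
}
```

Calling toy_walkthrough() on a zero-initialised ToyJob from main() runs
through all three phases. With the pre-patch condition (which also
returned false once deferred_to_main_loop was set), the second assertion
in toy_walkthrough() would fail, which corresponds to the window in which
the job was wrongly reported as quiescent while its BH was still pending.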