[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)
From: |
Fam Zheng |
Subject: |
Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30) |
Date: |
Thu, 9 Nov 2017 14:05:26 +0800 |
User-agent: |
Mutt/1.9.1 (2017-09-22) |
On Wed, 11/08 18:50, Anton Nefedov wrote:
>
>
> On 8/11/2017 5:45 PM, Alberto Garcia wrote:
> > On Tue 07 Nov 2017 05:19:41 PM CET, Anton Nefedov wrote:
> > > BlockBackend gets deleted by another job's stream_complete(), deferred
> > > to the main loop, so the fact that the job is put to sleep by
> > > bdrv_drain_all_begin() doesn't really stop it from execution.
> >
> > I was debugging this a bit, and the block_job_defer_to_main_loop() call
> > happens _after_ all jobs have been paused, so I think that when the BDS
> > is drained then stream_run() finishes the last iteration without
> > checking if it's paused.
> >
> > Without your patch (i.e. with a smaller STREAM_BUFFER_SIZE) then I
> > assume that the function would have to continue looping and
> > block_job_sleep_ns() would make the job coroutine yield, effectively
> > pausing the job and preventing the crash.
> >
> > I can fix the crash by adding block_job_pause_point(&s->common) at the
> > end of stream_run() (where the 'out' label is).
> >
> > I'm thinking that perhaps we should add the pause point directly to
> > block_job_defer_to_main_loop(), to prevent any block job from running
> > the exit function when it's paused.
> >
>
> Is it possible that the exit function is already deferred when the jobs
> are being paused? (even though it's at least less likely to happen)
>
> Then should we flush the bottom halves somehow in addition to putting
> the jobs to sleep? And also then it all probably has to happen before
> bdrv_reopen_queue()
Or we can stash away the BH temporarily during pause period:
---
diff --git a/blockjob.c b/blockjob.c
index 3a0c49137e..7058ff7ae1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -127,6 +127,9 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
static void block_job_pause(BlockJob *job)
{
job->pause_count++;
+ if (job->defer_to_main_loop_bh) {
+ qemu_bh_cancel(job->defer_to_main_loop_bh);
+ }
}
static void block_job_resume(BlockJob *job)
@@ -136,6 +139,9 @@ static void block_job_resume(BlockJob *job)
if (job->pause_count) {
return;
}
+ if (job->defer_to_main_loop_bh) {
+ qemu_bh_schedule(job->defer_to_main_loop_bh);
+ }
block_job_enter(job);
}
@@ -900,6 +906,9 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
/* Prevent race with block_job_defer_to_main_loop() */
aio_context_acquire(data->aio_context);
+ qemu_bh_delete(data->job->defer_to_main_loop_bh);
+ data->job->defer_to_main_loop_bh = NULL;
+
/* Fetch BDS AioContext again, in case it has changed */
aio_context = blk_get_aio_context(data->job->blk);
if (aio_context != data->aio_context) {
@@ -927,7 +936,9 @@ void block_job_defer_to_main_loop(BlockJob *job,
data->fn = fn;
data->opaque = opaque;
job->deferred_to_main_loop = true;
+ assert(!job->defer_to_main_loop_bh);
+ job->defer_to_main_loop_bh = aio_bh_new(qemu_get_aio_context(),
+ block_job_defer_to_main_loop_bh, data);
- aio_bh_schedule_oneshot(qemu_get_aio_context(),
- block_job_defer_to_main_loop_bh, data);
+ qemu_bh_schedule(job->defer_to_main_loop_bh);
}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 67c0968fa5..28d29b3be6 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -97,6 +97,11 @@ typedef struct BlockJob {
*/
bool deferred_to_main_loop;
+ /**
+ * The main loop BH for "defer to main loop" operation.
+ */
+ QEMUBH *defer_to_main_loop_bh;
+
/** Element of the list of block jobs */
QLIST_ENTRY(BlockJob) job_list;
- [Qemu-devel] segfault in parallel blockjobs (iotest 30), Anton Nefedov, 2017/11/07
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Alberto Garcia, 2017/11/08
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Alberto Garcia, 2017/11/08
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Alberto Garcia, 2017/11/09
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Fam Zheng, 2017/11/09
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Alberto Garcia, 2017/11/15
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Anton Nefedov, 2017/11/15
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Alberto Garcia, 2017/11/16
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Anton Nefedov, 2017/11/16
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Alberto Garcia, 2017/11/21
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Anton Nefedov, 2017/11/21
- Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30), Alberto Garcia, 2017/11/21