From: Michael Qiu
Subject: Re:Re: [PATCH] blockjob: Fix crash with IOthread when block commit after snapshot
Date: Thu, 4 Feb 2021 16:38:03 +0800 (CST)

Hi, Kevin

Any comments on this patch?
The lock release was added by commit 132ada80 ("block: Adjust AioContexts when attaching nodes").

My patch avoids a crash case and does touch the code introduced by that commit; a sketch of the problematic pre-patch pattern follows below.
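To make the race window concrete, this is roughly the pre-patch shape of the relevant section of block_job_add_bdrv(). It is a sketch reconstructed from the diff quoted below, not a verbatim copy of blockjob.c:

    bdrv_ref(bs);
    if (job->job.aio_context != qemu_get_aio_context()) {
        /* Unconditional release: an IO thread blocked on this lock can
         * now run and service guest writes through the already-inserted
         * mirror-top node, whose bs->opaque->job is still NULL -- this
         * is the SIGSEGV in bdrv_mirror_top_pwritev() shown below. */
        aio_context_release(job->job.aio_context);
    }
    c = bdrv_root_attach_child(bs, name, &child_job, 0,
                               job->job.aio_context, perm, shared_perm,
                               job, errp);
    if (job->job.aio_context != qemu_get_aio_context()) {
        aio_context_acquire(job->job.aio_context);
    }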

Thanks,
Michael
At 2021-02-03 15:45:07, "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com> wrote:

>subject should start with [PATCH v5]
>
>03.02.2021 05:40, 08005325@163.com wrote:
>> From: Michael Qiu <qiudayu@huayun.com>
>>
>> v5: reformat the commit log with backtrace of main thread
>> Add a boolean variable to make main thread could re-acquire
>> aio_context on success path.
>>
>> v4: rebase to latest code
>>
>> v3: reformat the commit log, remove duplicate content
>
>patch history shouldn't go into commit message. So you should place it under '---' [*], after calling git format-patch
>
>>
>> Currently, if guest has workloads, IO thread will acquire aio_context
>> lock before do io_submit, it leads to segmentfault when do block commit
>> after snapshot. Just like below:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>>
>> [Switching to Thread 0x7f7c7d91f700 (LWP 99907)]
>> 0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
>> 1437 ../block/mirror.c: No such file or directory.
>> (gdb) p s->job
>> $17 = (MirrorBlockJob *) 0x0
>> (gdb) p s->stop
>> $18 = false
>>
>> Call trace of IO thread:
>> 0  0x00005576d0f65aab in bdrv_mirror_top_pwritev at ../block/mirror.c:1437
>> 1  0x00005576d0f7f3ab in bdrv_driver_pwritev at ../block/io.c:1174
>> 2  0x00005576d0f8139d in bdrv_aligned_pwritev at ../block/io.c:1988
>> 3  0x00005576d0f81b65 in bdrv_co_pwritev_part at ../block/io.c:2156
>> 4  0x00005576d0f8e6b7 in blk_do_pwritev_part at ../block/block-backend.c:1260
>> 5  0x00005576d0f8e84d in blk_aio_write_entry at ../block/block-backend.c:1476
>> ...
>>
>> Switch to qemu main thread:
>> 0  0x00007f903be704ed in __lll_lock_wait at /lib/../lib64/libpthread.so.0
>> 1  0x00007f903be6bde6 in _L_lock_941 at /lib/../lib64/libpthread.so.0
>> 2  0x00007f903be6bcdf in pthread_mutex_lock at /lib/../lib64/libpthread.so.0
>> 3  0x0000564b21456889 in qemu_mutex_lock_impl at ../util/qemu-thread-posix.c:79
>> 4  0x0000564b213af8a5 in block_job_add_bdrv at ../blockjob.c:224
>> 5  0x0000564b213b00ad in block_job_create at ../blockjob.c:440
>> 6  0x0000564b21357c0a in mirror_start_job at ../block/mirror.c:1622
>> 7  0x0000564b2135a9af in commit_active_start at ../block/mirror.c:1867
>> 8  0x0000564b2133d132 in qmp_block_commit at ../blockdev.c:2768
>> 9  0x0000564b2141fef3 in qmp_marshal_block_commit at qapi/qapi-commands-block-core.c:346
>> 10 0x0000564b214503c9 in do_qmp_dispatch_bh at ../qapi/qmp-dispatch.c:110
>> 11 0x0000564b21451996 in aio_bh_poll at ../util/async.c:164
>> 12 0x0000564b2146018e in aio_dispatch at ../util/aio-posix.c:381
>> 13 0x0000564b2145187e in aio_ctx_dispatch at ../util/async.c:306
>> 14 0x00007f9040239049 in g_main_context_dispatch at /lib/../lib64/libglib-2.0.so.0
>> 15 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:232
>> 16 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:255
>> 17 0x0000564b21447368 in main_loop_wait at ../util/main-loop.c:531
>> 18 0x0000564b212304e1 in qemu_main_loop at ../softmmu/runstate.c:721
>> 19 0x0000564b20f7975e in main at ../softmmu/main.c:50
>>
>> In IO thread when do bdrv_mirror_top_pwritev, the job is NULL, and stop field
>> is false, this means the MirrorBDSOpaque "s" object has not been initialized
>> yet, and this object is initialized by block_job_create(), but the initialize
>> process is stuck in acquiring the lock.
>>
>> In this situation, IO thread come to bdrv_mirror_top_pwritev(), which means that
>> mirror-top node is already inserted into block graph, but its bs->opaque->job
>> is not initialized.
>>
>> The root cause is that qemu main thread do release/acquire when hold the lock,
>> at the same time, IO thread get the lock after release stage, and the crash
>> occured.
>>
>> Actually, in this situation, job->job.aio_context will not equal to
>> qemu_get_aio_context(), and will be the same as bs->aio_context,
>> thus, no need to release the lock, becasue bdrv_root_attach_child()
>> will not change the context.
>>
>> This patch fix this issue.
>>
>> Fixes: 132ada80 "block: Adjust AioContexts when attaching nodes"
>>
>> Signed-off-by: Michael Qiu <qiudayu@huayun.com>
>
>I feel like there may be more problems (like the fact that drained section should be expanded, and
>that expanding doesn't help as Michael said), but I think that temporary releasing locks is unsafe
>thing, and if we can avoid it for some cases it's good, especially if it fixes some bug:
>
>Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
>> ---
>
>[*] patch history and anything that you don't want to put into final commit message goes here.
>
>>  blockjob.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index db3a21699c..d9dca36f65 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -212,15 +212,21 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>>                         uint64_t perm, uint64_t shared_perm, Error **errp)
>>  {
>>      BdrvChild *c;
>> +    bool need_context_ops;
>>
>>      bdrv_ref(bs);
>> -    if (job->job.aio_context != qemu_get_aio_context()) {
>> +
>> +    need_context_ops = bdrv_get_aio_context(bs) != job->job.aio_context;
>
>I'd also put the second condition into same variable, just for less typing. Still it should work as is.
>
>> +
>> +    if (need_context_ops &&
>> +        job->job.aio_context != qemu_get_aio_context()) {
>>          aio_context_release(job->job.aio_context);
>>      }
>>      c = bdrv_root_attach_child(bs, name, &child_job, 0,
>>                                 job->job.aio_context, perm, shared_perm, job,
>>                                 errp);
>> -    if (job->job.aio_context != qemu_get_aio_context()) {
>> +    if (need_context_ops &&
>> +        job->job.aio_context != qemu_get_aio_context()) {
>>          aio_context_acquire(job->job.aio_context);
>>      }
>>      if (c == NULL) {
>>
>
>
>--
>Best regards,
>Vladimir


 

