qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

答复: [PATCH] block: fix core for unlock not permitted


From: suruifeng (A)
Subject: 答复: [PATCH] block: fix core for unlock not permitted
Date: Tue, 12 Apr 2022 12:49:02 +0000

Hi,

The recurrence probability is extremely low. I have not reproduced this in the 
latest version.
However, after reviewing the latest code, we find that this also exists.

This is my understanding of the latest code, if there is a mistake in my 
understanding, please tell me.

bdrv_flush_all()        // bdrv_next() is a cyclic execution, and is shown in 
the order of execution for convenience.
        -> bdrv_next()  
                -> bdrv_ref(bs)         // ref mirror_top_bs, current ref=2
        -> bdrv_flush()   // generated by block-coroutine-wrapper.py
                -> bdrv_poll_co()
                        -> BDRV_POLL_WHILE()
                                -> mirror_exit_common()  // Triggered by I/O 
error on iothread context
                                        -> bdrv_unref(mirror_top_bs)    // 
unref mirror_top_bs,current ref=1
        
        -> bdrv_next()  
                -> bdrv_unref(old_bs)   // unref mirror_top_bs,current ref=0
                        -> bdrv_delete()
                                -> bdrv_close()
                                        -> bdrv_flush() // generated by 
block-coroutine-wrapper.py
                                                -> bdrv_poll_co()
                                                        -> BDRV_POLL_WHILE()
                                                                -> 
aio_context_release(ctx_);   // iothread_ctx is not acquire,so triggered the 
core

Suruifeng


-----邮件原件-----
发件人: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] 代表 Paolo Bonzini
发送时间: 2022年4月12日 19:05
收件人: suruifeng (A) <suruifeng1@huawei.com>; qemu-devel@nongnu.org
主题: Re: [PATCH] block: fix core for unlock not permitted

On 4/12/22 09:13, suruifeng via wrote:
> qemu coredump:
>    0x00007f9e7205c81b in raise () from /usr/lib64/libc.so.6
>    0x00007f9e7205db41 in abort () from /usr/lib64/libc.so.6
>    0x00007f9e71ddbe94 in error_exit (err=<optimized out>, 
> msg=msg@entry=0x7f9e71ec1b50 <__func__.20287> "qemu_mutex_unlock_impl")
>      at /usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:36
>    0x00007f9e71ddc61f in qemu_mutex_unlock_impl 
> (mutex=mutex@entry=0x5559850b0b90, file=file@entry=0x7f9e71ec0978 
> "/home/abuild/rpmbuild/BUILD/qemu-4.1.0/util/async.c",
>      line=line@entry=524) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:108
>    0x00007f9e71dd5bb5 in aio_context_release (ctx=ctx@entry=0x5559850b0b30) 
> at /usr/src/debug/qemu-4.1.0-170.x86_64/util/async.c:524
>    0x00007f9e70dfed28 in bdrv_flush (bs=bs@entry=0x5559851f0a20) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2778
>    0x00007f9e70e37f63 in bdrv_close (bs=bs@entry=0x5559851f0a20) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4025
>    0x00007f9e70e38193 in bdrv_delete (bs=0x5559851f0a20) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4271
>    0x00007f9e70e38225 in bdrv_unref (bs=<optimized out>) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block.c:5612
>    0x00007f9e70df9a92 in bdrv_next (it=it@entry=0x7ffc5e3547a0) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block/block-backend.c:576
>    0x00007f9e70dfee76 in bdrv_flush_all () at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2074
>    0x00007f9e71e3a08f in do_vm_stop (state=state@entry=RUN_STATE_SHUTDOWN, 
> send_stop=send_stop@entry=false) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1140
>    0x00007f9e71e3a14c in vm_shutdown () at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1151
> 
> During mirror job run, the VM is shutdown. During the shutdown, the mirror 
> job I/O error triggers mirror_exit_commom.
> In bdrv_flush_all(), bdrv_next() increase the ref to mirror_top_bs 
> first, and then bdrv_flush(bs) call BDRV_POLL_WHILE and executes 
> mirror_exit_common() decreases ref to mirror_top_bs, and finally bdrv_next() 
> decreases the ref to mirror_top_bs, resulting in release mirror_top_bs.
> 
> Let's fix this by adding aio_context_acquire() and aio_context_release() to 
> bdrv_next().

Hi,

is this reproducible with a more recent version of QEMU?  In particular, 
bdrv_next does not have bdrv_unref anymore.

Paolo

> Signed-off-by: suruifeng <suruifeng1@huawei.com>
> ---
>   block/block-backend.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c index 
> e0e1aff4b1..5ae745c0ab 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -593,6 +593,7 @@ BlockBackend *blk_next(BlockBackend *blk)
>   BlockDriverState *bdrv_next(BdrvNextIterator *it)
>   {
>       BlockDriverState *bs, *old_bs;
> +    AioContext *ctx = NULL;
>   
>       /* Must be called from the main loop */
>       assert(qemu_get_current_aio_context() == 
> qemu_get_aio_context()); @@ -613,11 +614,17 @@ BlockDriverState 
> *bdrv_next(BdrvNextIterator *it)
>           if (it->blk) {
>               blk_ref(it->blk);
>           }
> +     ctx = blk_get_aio_context(old_blk);
> +     aio_context_acquire(ctx);
>           blk_unref(old_blk);
> +     aio_context_release(ctx);
>   
>           if (bs) {
>               bdrv_ref(bs);
> +         ctx = bdrv_get_aio_context(old_bs);
> +         aio_context_acquire(ctx);
>               bdrv_unref(old_bs);
> +         aio_context_release(ctx);
>               return bs;
>           }
>           it->phase = BDRV_NEXT_MONITOR_OWNED; @@ -636,7 +643,10 @@ 
> BlockDriverState *bdrv_next(BdrvNextIterator *it)
>       if (bs) {
>           bdrv_ref(bs);
>       }
> +    ctx = bdrv_get_aio_context(old_bs);
> +    aio_context_acquire(ctx);
>       bdrv_unref(old_bs);
> +    aio_context_release(ctx);
>   
>       return bs;
>   }


reply via email to

[Prev in Thread] Current Thread [Next in Thread]