Re: [Qemu-devel] [PATCH] aio-posix: honor is_external in AioContext polling


From: Fam Zheng
Subject: Re: [Qemu-devel] [PATCH] aio-posix: honor is_external in AioContext polling
Date: Tue, 24 Jan 2017 20:04:31 +0800
User-agent: Mutt/1.7.1 (2016-10-04)

On Tue, 01/24 09:53, Stefan Hajnoczi wrote:
> AioHandlers marked ->is_external must be skipped when aio_node_check()
> fails.  bdrv_drained_begin() needs this to prevent dataplane from
> submitting new I/O requests while another thread accesses the device and
> relies on it being quiesced.
> 
> This patch fixes the following segfault:
> 
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
>   2650            bdrv_io_plug(child->bs);
>   [Current thread is 1 (Thread 0x7ff5c4bd1c80 (LWP 10917))]
>   (gdb) bt
>   #0  0x00005577f6127dad in bdrv_io_plug (bs=0x5577f7ae52f0) at qemu/block/io.c:2650
>   #1  0x00005577f6114363 in blk_io_plug (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:1561
>   #2  0x00005577f5d4091d in virtio_blk_handle_vq (s=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/virtio-blk.c:589
>   #3  0x00005577f5d4240d in virtio_blk_data_plane_handle_output (vdev=0x5577f9ada030, vq=0x5577f9b3d2a0) at qemu/hw/block/dataplane/virtio-blk.c:158
>   #4  0x00005577f5d88acd in virtio_queue_notify_aio_vq (vq=0x5577f9b3d2a0) at qemu/hw/virtio/virtio.c:1304
>   #5  0x00005577f5d8aaaf in virtio_queue_host_notifier_aio_poll (opaque=0x5577f9b3d308) at qemu/hw/virtio/virtio.c:2134
>   #6  0x00005577f60ca077 in run_poll_handlers_once (ctx=0x5577f79ddbb0) at qemu/aio-posix.c:493
>   #7  0x00005577f60ca268 in try_poll_mode (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:569
>   #8  0x00005577f60ca331 in aio_poll (ctx=0x5577f79ddbb0, blocking=true) at qemu/aio-posix.c:601
>   #9  0x00005577f612722a in bdrv_flush (bs=0x5577f7c20970) at qemu/block/io.c:2403
>   #10 0x00005577f60c1b2d in bdrv_close (bs=0x5577f7c20970) at qemu/block.c:2322
>   #11 0x00005577f60c20e7 in bdrv_delete (bs=0x5577f7c20970) at qemu/block.c:2465
>   #12 0x00005577f60c3ecf in bdrv_unref (bs=0x5577f7c20970) at qemu/block.c:3425
>   #13 0x00005577f60bf951 in bdrv_root_unref_child (child=0x5577f7a2de70) at qemu/block.c:1361
>   #14 0x00005577f6112162 in blk_remove_bs (blk=0x5577f7b8ba20) at qemu/block/block-backend.c:491
>   #15 0x00005577f6111b1b in blk_remove_all_bs () at qemu/block/block-backend.c:245
>   #16 0x00005577f60c1db6 in bdrv_close_all () at qemu/block.c:2382
>   #17 0x00005577f5e60cca in main (argc=20, argv=0x7ffea6eb8398, envp=0x7ffea6eb8440) at qemu/vl.c:4684
> 
> The key point is that bdrv_close() uses bdrv_drained_begin(), so
> virtio_queue_host_notifier_aio_poll() must not be called.
> 
> Thanks to Fam Zheng <address@hidden> for identifying the root cause of
> this crash.
> 
> Reported-by: Alberto Garcia <address@hidden>
> Signed-off-by: Stefan Hajnoczi <address@hidden>
> ---
>  aio-posix.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/aio-posix.c b/aio-posix.c
> index 9453d83..a8d7090 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -508,7 +508,8 @@ static bool run_poll_handlers_once(AioContext *ctx)
>  
>      QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->io_poll &&
> -                node->io_poll(node->opaque)) {
> +            aio_node_check(ctx, node->is_external) &&
> +            node->io_poll(node->opaque)) {
>              progress = true;
>          }
>  
> -- 
> 2.9.3
> 
> 

The patch is not wrong, and I believe it is enough to fix the crash;
however, it's not enough...

All in all, I think we should skip external handlers regardless of
aio_disable_external(), or even skip try_poll_mode() entirely, in nested
aio_poll() calls. The reasons are: 1) many nested aio_poll() calls are not
wrapped in bdrv_drained_begin(), so this check is not sufficient;
2) aio_poll() on qemu_aio_context didn't look at ioeventfds before, but
that changed with the addition of try_poll_mode(), which is not quite
correct.
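
For reference, the check added by the patch only skips external handlers
while a matching aio_disable_external() is in effect, which is exactly what
bdrv_drained_begin() provides. Paraphrasing the helpers from
include/block/aio.h (simplified sketch, not the verbatim source):

    /* Sketch of the include/block/aio.h helpers, paraphrased for
     * illustration. bdrv_drained_begin() calls aio_disable_external()
     * and bdrv_drained_end() undoes it. */
    static inline void aio_disable_external(AioContext *ctx)
    {
        atomic_inc(&ctx->external_disable_cnt);
    }

    static inline void aio_enable_external(AioContext *ctx)
    {
        atomic_dec(&ctx->external_disable_cnt);
    }

    /* True if the handler may run: internal handlers always may,
     * external ones (e.g. ioeventfds) only while no
     * aio_disable_external() call is outstanding. This is the
     * condition the patch adds to run_poll_handlers_once(). */
    static inline bool aio_node_check(AioContext *ctx, bool is_external)
    {
        return !is_external || !atomic_read(&ctx->external_disable_cnt);
    }

So a nested aio_poll() that is not inside a bdrv_drained_begin()/end()
section still polls external handlers, which is why the check alone is not
sufficient.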

Combined, these two factors make it possible for bdrv_flush() etc. to spin
longer than necessary, if not forever, when the guest keeps submitting new
requests via ioeventfd.
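
To make the spin concrete: outside coroutine context, bdrv_flush() waits
for its flush coroutine by looping on aio_poll(), and each iteration now
enters try_poll_mode() first. A rough sketch of the loop shape (the
function names are from the backtrace above; the wait loop itself is
paraphrased, not the verbatim block/io.c code):

    /* Paraphrased shape of the wait in bdrv_flush(): the loop exits
     * only when the flush coroutine finishes. */
    while (rwco.ret == NOT_DONE) {
        aio_poll(aio_context, true);
        /* Each aio_poll() runs try_poll_mode() ->
         * run_poll_handlers_once() ->
         * virtio_queue_host_notifier_aio_poll(), which can dequeue
         * brand-new guest requests. New requests count as poll
         * progress, so a busy guest can keep the loop spinning instead
         * of letting the flush complete. */
    }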

Fam


