qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] block: Fix deadlock in bdrv_co_yield_to_drain()


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 3/3] block: Fix deadlock in bdrv_co_yield_to_drain()
Date: Tue, 8 Dec 2020 18:33:25 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1

03.12.2020 20:23, Kevin Wolf wrote:
If bdrv_co_yield_to_drain() is called for draining a block node that
runs in a different AioContext, it keeps that AioContext locked while it
yields and schedules a BH in the AioContext to do the actual drain.

As long as executing the BH is the very next thing the event loop of the

s/thing the event/thing in the event/

(I've reread several times to understand :)

node's AioContext, this actually happens to work, but when it tries to
execute something else that wants to take the AioContext lock, it will
deadlock. (In the bug report, this other thing is a virtio-scsi device
running virtio_scsi_data_plane_handle_cmd().)

Instead, always drop the AioContext lock across the yield and reacquire
it only when the coroutine is reentered. The BH needs to unconditionally
take the lock for itself now.

This fixes the 'block_resize' QMP command on a block node that runs in
an iothread.

Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

I don't feel myself good enough in aio contexts acquiring and switching, to see 
any side effects. At least I don't see any obvious mistakes, so my weak:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Note, I looked through the callers:

bdrv_do_drained_begin/end should be ok, as their normal usage is to start/end 
drained section under acquired aio context, so it seems correct to temporary 
release the context. Still I didn't check all drained sections in the code.

bdrv_drain_all_begin seems OK too (we just wait until everything is drained, 
not bad to temporary release the lock). Still I don't see any call of it from 
coroutine context.

--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]