Re: [Qemu-devel] [PATCH v2] block: Fix bdrv_drain in coroutine


From: Laurent Vivier
Subject: Re: [Qemu-devel] [PATCH v2] block: Fix bdrv_drain in coroutine
Date: Fri, 1 Apr 2016 16:14:32 +0200


On 01/04/2016 15:57, Fam Zheng wrote:
> Using a nested aio_poll() in a coroutine is a bad idea. This patch
> replaces the aio_poll() loop in bdrv_drain() with a BH when it is called
> from a coroutine.
> 
> For example, the bdrv_drain() in mirror.c can hang when a guest-issued
> request is waiting for the mirror coroutine in qemu_co_mutex_lock().
> 
> In this case the mirror coroutine has just finished a request, and the
> block job is about to complete. It calls bdrv_drain(), which waits for
> the other coroutine to complete. The other coroutine is a scsi-disk
> request. The deadlock happens when the latter is in turn waiting, in
> qemu_co_mutex_lock(), for the former to yield or terminate. The state
> flow is as follows (assuming a qcow2 image):
> 
>   mirror coroutine               scsi-disk coroutine
>   -------------------------------------------------------------
>   do last write
> 
>     qcow2:qemu_co_mutex_lock()
>     ...
>                                  scsi disk read
> 
>                                    tracked request begin
> 
>                                    qcow2:qemu_co_mutex_lock.enter
> 
>     qcow2:qemu_co_mutex_unlock()
> 
>   bdrv_drain
>     while (has tracked request)
>       aio_poll()
> 
> In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
> because the mirror coroutine is blocked in the aio_poll(blocking=true).
> 
> With this patch, the added qemu_coroutine_yield() allows the scsi-disk
> coroutine to make progress as expected:
> 
>   mirror coroutine               scsi-disk coroutine
>   -------------------------------------------------------------
>   do last write
> 
>     qcow2:qemu_co_mutex_lock()
>     ...
>                                  scsi disk read
> 
>                                    tracked request begin
> 
>                                    qcow2:qemu_co_mutex_lock.enter
> 
>     qcow2:qemu_co_mutex_unlock()
> 
>   bdrv_drain.enter
>>   schedule BH
>>   qemu_coroutine_yield()
>>                                  qcow2:qemu_co_mutex_lock.return
>>                                  ...
>                                    tracked request end
>     ...
>     (resumed from BH callback)
>   bdrv_drain.return
>   ...
> 
> Reported-by: Laurent Vivier <address@hidden>
> Suggested-by: Paolo Bonzini <address@hidden>
> Signed-off-by: Fam Zheng <address@hidden>

Tested-by: Laurent Vivier <address@hidden>
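
For readers who want to step through the two state flows quoted above, here
is a toy model in plain C. It is only a sketch under stated assumptions: it
does not use or reproduce any QEMU API, every name in it (ToyState,
toy_aio_poll, toy_coroutine_yield, toy_drain) is hypothetical, and it assumes
that the waiter on the co-mutex can only run once the unlocking coroutine
yields, as described in the commit message. The poll budget stands in for
"blocks forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are QEMU APIs. */
typedef struct {
    int tracked_requests;      /* in-flight requests that drain waits for */
    bool scsi_waiting_on_lock; /* scsi-disk coroutine parked on the co-mutex */
    bool wakeup_on_yield;      /* waiter runs once the mirror coroutine yields */
    bool bh_scheduled;         /* bottom half queued for the main loop */
    bool drain_done;           /* mirror coroutine resumed after the drain */
} ToyState;

/* The scsi-disk coroutine takes the lock, finishes its read and ends
 * its tracked request. */
static void toy_run_scsi_waiter(ToyState *s)
{
    assert(s->scsi_waiting_on_lock);
    s->scsi_waiting_on_lock = false;
    s->tracked_requests--;
}

/* The mirror coroutine yields: the waiter woken by its earlier unlock
 * finally gets to run. */
static void toy_coroutine_yield(ToyState *s)
{
    if (s->wakeup_on_yield) {
        s->wakeup_on_yield = false;
        toy_run_scsi_waiter(s);
    }
}

/* One event-loop iteration: it can only run a queued bottom half. It
 * cannot wake the scsi-disk coroutine, which waits for a yield. */
static void toy_aio_poll(ToyState *s)
{
    if (!s->bh_scheduled) {
        return; /* nothing runnable from the event loop */
    }
    if (s->tracked_requests == 0) {
        s->bh_scheduled = false;
        s->drain_done = true; /* BH callback re-enters the mirror coroutine */
    }
}

/* Returns true if the drain completes within max_polls iterations.
 * use_bh == false models the nested aio_poll() loop inside the coroutine;
 * use_bh == true models "schedule BH, then yield". */
bool toy_drain(bool use_bh, int max_polls)
{
    ToyState s = {
        .tracked_requests = 1,
        .scsi_waiting_on_lock = true,
        .wakeup_on_yield = true,
    };

    if (!use_bh) {
        /* The mirror coroutine never yields, so the waiter never runs. */
        for (int i = 0; s.tracked_requests > 0; i++) {
            if (i == max_polls) {
                return false; /* a real blocking poll would hang here */
            }
            toy_aio_poll(&s);
        }
        return true;
    }

    s.bh_scheduled = true;   /* schedule BH */
    toy_coroutine_yield(&s); /* lets the scsi-disk coroutine progress */
    for (int i = 0; i < max_polls && !s.drain_done; i++) {
        toy_aio_poll(&s);    /* main loop runs the BH, which resumes us */
    }
    return s.drain_done;
}
```

In this model, toy_drain(false, n) gives up after n polls because nothing
ever lets the waiter run, while toy_drain(true, n) completes on the first
poll after the yield. The real fix has to deal with AioContext and coroutine
re-entry details that the model leaves out.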


