qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_of


From: Andreas Färber
Subject: Re: [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_offset
Date: Mon, 15 Oct 2012 16:28:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120825 Thunderbird/15.0

Am 15.10.2012 11:13, schrieb Kevin Wolf:
> Am 12.10.2012 17:52, schrieb Andreas Färber:
>> Am 12.06.2012 15:44, schrieb Kevin Wolf:
>>> Am 12.06.2012 15:33, schrieb Andreas Färber:
>>>> Am 14.05.2012 14:20, schrieb Kevin Wolf:
>>>>> Am 13.05.2012 10:03, schrieb Zhouyi Zhou:
>>>>>>   sometimes, qemu/kvm-0.1x will hang in endless loop in 
>>>>>> qcow2_alloc_cluster_offset.
>>>>>
>>>>> The patch looks reasonable to me. Note however that while it fixes the
>>>>> hang, it still causes cluster leaks. I'm not sure if someone is
>>>>> interested in picking these up for old stable releases. Andreas, I think
>>>>> you were going to take 0.15? The first version that doesn't have the
>>>>> problem is 1.0.
>>>>
>>> It's "fixed" as a side effect of the block layer conversion to
>>> coroutines. Not exactly the kind of patches you'd want to cherry-pick
>>> for stable-0.15.
>>>
>>> The better fix for 0.15 could be to backport the new behaviour of
>>> coroutine based requests with bdrv_aio_cancel:
>>>
>>> static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
>>> {
>>>     qemu_aio_flush();
>>> }
>>>
>>> Using that as the implementation for qcow2_aio_cancel should be safe and
>>> fix this problem.
>>
>> [...] stable-0.15 does not have coroutines, so I don't
>> understand what exactly you're suggesting as alternative here: Backport
>> the whole coroutine feature including coroutine function above? Or just
>> call qemu_aio_flush() in place of what? This is old qcow2_aio_cancel():
> 
> No, that was qcow2_aio_flush. ;-)

Ugh, what a copy-and-paste error... ;-)

> What I'm suggesting (not even compile tested!) is:
> 
> Signed-off-by: Kevin Wolf <address@hidden>
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 48e1b95..d665675 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -388,10 +388,7 @@ typedef struct QCowAIOCB {
> 
>  static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb)
>  {
> -    QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common);
> -    if (acb->hd_aiocb)
> -        bdrv_aio_cancel(acb->hd_aiocb);
> -    qemu_aio_release(acb);
> +    qemu_aio_flush();
>  }
> 
>  static AIOPool qcow2_aio_pool = {

Compiles fine. Is there a particular test case to invoke this code path?

Does this attempt to fix the cluster leaks you mentioned as well, or
just the cluster allocation endless loop?

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



reply via email to

[Prev in Thread] Current Thread [Next in Thread]