qemu-devel

Re: [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_of


From: Kevin Wolf
Subject: Re: [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_offset
Date: Mon, 14 May 2012 14:20:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

On 13.05.2012 10:03, Zhouyi Zhou wrote:
> Hi all,
>
>   Sometimes qemu/kvm-0.1x hangs in an endless loop in
> qcow2_alloc_cluster_offset.
>   After some investigation, I found the following in
> posix_aio_process_queue(void *opaque):
> 440             ret = qemu_paio_error(acb);
> 441             if (ret == ECANCELED) {
> 442                 /* remove the request */
> 443                 *pacb = acb->next;
> 444                 qemu_aio_release(acb);
> 445                 result = 1;
> 446             } else if (ret != EINPROGRESS) {
>   At line 444, acb is released but acb->common.opaque is not.
> The latter is released later, when the guest OS triggers ide_dma_cancel,
> which in turn calls qcow_aio_cancel, which does not check whether its
> argument is still on the in-flight list.
>   The fix is as follows (against Debian 6's qemu-kvm-0.12.5):
> #######################################
> --- block/qcow2.h~      2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2.h       2012-05-13 15:51:39.000000000 +0800
> @@ -143,6 +143,7 @@
>      QLIST_HEAD(QCowAioDependencies, QCowAIOCB) dependent_requests;
>  
>      QLIST_ENTRY(QCowL2Meta) next_in_flight;
> +    int inflight;       
>  } QCowL2Meta;
> --- block/qcow2.c~  2012-05-13 15:57:09.000000000 +0800
> +++ block/qcow2.c       2012-05-13 15:57:24.000000000 +0800
> @@ -349,6 +349,10 @@
>      QCowAIOCB *acb = (QCowAIOCB *)blockacb;
>      if (acb->hd_aiocb)
>          bdrv_aio_cancel(acb->hd_aiocb);
> +    if (acb->l2meta.inflight) {
> +        QLIST_REMOVE(&acb->l2meta, next_in_flight);
> +        acb->l2meta.inflight = 0;
> +    }
>      qemu_aio_release(acb);
>  }
>  
> @@ -506,6 +510,7 @@
>      acb->n = 0;
>      acb->cluster_offset = 0;
>      acb->l2meta.nb_clusters = 0;
> +    acb->l2meta.inflight = 0;
>      QLIST_INIT(&acb->l2meta.dependent_requests);
>      return acb;
>  }
> @@ -534,6 +539,7 @@
>      /* Take the request off the list of running requests */
>      if (m->nb_clusters != 0) {
>          QLIST_REMOVE(m, next_in_flight);
> +        m->inflight = 0;
>      }
>  
>      /*
> @@ -632,6 +638,7 @@
>  fail:
>      if (acb->l2meta.nb_clusters != 0) {
>          QLIST_REMOVE(&acb->l2meta, next_in_flight);
> +        acb->l2meta.inflight = 0;
>      }
>  done:
>      if (acb->qiov->niov > 1)
> --- block/qcow2-cluster.c~      2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2-cluster.c       2012-05-13 15:53:53.000000000 +0800
> @@ -827,6 +827,7 @@
>      m->offset = offset;
>      m->n_start = n_start;
>      m->nb_clusters = nb_clusters;
> +    m->inflight = 1;
>  
>  out:
>      m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
> 
>  Thanks for investigation
> Zhouyi

The patch looks reasonable to me. Note however that while it fixes the
hang, it still causes cluster leaks. I'm not sure if someone is
interested in picking these up for old stable releases. Andreas, I think
you were going to take 0.15? The first version that doesn't have the
problem is 1.0.
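
For readers following the patch, the idea behind the inflight flag can be
sketched outside QEMU. This is a minimal illustration under stated
assumptions, not the QEMU code: struct meta, struct meta_list,
meta_list_insert, meta_list_remove, meta_list_length and inflight_demo are
hypothetical names, and the list is a simplified stand-in for QLIST. The
flag records whether an entry is still linked, so every exit path
(completion, failure, cancel) can call the same removal routine and the
entry is unlinked exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QCowL2Meta; not QEMU's definition. */
struct meta {
    struct meta *next;
    struct meta **pprev;  /* address of the pointer that points at us */
    int inflight;         /* 1 while linked on the in-flight list */
};

struct meta_list {
    struct meta *first;
};

/* Link m at the head of the list and mark it as in flight. */
static void meta_list_insert(struct meta_list *l, struct meta *m)
{
    m->next = l->first;
    if (m->next) {
        m->next->pprev = &m->next;
    }
    l->first = m;
    m->pprev = &l->first;
    m->inflight = 1;
}

/* Unlink m only if it is still linked; a second call is a no-op. */
static void meta_list_remove(struct meta *m)
{
    if (!m->inflight) {
        return;
    }
    if (m->next) {
        m->next->pprev = m->pprev;
    }
    *m->pprev = m->next;
    m->next = NULL;
    m->pprev = NULL;
    m->inflight = 0;
}

/* Count the entries currently on the list. */
static int meta_list_length(const struct meta_list *l)
{
    int n = 0;
    for (const struct meta *m = l->first; m != NULL; m = m->next) {
        n++;
    }
    return n;
}

/* Simulate a cancel followed by a completion for the same request. */
static int inflight_demo(void)
{
    struct meta_list list = { NULL };
    struct meta a = { NULL, NULL, 0 };
    struct meta b = { NULL, NULL, 0 };

    meta_list_insert(&list, &a);
    meta_list_insert(&list, &b);

    meta_list_remove(&a);   /* cancel path */
    meta_list_remove(&a);   /* completion path: must not touch the list */
    assert(list.first == &b);

    meta_list_remove(&b);
    return meta_list_length(&list);
}
```

The sketch only covers list membership. It does not address the clusters
that were already allocated for a cancelled request.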

Kevin


