[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters
From: |
Kevin Wolf |
Subject: |
Re: [Qemu-devel] [PATCH 14/20] qcow2: Factor out count_cow_clusters |
Date: |
Fri, 20 Apr 2012 15:15:20 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1 |
Am 20.04.2012 15:06, schrieb Stefan Hajnoczi:
> On Fri, Apr 20, 2012 at 1:27 PM, Stefan Hajnoczi <address@hidden> wrote:
>> On Fri, Apr 20, 2012 at 9:24 AM, Kevin Wolf <address@hidden> wrote:
>>> Am 19.04.2012 23:18, schrieb Marcelo Tosatti:
>>>> On Thu, Apr 19, 2012 at 05:14:20PM -0300, Marcelo Tosatti wrote:
>>>>>> There is one intended change in functionality in this patch, which is
>>>>>> that it allocates new clusters even when it could satisfy the first part
>>>>>> of the request with already allocated clusters. In order to check if
>>>>>> there is a problem with this scenario, the following patch should revert
>>>>>> to the old behaviour:
>>>>>>
>>>>>> --- a/block/qcow2-cluster.c
>>>>>> +++ b/block/qcow2-cluster.c
>>>>>> @@ -847,7 +847,7 @@ again:
>>>>>> keep_clusters = count_contiguous_clusters(nb_clusters,
>>>>>> s->cluster_size,
>>>>>> &l2_table[l2_index],
>>>>>> 0, 0);
>>>>>> assert(keep_clusters <= nb_clusters);
>>>>>> - nb_clusters -= keep_clusters;
>>>>>> + nb_clusters = 0;
>>>>>> } else {
>>>>>> /* For the moment, overwrite compressed clusters one by one */
>>>>>> if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
>>>>>>
>>>>>> The rest is meant to be a functionally equivalent rewrite of the old
>>>>>> code that was required in order to allow this change.
>>>>>
>>>>> Testing.
>>>>
>>>> Corruption gone with patch above.
>>>
>>> Okay, so something must be wrong only with the new code paths, which
>>> makes things a bit easier. I'll have another closer look. Stefan, can
>>> you re-review 250196f1 as well?
>>
>> I'm taking a look.
>
> Just looking at the qemu-img check output that Marcelo posted I'm
> interpreting that as OFLAG_COPIED was set on the offset but its
> refcount was 2.
>
> The refcount shouldn't be 2 unless internal snapshots were used.
>
> So we probably incremented the refcount either too often or for the
> wrong block (which is more likely since the guest sees corruption).
I've looked at the same thing now and I think the same cluster was used
both for a regular data allocation and for a new refcount block. This
may happen because alloc_refcount_block() uses s->free_cluster_index
which is updated by qcow2_alloc_cluster_noref(), but not by
qcow2_alloc_cluster_at(). Just a theory so far, though, and not yet
tried out in practice. If this is it, a patch would look like this
(completely untested as well):
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -587,6 +587,7 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs,
uint64_t offset,
{
BDRVQcowState *s = bs->opaque;
uint64_t cluster_index;
+ uint64_t old_free_cluster_index;
int i, refcount, ret;
/* Check how many clusters there are free */
@@ -602,11 +603,16 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs,
uint64_t offset,
}
/* And then allocate them */
+ old_free_cluster_index = s->free_cluster_index;
+ s->free_cluster_index = cluster_index + i;
+
ret = update_refcount(bs, offset, i << s->cluster_bits, 1);
if (ret < 0) {
return ret;
}
+ s->free_cluster_index = old_free_cluster_index;
+
return i;
}
Kevin