[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH v5 5/6] qcow2: consider in-flight-writes when freeing clusters
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
[PATCH v5 5/6] qcow2: consider in-flight-writes when freeing clusters |
Date: |
Fri, 26 Mar 2021 23:00:44 +0300 |
We have a bug in qcow2: assume we've started data write into host
cluster A. s->lock is unlocked. During the write the refcount of
cluster A may become zero, cluster may be reallocated for other needs,
and our in-flight write become a use-after-free.
To fix the bug let's do the following. Or better, let's start from what
we have now:
Now we consider cluster "free", when its refcount is 0. When cluster
becomes "free" we also update s->free_cluster_index and optionally
discard it on bs->file level. These two operations are done in same
s->lock critical section where refcount becomes 0 (and this all is in
update_refcount()). Calling update_refcount() is wrong. It's ofcourse
done sometimes, as not everything is moved to coroutine for now..
Still, it's out of our topic.
Later, we can reallocate "free" cluster in alloc_clusters_noref() and
qcow2_alloc_clusters_at(), where is_cluster_free() is used.
OK, to correctly handle in-flight writes, let's modify a concept of
"free" cluster, so that cluster is "free" when its refcount is 0 and
there no inflight writes.
So, we discard the cluster at bs->file level, update
s->free_cluster_index and allow reallocation only when both refcount
and inflight-write-cnt becomes both zero. It may happen either in
update_refcount() or in update_inflight_write_cnt().
Consider update_refcount() first. Here we update refcounts metadata we
must be under s->lock. So, if we catch a situation when refcount
becomes 0 but there are in-flight writes we change a behavior so that
we don't update s->free_cluster_index, and do not discard the cluster.
This will be done by update_inflight_write_cnt() later. So, we save
needed information into Qcow2InFlightWriteCounter structure, so that
further update_inflight_write_cnt() do not need to load the refcount.
Now what about update_inflight_write_cnt()? Here things become more
interesting, because we can avoid s->lock. If
inflight-write-cnt + refcount is still positive, we don't have any
yield point, just update inflight write counter and we are done.
If inflight-write-cnt becomes 0 and refcount is already 0, we just need
to keep inflight-write-counter=1 during the discard operation (if it
is needed) and then drop the counter and update s->free_cluster_index.
Note, that at this point we only implement the whole infrastructure for
in-flight writes handling. Nobody now call qcow2_inflight_writes_inc()
or qcow2_inflight_writes_dec(). It's a deal of the following patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
block/qcow2-refcount.c | 70 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 64 insertions(+), 6 deletions(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index eedc83ea4a..9a0d6570a5 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -805,6 +805,15 @@ typedef struct Qcow2InFlightWriteCounter {
* 0 the entry is removed from s->inflight_writes_counters.
*/
uint64_t inflight_writes_cnt;
+
+ /* For convenience, keep cluster_index here */
+ int64_t cluster_index;
+
+ /* Cluster refcount is known to be zero */
+ bool refcount_zero;
+
+ /* Cluster refcount was made zero with this discard-type */
+ enum qcow2_discard_type last_discard_type;
} Qcow2InFlightWriteCounter;
/* Find Qcow2InFlightWriteCounter corresponding to @cluster_index */
@@ -845,6 +854,7 @@ update_inflight_write_cnt(BlockDriverState *bs, int64_t
offset, int64_t length,
if (!decrease) {
if (!infl) {
infl = g_new0(Qcow2InFlightWriteCounter, 1);
+ infl->cluster_index = cluster_index;
g_hash_table_insert(s->inflight_writes_counters,
g_memdup(&cluster_index,
sizeof(cluster_index)), infl);
@@ -857,10 +867,40 @@ update_inflight_write_cnt(BlockDriverState *bs, int64_t
offset, int64_t length,
assert(infl);
assert(infl->inflight_writes_cnt >= 1);
- infl->inflight_writes_cnt--;
+ if (infl->inflight_writes_cnt > 1) {
+ infl->inflight_writes_cnt--;
+ continue;
+ }
- if (infl->inflight_writes_cnt == 0) {
+ if (!infl->refcount_zero) {
+ /*
+ * All in-flight writes are finished, but cluster is still in use,
+ * nothing to do now.
+ */
g_hash_table_remove(s->inflight_writes_counters, &cluster_index);
+ continue;
+ }
+
+ /*
+ * OK. At this point there no more refcounts and no more in-flight
+ * writes. Cluster is almost free. But we probably want to do one more
+ * in-flight operation: discard. So we keep inflight_writes_cnt = 1
+ * during the discard.
+ */
+ if (s->discard_passthrough[infl->last_discard_type]) {
+ int64_t cluster_offset = cluster_index << s->cluster_bits;
+ if (s->cache_discards) {
+ update_refcount_discard(bs, cluster_offset, s->cluster_size);
+ } else {
+ /* Discard is optional, ignore the return value */
+ bdrv_pdiscard(bs->file, cluster_offset, s->cluster_size);
+ }
+ }
+
+ g_hash_table_remove(s->inflight_writes_counters, &cluster_index);
+
+ if (cluster_index < s->free_cluster_index) {
+ s->free_cluster_index = cluster_index;
}
}
}
@@ -970,6 +1010,7 @@ static int QEMU_WARN_UNUSED_RESULT
update_refcount(BlockDriverState *bs,
s->set_refcount(refcount_block, block_index, refcount);
if (refcount == 0) {
+ Qcow2InFlightWriteCounter *infl;
void *table;
table = qcow2_cache_is_table_offset(s->refcount_block_cache,
@@ -986,8 +1027,24 @@ static int QEMU_WARN_UNUSED_RESULT
update_refcount(BlockDriverState *bs,
qcow2_cache_discard(s->l2_table_cache, table);
}
- if (s->discard_passthrough[type]) {
- update_refcount_discard(bs, cluster_offset, s->cluster_size);
+ infl = find_infl_wr(s, cluster_index);
+ if (infl) {
+ /*
+ * Refcount becomes zero, but there are still in-flight writes
+ * to the cluster. update_inflight_write_cnt() will take care
+ * of discarding the cluster and updating
s->free_cluster_index.
+ */
+ infl->refcount_zero = true;
+ infl->last_discard_type = type;
+ } else {
+ /* Refcount is zero and no in-fligth writes. Cluster is free.
*/
+ if (cluster_index < s->free_cluster_index) {
+ s->free_cluster_index = cluster_index;
+ }
+ if (s->discard_passthrough[type]) {
+ update_refcount_discard(bs, cluster_offset,
+ s->cluster_size);
+ }
}
}
}
@@ -1049,7 +1106,7 @@ int qcow2_update_cluster_refcount(BlockDriverState *bs,
/*
- * Cluster is free when its refcount is 0
+ * Cluster is free when its refcount is 0 and there is no in-flight writes
*
* Return < 0 if failed to get refcount
* 0 if cluster is not free
@@ -1057,6 +1114,7 @@ int qcow2_update_cluster_refcount(BlockDriverState *bs,
*/
static int is_cluster_free(BlockDriverState *bs, int64_t cluster_index)
{
+ BDRVQcow2State *s = bs->opaque;
int ret;
uint64_t refcount;
@@ -1065,7 +1123,7 @@ static int is_cluster_free(BlockDriverState *bs, int64_t
cluster_index)
return ret;
}
- return refcount == 0;
+ return refcount == 0 && !find_infl_wr(s, cluster_index);
}
/* return < 0 if error */
--
2.29.2
- [PATCH v5 0/6] qcow2: fix parallel rewrite and discard (lockless), Vladimir Sementsov-Ogievskiy, 2021/03/26
- [PATCH v5 1/6] iotests: add qcow2-discard-during-rewrite, Vladimir Sementsov-Ogievskiy, 2021/03/26
- [PATCH v5 3/6] qcow2: introduce is_cluster_free() helper, Vladimir Sementsov-Ogievskiy, 2021/03/26
- [PATCH v5 2/6] qcow2: fix cache discarding in update_refcount(), Vladimir Sementsov-Ogievskiy, 2021/03/26
- [PATCH v5 4/6] qcow2: introduce inflight-write-counters, Vladimir Sementsov-Ogievskiy, 2021/03/26
- [PATCH v5 5/6] qcow2: consider in-flight-writes when freeing clusters,
Vladimir Sementsov-Ogievskiy <=
- [PATCH v5 6/6] qcow2: do not discard host clusters during in-flight writes, Vladimir Sementsov-Ogievskiy, 2021/03/26