A write on a qcow2 image that results in the allocation of a new cluster can be
divided into two parts: There is a part which happens before the actual data is
written, this is about allocating clusters in the image file. The second part
happens in the AIO callback handler and is about making L2 entries for the
newly allocated clusters.
When two AIO requests happen to touch the same free cluster, there is a chance
that the second request still sees the cluster as free because the callback of
the first request has not yet run. This means that it reserves another cluster
for the same virtual hard disk offset and hooks it up in its own callback,
overwriting what the first callback has done. Long story cut short: Bad Things
happen.
This patch maintains a list of in-flight requests that have allocated new
clusters. A second request touching the same cluster doesn't find an entry yet
in the L2 table, but can look it up in the list now. The second request is
limited so that it either doesn't touch the allocation of the first request
(so it can have a non-overlapping allocation) or that it doesn't exceed the
end of the allocation of the first request (so it can reuse this allocation
and doesn't need to do anything itself).
+ } else {
+ /* Reuse the previously allocated clusters */
+ if (end_offset > old_end_offset) {
+ nb_clusters = (old_end_offset - offset) >> s->cluster_bits;
+ }
+ cluster_offset = old_alloc->cluster_offset + (offset - old_offset);
+ m->nb_clusters = 0;
+ goto out;
+ }
+ }
+
+ LIST_INSERT_HEAD(&s->cluster_allocs, m, next);