qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially dama


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs
Date: Tue, 21 Oct 2014 12:10:54 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

On 2014-10-21 at 09:52, Kevin Wolf wrote:
Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
If a referenced cluster has a refcount of 0, increasing its refcount may
result in clusters being allocated for the refcount structures. This may
overwrite the referenced cluster, therefore we cannot simply increase
the refcount then.

In such cases, we can either try to replicate all the refcount
operations solely for the check operation, basing the allocations on the
in-memory refcount table; or we can simply rebuild the whole refcount
structure based on the in-memory refcount table. Since the latter will
be much easier, do that.

To prepare for this, introduce a "rebuild" boolean which should be set
to true whenever a fix is rather dangerous or too complicated using the
current refcount structures. Another example for this is refcount blocks
being referenced more than once.

Signed-off-by: Max Reitz <address@hidden>
Reviewed-by: BenoƮt Canet <address@hidden>
@@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, 
BdrvCheckResult *res,
              fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
                  "cluster aligned; refcount table entry corrupted\n", i);
              res->corruptions++;
+            *rebuild = true;
              continue;
          }
@@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res, resize_fail:
                  res->corruptions++;
+                *rebuild = true;
                  fprintf(stderr, "ERROR could not resize image: %s\n",
                          strerror(-ret));
              } else {
Increasing res->corruptions in this patch looks right because you don't
actually rebuild anything yet. However, at the end of the series this
code still looks the same - shouldn't we somehow convert those the
corruptions caused by wrong refcounts into res->corruptions_fixed?

Hm... It seems that you do this by storing intermediate results in
qcow2_check_refcounts(), which assumes that compare_refcounts() finds
only refcount problems, and no other place can find any. This may be
true, but wouldn't it be more elegant to have a separate counter for
corruptions that will be fixed with a rebuild? We can use some
Qcow2CheckResult instead of BdrvCheckResult internally and add more
fields there.

Also, I don't think all "ERROR" messages correctly say "Repairing" again
at the end of the series.

I'll have a look on this, but this hunk for instance should not say "Repairing". It's an error and it is not directly repaired. I think the output "Rebuilding refcount structure" should be sufficient.

@@ -1624,40 +1511,10 @@ resize_fail:
              inc_refcounts(bs, res, *refcount_table, *nb_clusters,
                  offset, s->cluster_size);
              if ((*refcount_table)[cluster] != 1) {
-                fprintf(stderr, "%s refcount block %" PRId64
-                    " refcount=%d\n",
-                    fix & BDRV_FIX_ERRORS ? "Repairing" :
-                                            "ERROR",
-                    i, (*refcount_table)[cluster]);
-
-                if (fix & BDRV_FIX_ERRORS) {
-                    int64_t new_offset;
-
-                    new_offset = realloc_refcount_block(bs, i, offset);
-                    if (new_offset < 0) {
-                        res->corruptions++;
-                        continue;
-                    }
-
-                    /* update refcounts */
-                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
-                        /* increase refcount_table size if necessary */
-                        int old_nb_clusters = *nb_clusters;
-                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
-                        *refcount_table = g_renew(uint16_t, *refcount_table,
-                                                  *nb_clusters);
-                        memset(&(*refcount_table)[old_nb_clusters], 0,
-                               (*nb_clusters - old_nb_clusters) *
-                               sizeof(uint16_t));
-                    }
-                    (*refcount_table)[cluster]--;
-                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-                            new_offset, s->cluster_size);
-
-                    res->corruptions_fixed++;
-                } else {
-                    res->corruptions++;
-                }
+                fprintf(stderr, "ERROR refcount block %" PRId64
+                        " refcount=%d\n", i, (*refcount_table)[cluster]);
+                res->corruptions++;
+                *rebuild = true;
              }
          }
      }
@@ -1669,8 +1526,8 @@ resize_fail:
   * Calculates an in-memory refcount table.
   */
  static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                               BdrvCheckMode fix, uint16_t **refcount_table,
-                               int64_t *nb_clusters)
+                               BdrvCheckMode fix, bool *rebuild,
+                               uint16_t **refcount_table, int64_t *nb_clusters)
  {
      BDRVQcowState *s = bs->opaque;
      int64_t i;
@@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
          s->refcount_table_offset,
          s->refcount_table_size * sizeof(uint64_t));
- return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
+    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
  }
/*
@@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
   * refcount as reported by the refcount structures on-disk.
   */
  static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                              BdrvCheckMode fix, int64_t *highest_cluster,
+                              BdrvCheckMode fix, bool *rebuild,
+                              int64_t *highest_cluster,
                                uint16_t *refcount_table, int64_t nb_clusters)
  {
      BDRVQcowState *s = bs->opaque;
@@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
              int *num_fixed = NULL;
              if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
                  num_fixed = &res->leaks_fixed;
-            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
+            } else if (refcount1 > 0 && refcount1 < refcount2 &&
+                       (fix & BDRV_FIX_ERRORS)) {
                  num_fixed = &res->corruptions_fixed;
              }
+ if (refcount1 == 0) {
+                *rebuild = true;
+            }
Why isn't this just another else if branch? Possibly even as the first
or second condition, so that you don't have to add 'refcount1 > 0' for
the other branch.

Because it's semantically different. The if-else sets the num_fixed pointer, whereas this conditional block forces a rebuild. I can include it as the second condition, just at the time I liked it more to have it externally.

Now that I'm reading once again over it... Probably using it as the second condition is better. I'll see to it.

+
              fprintf(stderr, "%s cluster %" PRId64 " refcount=%d 
reference=%d\n",
                     num_fixed != NULL     ? "Repairing" :
                     refcount1 < refcount2 ? "ERROR" :
@@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
      BDRVQcowState *s = bs->opaque;
      int64_t size, highest_cluster, nb_clusters;
      uint16_t *refcount_table = NULL;
+    bool rebuild = false;
      int ret;
size = bdrv_getlength(bs->file);
@@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
      res->bfi.total_clusters =
          size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
- ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
+    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
+                              &nb_clusters);
      if (ret < 0) {
          goto fail;
      }
- compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
+    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
                        nb_clusters);
+ if (rebuild) {
+        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
+    }
Should this increate res->check_errors? (It doesn't survive until the
end of the series anyway, but more places seem to be added later that
print an error message without increasing the error count.)

Yes, it should. We could even bail out, but it doesn't make much sense here (it makes more sense in patch 8).

Thanks,

Max



reply via email to

[Prev in Thread] Current Thread [Next in Thread]