Re: [PATCH v2 15/36] block: use topological sort for permission update

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v2 15/36] block: use topological sort for permission update
Date: Thu, 28 Jan 2021 12:34:46 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1

27.01.2021 21:38, Kevin Wolf wrote:
Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
Rewrite bdrv_check_perm(), bdrv_abort_perm_update() and bdrv_set_perm()
to update nodes in topological sort order instead of simple DFS. With
topologically sorted nodes, we update a node only when all its parents
have already been updated. With plain DFS this is not guaranteed.

Consider the following example:

     A -+
     |  |
     |  v
     |  B
     |  |
     v  |
     C<-+

A is parent for B and C, B is parent for C.

Obviously, to update permissions, we should go in the order A B C, so
that when we update C, all parent permissions are already updated.

I wondered for a moment why this order is obvious. Taking a permission
on A may mean that we need to take the permission on C, too.

The answer is (or so I think) that the whole operation is atomic so the
half-updated state will never be visible to a caller, but this is about
calculating the right permissions. Permissions a node needs on its
children may depend on what its parents requested, but parent
permissions never depend on what children request.

Yes, it is exactly about these relations.

But with the current
approach (simple recursion) we can update in the sequence A C B C (C is
updated twice). On the first update of C we consider the old B
permissions, so we do the wrong thing. If it succeeds, all is OK: on the
second C update we finish with the correct graph. But if the wrong thing
fails, we break the whole process for no reason (it's possible that the
updated B permission would be less strict, but we never check it).

Also, the new approach gives a way to update several nodes simultaneously
and correctly: we just need to run bdrv_topological_dfs() several times
to add all nodes and their subtrees into one topologically sorted list
(the next patch will update bdrv_replace_node() in this manner).
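
To make the A/B/C example concrete, here is a minimal standalone sketch of the two traversal schemes. It is not QEMU code: `Node`, `add_child()`, `naive_recursion()` and `topo_dfs()` are hypothetical names, and a fixed-size array stands in for the child list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_CHILDREN = 4 };

/* Hypothetical stand-in for a block node; not QEMU's BlockDriverState. */
typedef struct Node {
    char name;
    struct Node *children[MAX_CHILDREN];
    size_t n_children;
    int found;                  /* visited marker for the topological DFS */
} Node;

static void add_child(Node *parent, Node *child)
{
    assert(parent->n_children < MAX_CHILDREN);
    parent->children[parent->n_children++] = child;
}

/* Old scheme: handle a node, then recurse into each child immediately. */
static size_t naive_recursion(const Node *n, char *out, size_t pos)
{
    out[pos++] = n->name;
    for (size_t i = 0; i < n->n_children; i++) {
        pos = naive_recursion(n->children[i], out, pos);
    }
    return pos;
}

/*
 * New scheme: post-order DFS that fills the buffer from the back, so a node
 * is stored in front of all of its descendants. Returns the new start index.
 */
static size_t topo_dfs(Node *n, char *out, size_t start)
{
    if (n->found) {
        return start;
    }
    n->found = 1;
    for (size_t i = 0; i < n->n_children; i++) {
        start = topo_dfs(n->children[i], out, start);
    }
    out[--start] = n->name;
    return start;
}
```

If A lists its children as C then B, and B lists C, the naive recursion records A C B C, while the post-order variant records A B C with each node appearing once.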

Test test_parallel_perm_update() is now passing, so move it out of
debugging "if".

We also need to support ignore_children in

For test 283, the order of the parents compliance check is changed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  block.c                     | 103 +++++++++++++++++++++++++++++-------
  tests/test-bdrv-graph-mod.c |   4 +-
  tests/qemu-iotests/283.out  |   2 +-
  3 files changed, 86 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index 92bfcbedc9..81ccf51605 100644
--- a/block.c
+++ b/block.c
@@ -1994,7 +1994,9 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
      return false;
-static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
+static bool bdrv_check_parents_compliance(BlockDriverState *bs,
+                                          GSList *ignore_children,
+                                          Error **errp)
      BdrvChild *a, *b;
@@ -2005,7 +2007,9 @@ static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
      QLIST_FOREACH(a, &bs->parents, next_parent) {
          QLIST_FOREACH(b, &bs->parents, next_parent) {
-            if (a == b) {
+            if (a == b || g_slist_find(ignore_children, a) ||
+                g_slist_find(ignore_children, b))

'a' should be checked in the outer loop, no reason to repeat the same
check all the time in the inner loop.
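
A possible shape for that restructuring, as a standalone sketch rather than the actual patch: `Edge`, `is_ignored()`, `a_allows_b()` and `parents_compliant()` are hypothetical, a plain array replaces the GSList, and the compatibility rule (b's permissions must be within what a shares) is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parent edge: permissions it uses and permissions it shares. */
typedef struct Edge {
    unsigned perm;
    unsigned shared;
} Edge;

static bool is_ignored(const Edge *e, Edge **ignore, size_t n_ignore)
{
    for (size_t i = 0; i < n_ignore; i++) {
        if (ignore[i] == e) {
            return true;
        }
    }
    return false;
}

/* Assumed rule: b may only use permissions that a is willing to share. */
static bool a_allows_b(const Edge *a, const Edge *b)
{
    return (b->perm & a->shared) == b->perm;
}

static bool parents_compliant(Edge **parents, size_t n_parents,
                              Edge **ignore, size_t n_ignore)
{
    assert(parents != NULL || n_parents == 0);
    for (size_t i = 0; i < n_parents; i++) {
        Edge *a = parents[i];

        /* 'a' is tested once here, not once per inner iteration. */
        if (is_ignored(a, ignore, n_ignore)) {
            continue;
        }
        for (size_t j = 0; j < n_parents; j++) {
            Edge *b = parents[j];

            if (a == b || is_ignored(b, ignore, n_ignore)) {
                continue;
            }
            if (!a_allows_b(a, b)) {
                return false;
            }
        }
    }
    return true;
}
```

The result is the same as with the combined condition in the inner loop; only the number of list lookups for 'a' drops from one per pair to one per outer iteration.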

+            {
@@ -2034,6 +2038,29 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
+static GSList *bdrv_topological_dfs(GSList *list, GHashTable *found,
+                                    BlockDriverState *bs)

It would be good to have a comment that explains the details of the

In particular, this seems to require that @list is already topologically
sorted, and it's complete in the sense that if a node is in the list,
all of its children are in the list, too.

Right, will add
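
The invariant can be illustrated with a small standalone model. Everything here is hypothetical (`Graph`, `Order`, `topo_add()`, `order_is_consistent()`): an adjacency matrix replaces the child lists and a fixed array filled from the back replaces g_slist_prepend().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { N = 5 };

/* edge[p][c] is true when node p is a parent of node c. */
typedef struct Graph {
    bool edge[N][N];
} Graph;

/* The sorted list lives in node[start..N-1]; found[] is the shared set. */
typedef struct Order {
    int node[N];
    size_t start;
    bool found[N];
} Order;

static void order_init(Order *o)
{
    memset(o, 0, sizeof(*o));
    o->start = N;
}

/* Add n and its whole subtree in front of the already sorted list. */
static void topo_add(const Graph *g, Order *o, int n)
{
    if (o->found[n]) {
        return;
    }
    o->found[n] = true;
    for (int c = 0; c < N; c++) {
        if (g->edge[n][c]) {
            topo_add(g, o, c);
        }
    }
    assert(o->start > 0);
    o->node[--o->start] = n;
}

/* Every listed node must have all its children listed, and after it. */
static bool order_is_consistent(const Graph *g, const Order *o)
{
    int pos[N];

    for (int i = 0; i < N; i++) {
        pos[i] = -1;
    }
    for (size_t i = o->start; i < N; i++) {
        pos[o->node[i]] = (int)i;
    }
    for (int p = 0; p < N; p++) {
        for (int c = 0; c < N; c++) {
            if (g->edge[p][c] && pos[p] >= 0 &&
                (pos[c] < 0 || pos[c] < pos[p])) {
                return false;
            }
        }
    }
    return true;
}
```

Prepending stays correct across calls because a newly added node can be a parent of nodes already in the list, but never a child of one: completeness guarantees that every child of a listed node is already listed.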

+    BdrvChild *child;
+    g_autoptr(GHashTable) local_found = NULL;
+    if (!found) {
+        assert(!list);
+        found = local_found = g_hash_table_new(NULL, NULL);
+    }
+    if (g_hash_table_contains(found, bs)) {
+        return list;
+    }
+    g_hash_table_add(found, bs);
+    QLIST_FOREACH(child, &bs->children, next) {
+        list = bdrv_topological_dfs(list, found, child->bs);
+    }
+    return g_slist_prepend(list, bs);
  static void bdrv_child_set_perm_commit(void *opaque)
      BdrvChild *c = opaque;
@@ -2098,10 +2125,10 @@ static void bdrv_child_set_perm_safe(BdrvChild *c, uint64_t perm,
   * A call to this function must always be followed by a call to 
   * or bdrv_abort_perm_update().

One big source of confusion for me when trying to understand this was
that bdrv_check_perm() is a misnomer since commit f962e96150e and the
above comment isn't really accurate any more.

The function doesn't only check the validity of the new permissions in
advance to actually making the change, but it already updates the
permissions of all child nodes (however not of its root node).

So we have gone from the original check/set/abort model (which the
function names still suggest) to a prepare/commit/rollback model.

I think some comment updates are in order, and possibly we should rename
some functions, too.

At the end of the series they are refactored and renamed to become a native
part of the new transaction system (introduced in [10]).
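
For readers following the thread, the prepare/commit/rollback model can be reduced to a few lines. This is only a sketch with hypothetical names (`Edge`, `edge_set_perm_safe()`, `edge_commit()`, `edge_abort()`), not the transaction API the series introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical edge that remembers its pre-transaction permission. */
typedef struct Edge {
    uint64_t perm;
    uint64_t backup_perm;
    bool has_backup;
} Edge;

/* Prepare: apply the new value, saving the old one only on first change. */
static void edge_set_perm_safe(Edge *e, uint64_t perm)
{
    assert(e != NULL);
    if (!e->has_backup) {
        e->backup_perm = e->perm;
        e->has_backup = true;
    }
    e->perm = perm;
}

/* Commit: keep the new value and forget the backup. */
static void edge_commit(Edge *e)
{
    e->has_backup = false;
}

/* Rollback: restore the value from before the first prepare step. */
static void edge_abort(Edge *e)
{
    if (e->has_backup) {
        e->perm = e->backup_perm;
        e->has_backup = false;
    }
}
```

Note that the "prepare" step already modifies the edge, which is the point made above: the new state is visible to later checks inside the operation, and only an abort brings the old value back.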

-static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
-                           uint64_t cumulative_perms,
-                           uint64_t cumulative_shared_perms,
-                           GSList *ignore_children, Error **errp)
+static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
+                                uint64_t cumulative_perms,
+                                uint64_t cumulative_shared_perms,
+                                GSList *ignore_children, Error **errp)
      BlockDriver *drv = bs->drv;
      BdrvChild *c;
@@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
      /* Check all children */
      QLIST_FOREACH(c, &bs->children, next) {
          uint64_t cur_perm, cur_shared;
-        GSList *cur_ignore_children;
          bdrv_child_perm(bs, c->bs, c, c->role, q,
                          cumulative_perms, cumulative_shared_perms,
                          &cur_perm, &cur_shared);
+        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);

This "added" line is actually old code. What is removed here is the
recursive call of bdrv_check_update_perm(). This is what the code below
will have to replace.

Yes, we'll use an explicit loop instead of recursion.

+    }
+    return 0;
+static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
+                           uint64_t cumulative_perms,
+                           uint64_t cumulative_shared_perms,
+                           GSList *ignore_children, Error **errp)
+    int ret;
+    BlockDriverState *root = bs;
+    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
+    for ( ; list; list = list->next) {
+        bs = list->data;
+        if (bs != root) {
+            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
+                return -EINVAL;
+            }

At this point bs still has the old permissions, but we don't access
them. As we're going in topological order, the parents have already been
updated if they were a child covered in bdrv_node_check_perm(), so we're
checking the relevant values. Good.

What about the root node? If I understand correctly, the parents of the
root nodes wouldn't have been checked in the old code. In the new state,
the parent BdrvChild already has to contain the new permission.

In bdrv_refresh_perms(), we already check parent conflicts, so no change
for all callers going through it. Good.

bdrv_reopen_multiple() is less obvious. It passes permissions from the
BDRVReopenState, without applying the permissions first.

It will be changed later in the series.

Do we check the
old parent permissions instead of the new state here?

We use the given (new) cumulative permissions for bs, and recalculate the
permissions for the subtree of bs.

This follows the old behavior. The only change is that pre-patch we do a DFS
recursion starting from bs (and probably visit some nodes several times),
while after-patch we first topologically sort the subtree of bs and then go
through the list. The order of nodes is better and we visit each node once.
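
A toy model of that loop, under stated assumptions: the edge table, the OR-based `cumulative_perm()` and the pass-through policy (a node requests on its children exactly what is requested on it) are all hypothetical simplifications, and shared permissions are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parent->child edge carrying the permission the parent uses. */
typedef struct Edge {
    int parent;
    int child;
    uint64_t perm;
} Edge;

/* OR together what all parent edges of a node currently request. */
static uint64_t cumulative_perm(const Edge *edges, size_t n_edges, int node)
{
    uint64_t perm = 0;

    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].child == node) {
            perm |= edges[i].perm;
        }
    }
    return perm;
}

/*
 * Walk the topologically sorted list once. order[0] is the root and gets
 * the caller-provided permission; every other node recomputes its value
 * from parent edges, which were all updated earlier in the walk.
 */
static void update_in_order(Edge *edges, size_t n_edges,
                            const int *order, size_t n_order,
                            uint64_t root_perm, int *updates)
{
    assert(n_order > 0);
    for (size_t i = 0; i < n_order; i++) {
        int node = order[i];
        uint64_t perm = (i == 0) ? root_perm
                                 : cumulative_perm(edges, n_edges, node);

        /* Assumed pass-through policy for the node's own child edges. */
        for (size_t e = 0; e < n_edges; e++) {
            if (edges[e].parent == node) {
                edges[e].perm = perm;
            }
        }
        updates[node]++;
    }
}
```

With edges A->C, A->B and B->C and the order A B C, a stale permission on B->C is replaced before C is evaluated, and each node is processed exactly once.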

+            bdrv_get_cumulative_perm(bs, &cumulative_perms,
+                                     &cumulative_shared_perms);
+        }
-        cur_ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
-        ret = bdrv_check_update_perm(c->bs, q, cur_perm, cur_shared,
-                                     cur_ignore_children, errp);
-        g_slist_free(cur_ignore_children);
+        ret = bdrv_node_check_perm(bs, q, cumulative_perms,
+                                   cumulative_shared_perms,
+                                   ignore_children, errp);

We use the original ignore_children for every node in the sorted list.
The old code extends it with all nodes in the path to each node.

For the bdrv_check_update_perm() call that is now replaced with
bdrv_check_parents_compliance(), I think this was necessary because
bdrv_check_update_perm() always assumes adding a new edge, so if you
update one instead of adding it, you have to ignore it so that it can't
conflict with itself. This isn't necessary any more now because we just
update and then check for consistency.

For passing to bdrv_node_check_perm() it doesn't make a difference
anyway because the parameter is now unused (and should probably be

ignore_children will be dropped in [27]. For now it is still needed for 

          if (ret < 0) {
              return ret;
-        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
      return 0;

A tricky patch to understand, but I think it's right for the most part.


Best regards,
