qemu-devel
[Top][All Lists]

recursive locks (in general)

 From: Christian Schoenebeck Subject: recursive locks (in general) Date: Fri, 21 Aug 2020 13:12:50 +0200

```On Donnerstag, 20. August 2020 12:54:49 CEST Paolo Bonzini wrote:
> More seriously: programming with concurrency is first and foremost about
> understanding invariants[1].  Locks are relatively simple to reason
> about because they enforce invariants at the points where they are
> acquired or released.

As a side note, independent of the actual QBL issue discussed, while I agree
with you that nested locks should be avoided as much as possible, especially
on heterogenous large scale projects like QEMU; please let me correct some of
the things you said about recursive locks in general:

> However, recursive locks are pure evil. :)

That's a common overgeneralization of the potential issues when dealing with
recursive locks. Especially this claim ...

> but callers have no clue about what invariants are provided around calls
> to do_it().  So, understanding complex code that uses recursive mutexes
> is effectively impossible.

... is incorrect.

It is correct that you can run into deadlocks if you don't know how to deal
with nested recursive locks appropriately. It is incorrect though to say they
were not managable.

There is a golden rule with recursive locks: You always have to preserve a
clear hierarchy. Say you have the following recursive mutexes:

RecursiveMutex mutex0;
RecursiveMutex mutex1;
RecursiveMutex mutex2;
...
RecursiveMutex mutexN;

where the suffix shall identify the hierarchy, i.e. h(mutex0) = 0,
h(mutex1) = 1, ... h(mutexN) = N. Then the golden rule is that in any call
stack the nested locks must always preserve the same transitive hierarchy,
e.g.:

h(lock1) <= h(lock2) <= ... <= h(lockK)

Example (using lock-guard notation), let's say ascending transitivity is
chosen, then the following is Ok, as it does not violate chosen transitivity:

synchronized(mutex0) {
synchronized(mutex1) {
...
synchronized(mutexN) {
}
}
}

Likewise, the following is Ok as well:

synchronized(mutex3) {
synchronized(mutex8) {
...
synchronized(mutexN) {
}
}
}

whereas this would be illegal:

synchronized(mutex3) {
synchronized(mutex2) { // < violates chosen transitivity
...
synchronized(mutexN) {
}
}
}

Now let's say one day somebody accidentally broke that rule and ran into a
deadlock. What you can do to resolve the situation is capturing the call stack
of each mutex's [last] lock. And when you triggered the deadlock, you know
that at least one of the threads violated the lock hierarchy. So you look at
the individual call stacks and correct the program flow appropriately to
restore the hierarchy. And the latter is BTW independent of (i.e. any side
effects) of other threads, so it is really just about looking at what exactly

And for the latter reason, there are also more systematic approaches to ensure
correctness: for instance a built-in hierarchy check in the individual Mutex
implementation which would then e.g. raise an assertaion fault on early
testing whenever a developer broke the hierarchy in a nested lock sequence.

Another solution would be integrating this hierarchy check into a (e.g.
static) sanatizer, as this information can already be derived directly from
the AST in most cases. But unfortunatley this does not exist yet in any
sanatizer yet AFAIK.

For me, a non-recursive mutex makes sense for one use case: if the intention
is to lock the mutex on one thread while allowing to unlock it on another
thread. For all other use cases I would (personally) prefer a recursive type,
as it guards a clear ownership relation and hence allows to guard and prevent
many mistakes.

Best regards,
Christian Schoenebeck

```