
From: Emanuele Giuseppe Esposito
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Thu, 31 Mar 2022 15:51:00 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


Am 31/03/2022 um 11:59 schrieb Paolo Bonzini:
> On 3/30/22 18:02, Paolo Bonzini wrote:
>> On 3/30/22 12:53, Hanna Reitz wrote:
>>>>
>>>> Seems a good compromise between drains and rwlock. What do you think?
>>>
>>> Well, sounds complicated.  So I’m asking myself whether this would be
>>> noticeably better than just an RwLock for graph modifications, like
>>> the global lock Vladimir has proposed.
> 
> [try again, this time with brain connected]
> 
> A global lock would have to be taken by all iothreads on every I/O
> operation, even a CoRwLock would not scale because it has a global
> CoMutex inside and rdlock/unlock both take it.  Even if the critical
> section is small, the cacheline bumping would be pretty bad.
> 
> Perhaps we could reuse the code in cpus-common.c, which relies on a list
> of possible readers and is quite cheap (two memory barriers basically)
> for readers.  Here we would have a list of AioContexts (or perhaps
> BlockDriverStates?) as the possible readers.
> 
> The slow path uses a QemuMutex, a QemuCond for the readers and a
> QemuCond for the writers.  The reader QemuCond can be replaced by a
> CoQueue, I think.
> 

It would be something like this:
Essentially, make graph_bdrv_states public so that we can iterate
through all BDSes the way cpus-common.c iterates through CPUs, and then
duplicate the already existing API of cpus-common.c:

bdrv_graph_list_wrlock <-> start_exclusive
bdrv_graph_list_wrunlock <-> end_exclusive
bdrv_graph_list_rdlock <-> cpu_exec_start
bdrv_graph_list_rdunlock <-> cpu_exec_end

The only difference is that there is no qemu_cpu_kick(), but I don't
think it matters.
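
A rough reader-side sketch of what I mean, modeled on cpu_exec_start()
and cpu_exec_end() in cpus-common.c (untested; bs_graph_list_lock,
bs_graph_pending_op, reader_queue and writer_cond are placeholder names,
they don't exist anywhere yet):

#include "qemu/osdep.h"
#include "qemu/atomic.h"
#include "qemu/coroutine.h"
#include "qemu/lockable.h"
#include "block/block_int.h"

static QemuMutex bs_graph_list_lock;  /* slow path, like qemu_cpu_list_lock */
static int bs_graph_pending_op;       /* > 0 while a writer is active */
static CoQueue reader_queue;          /* readers wait here, like exclusive_resume */
static QemuCond writer_cond;          /* the writer waits for the last reader */

void coroutine_fn bdrv_graph_list_rdlock(BlockDriverState *bs)
{
    qatomic_set(&bs->reading_graph, true);
    /* Write reading_graph before reading bs_graph_pending_op. */
    smp_mb();

    if (unlikely(qatomic_read(&bs_graph_pending_op))) {
        WITH_QEMU_LOCK_GUARD(&bs_graph_list_lock) {
            if (!bs->has_writer) {
                /* Not counted by the writer: let the write finish first. */
                qatomic_set(&bs->reading_graph, false);
                while (qatomic_read(&bs_graph_pending_op)) {
                    qemu_co_queue_wait(&reader_queue, &bs_graph_list_lock);
                }
                qatomic_set(&bs->reading_graph, true);
            }
            /*
             * else: we are counted in bs_graph_pending_op and the writer
             * is waiting for us; release it in bdrv_graph_list_rdunlock().
             */
        }
    }
}

void coroutine_fn bdrv_graph_list_rdunlock(BlockDriverState *bs)
{
    qatomic_set(&bs->reading_graph, false);
    /* Write reading_graph before reading bs_graph_pending_op. */
    smp_mb();

    if (unlikely(qatomic_read(&bs_graph_pending_op))) {
        WITH_QEMU_LOCK_GUARD(&bs_graph_list_lock) {
            if (bs->has_writer) {
                bs->has_writer = false;
                qatomic_set(&bs_graph_pending_op, bs_graph_pending_op - 1);
                if (bs_graph_pending_op == 1) {
                    qemu_cond_signal(&writer_cond);
                }
            }
        }
    }
}

As in cpu_exec_start(), the fast path is one atomic store, one memory
barrier and one atomic load, so there is no contended cacheline between
readers in different iothreads unless a writer is pending.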

What do you think?

diff --git a/block.c b/block.c
index 718e4cae8b..af8dd37101 100644
--- a/block.c
+++ b/block.c
@@ -52,6 +52,7 @@
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "block/coroutines.h"
+#include "block/block-list.h"

 #ifdef CONFIG_BSD
 #include <sys/ioctl.h>
@@ -67,10 +68,6 @@

 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */

-/* Protected by BQL */
-static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
-    QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
-
 /* Protected by BQL */
 static QTAILQ_HEAD(, BlockDriverState) all_bdrv_states =
     QTAILQ_HEAD_INITIALIZER(all_bdrv_states);
diff --git a/include/block/block-list.h b/include/block/block-list.h
new file mode 100644
index 0000000000..d55a056938
--- /dev/null
+++ b/include/block/block-list.h
@@ -0,0 +1,54 @@
+#ifndef BLOCK_LIST_H
+#define BLOCK_LIST_H
+
+#include "qemu/osdep.h"
+
+
+/* Protected by BQL */
+QTAILQ_HEAD(,BlockDriverState) graph_bdrv_states =
+    QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
+
+void bdrv_init_graph_list_lock(void);
+
+/*
+ * bdrv_graph_list_wrlock:
+ * Modify the graph. Nobody else is allowed to access the graph.
+ * Set pending op >= 1, so that the next readers will wait in a
+ * coroutine queue.
+ * Then keep track of the running readers using bs->has_writer,
+ * and wait until they finish.
+ */
+void bdrv_graph_list_wrlock(void);
+
+/*
+ * bdrv_graph_list_wrunlock:
+ * Write finished, reset pending operations to 0 and restart
+ * all readers that are waiting.
+ */
+void bdrv_graph_list_wrunlock(void);
+
+/*
+ * bdrv_graph_list_rdlock:
+ * Read the bs graph. Set bs->reading_graph to true; if there are
+ * pending operations, it means that the main loop is modifying the
+ * graph, therefore wait in the exclusive_resume coroutine queue.
+ * If this read comes while the writer has already marked the
+ * running read requests and is waiting for them to finish,
+ * wait until the write finishes first.
+ * The main loop will then wake this coroutine once it is done.
+ */
+void bdrv_graph_list_rdlock(BlockDriverState *bs);
+
+/*
+ * bdrv_graph_list_rdunlock:
+ * Read terminated, decrease the count of pending operations.
+ * If it's the last read that the writer is waiting for, signal
+ * the writer that we are done and let it continue.
+ */
+void bdrv_graph_list_rdunlock(BlockDriverState *bs);
+
+
+
+
+#define BS_GRAPH_READER(bs) /* in main loop OR bs->reading_graph */
+#define BS_GRAPH_WRITER(bs) /* in main loop AND bs->bs_graph_pending_op > 0 */
+#endif /* BLOCK_LIST_H */
+
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 5a04c778e4..0266876a59 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -985,6 +985,20 @@ struct BlockDriverState {
     bool force_share; /* if true, always allow all shared permissions */
     bool implicit;  /* if true, this filter node was automatically inserted */

+    /*
+     * If true, the bs is reading the graph.
+     * No write should happen in the meantime.
+     * Atomic.
+     */
+    bool reading_graph;
+
+    /*
+     * If true, the main loop is modifying the graph.
+     * bs cannot read the graph.
+     * Protected by bs_graph_list_lock.
+     */
+    bool has_writer;
+
     BlockDriver *drv; /* NULL means no media */
     void *opaque;

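
And to complete the sketch, the writer side, modeled on start_exclusive()
and end_exclusive() (same placeholder names as in the reader sketch above,
untested; iterating graph_bdrv_states takes the place of CPU_FOREACH and
there is no kick):

void bdrv_graph_list_wrlock(void)
{
    BlockDriverState *bs;
    int running_readers = 0;

    /* Graph modifications only happen in the main loop. */
    assert(qemu_in_main_thread());

    qemu_mutex_lock(&bs_graph_list_lock);

    /* Make the next readers queue up in reader_queue. */
    qatomic_set(&bs_graph_pending_op, 1);

    /* Write bs_graph_pending_op before reading bs->reading_graph. */
    smp_mb();

    QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
        if (qatomic_read(&bs->reading_graph)) {
            bs->has_writer = true;
            running_readers++;
        }
    }

    /* Wait for the readers counted above to call bdrv_graph_list_rdunlock(). */
    qatomic_set(&bs_graph_pending_op, running_readers + 1);
    while (bs_graph_pending_op > 1) {
        qemu_cond_wait(&writer_cond, &bs_graph_list_lock);
    }

    /* bs_graph_list_lock stays held until bdrv_graph_list_wrunlock(). */
}

void bdrv_graph_list_wrunlock(void)
{
    qatomic_set(&bs_graph_pending_op, 0);

    /* Wake every reader that queued up while the graph was being modified. */
    while (qemu_co_enter_next(&reader_queue, &bs_graph_list_lock)) {
        /* keep waking readers */
    }

    qemu_mutex_unlock(&bs_graph_list_lock);
}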


