Re: [PATCH v2 6/7] block/block-copy: reduce intersecting request lock


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v2 6/7] block/block-copy: reduce intersecting request lock
Date: Thu, 30 Jan 2020 20:09:28 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

30.01.2020 19:24, Andrey Shinkevich wrote:


On 30/01/2020 16:45, Vladimir Sementsov-Ogievskiy wrote:
29.01.2020 23:05, Andrey Shinkevich wrote:


On 27/11/2019 21:08, Vladimir Sementsov-Ogievskiy wrote:
Currently, a block_copy operation locks the whole requested region. But
there is no reason to lock clusters that are already copied: it only
disturbs other parallel block_copy requests for no benefit.

Let's instead do the following:

Lock only the sub-region we are going to operate on. Then, after copying
all dirty sub-regions, we should wait for intersecting block-copy requests;
if they failed, we should retry these newly dirtied clusters.

Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---
   block/block-copy.c | 116 +++++++++++++++++++++++++++++++++++++--------
   1 file changed, 95 insertions(+), 21 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 20068cd699..aca44b13fb 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -39,29 +39,62 @@ static BlockCopyInFlightReq *block_copy_find_inflight_req(BlockCopyState *s,
       return NULL;
   }
-static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
-                                                       int64_t offset,
-                                                       int64_t bytes)
+/*
+ * If there are no intersecting requests return false. Otherwise, wait for the
+ * first found intersecting request to finish and return true.
+ */
+static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t start,
+                                             int64_t end)
   {
-    BlockCopyInFlightReq *req;
+    BlockCopyInFlightReq *req = block_copy_find_inflight_req(s, start, end);
-    while ((req = block_copy_find_inflight_req(s, offset, bytes))) {
-        qemu_co_queue_wait(&req->wait_queue, NULL);
+    if (!req) {
+        return false;
       }
+
+    qemu_co_queue_wait(&req->wait_queue, NULL);
+
+    return true;
   }
+/* Called only on full-dirty region */
   static void block_copy_inflight_req_begin(BlockCopyState *s,
                                             BlockCopyInFlightReq *req,
                                             int64_t offset, int64_t bytes)
   {
+    assert(!block_copy_find_inflight_req(s, offset, bytes));
+
+    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+
       req->offset = offset;
       req->bytes = bytes;
       qemu_co_queue_init(&req->wait_queue);
       QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
   }
-static void coroutine_fn block_copy_inflight_req_end(BlockCopyInFlightReq *req)
+static void coroutine_fn block_copy_inflight_req_shrink(BlockCopyState *s,
+        BlockCopyInFlightReq *req, int64_t new_bytes)
   {
+    if (new_bytes == req->bytes) {
+        return;
+    }
+
+    assert(new_bytes > 0 && new_bytes < req->bytes);
+
+    bdrv_set_dirty_bitmap(s->copy_bitmap,
+                          req->offset + new_bytes, req->bytes - new_bytes);
+
+    req->bytes = new_bytes;
+    qemu_co_queue_restart_all(&req->wait_queue);

Won't we get performance degradation from calling that function so frequently?

Why do you think so? In IO, performance mostly depends on disk speed and on
how we organize the request sequence. The whole original series shows a
performance improvement.

This patch reduces the lock around a request, locking only the part we are
working on now; this is for better interactivity. After calling block-status,
the request is shrunk, to possibly unlock some other requests waiting on the
tail of our request. Do you have a better suggestion for this synchronization?
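
As a rough illustration of that point, here is a standalone toy model (plain
C, not QEMU code; the Req struct, must_wait() and the offsets are invented for
this example) of how shrinking an in-flight request can release a waiter on
its tail before the copy itself completes. It loosely mirrors the check that
block_copy_wait_one() re-runs in the patch:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for BlockCopyInFlightReq: just a locked byte range. */
typedef struct Req {
    int64_t offset;
    int64_t bytes;
} Req;

static bool intersects(const Req *r, int64_t offset, int64_t bytes)
{
    return offset < r->offset + r->bytes && r->offset < offset + bytes;
}

/* Rough analogue of block_copy_wait_one(): would the caller have to wait? */
static bool must_wait(const Req *inflight, int64_t offset, int64_t bytes)
{
    return intersects(inflight, offset, bytes);
}

int main(void)
{
    Req a = { .offset = 0, .bytes = 8 << 20 };   /* A locked 8 MiB */
    int64_t b_off = 6 << 20, b_len = 2 << 20;    /* B wants A's tail */

    printf("B must wait before shrink: %d\n", must_wait(&a, b_off, b_len)); /* 1 */

    /* block-status told A that only the first 4 MiB matters: A shrinks its
     * lock and wakes its waiters (qemu_co_queue_restart_all in the patch). */
    a.bytes = 4 << 20;

    /* B re-checks and may proceed without waiting for A's copy to finish. */
    printf("B must wait after shrink:  %d\n", must_wait(&a, b_off, b_len)); /* 0 */
    return 0;
}

In the patch, every waiter is woken after the shrink; each one re-runs this
intersection check and either proceeds or waits again.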


I cannot answer right away. One needs to measure the performance in each case.

Measurements were done in the original (big) series
 "[RFC 00/24] backup performance: block_status + async"
 <address@hidden>
 https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg02335.html

and this series (part I) is the first step towards it.

Currently, improving the interactivity of parallel block-copy requests is not
very meaningful, as all requests from backup are one cluster in size, and
improving the performance of parallel intersecting guest writes is a strange
goal.

But eventually backup is refactored into a single block_copy call for the
whole disk, so this new locking of sub-requests becomes critical.

The only degradation I see is that with this qemu_co_queue_restart_all we may
restart requests that still intersect with the shrunk request; they wake up
and go to block_copy_wait_one again for nothing. This could be improved by
dropping the coroutine queue, keeping our own list of waiting requests, and
waking up only the requests that no longer intersect with the shrunk request.
But I think that is a premature optimization which complicates the code, so it
may be done later if needed. I don't think it is worth it.
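
For what such a targeted wake-up could look like, here is a standalone toy
sketch (plain C, not QEMU code; Waiter, Req, req_shrink_and_wake() and the
sizes are invented for illustration). A real version would keep a per-request
list of waiting coroutines instead of the single CoQueue and re-enter only the
ones whose ranges no longer intersect the shrunk request:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Waiter {
    int64_t offset, bytes;      /* range the waiter wants to copy */
    bool woken;
    struct Waiter *next;
} Waiter;

typedef struct Req {
    int64_t offset, bytes;      /* locked region */
    Waiter *waiters;            /* requests blocked on this one */
} Req;

static bool ranges_intersect(int64_t a_off, int64_t a_len,
                             int64_t b_off, int64_t b_len)
{
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* Shrink the locked region and wake only waiters that no longer intersect it;
 * the patch as posted wakes everyone via qemu_co_queue_restart_all() instead. */
static void req_shrink_and_wake(Req *req, int64_t new_bytes)
{
    req->bytes = new_bytes;
    for (Waiter *w = req->waiters; w; w = w->next) {
        if (!ranges_intersect(req->offset, req->bytes, w->offset, w->bytes)) {
            w->woken = true;   /* real code would re-enter the waiting coroutine */
        }
    }
}

int main(void)
{
    Waiter tail = { .offset = 6 << 20, .bytes = 2 << 20 };              /* waits on the tail */
    Waiter head = { .offset = 1 << 20, .bytes = 1 << 20, .next = &tail };
    Req req = { .offset = 0, .bytes = 8 << 20, .waiters = &head };

    req_shrink_and_wake(&req, 4 << 20);   /* block-status covered only 4 MiB */

    printf("head woken: %d, tail woken: %d\n", head.woken, tail.woken);  /* 0 1 */
    return 0;
}

In this sketch only the waiter on the tail is woken; the one still inside the
shrunk region keeps sleeping, avoiding the pointless wake-up described above.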


Andrey

+}
+
+static void coroutine_fn block_copy_inflight_req_end(BlockCopyState *s,
+                                                     BlockCopyInFlightReq *req,
+                                                     int ret)
+{
+    if (ret < 0) {
+        bdrv_set_dirty_bitmap(s->copy_bitmap, req->offset, req->bytes);
+    }
       QLIST_REMOVE(req, list);
       qemu_co_queue_restart_all(&req->wait_queue);
   }
@@ -344,12 +377,19 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
       return ret;
   }
-int coroutine_fn block_copy(BlockCopyState *s,
-                            int64_t offset, uint64_t bytes,
-                            bool *error_is_read)
+/*
+ * block_copy_dirty_clusters
+ *
+ * Copy dirty clusters in @start/@bytes range.

%s/start/offset/ ?

+ * Returns 1 if dirty clusters found and successfully copied, 0 if no dirty
+ * clusters found and -errno on failure.
+ */
+static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
+                                                  int64_t offset, int64_t bytes,
+                                                  bool *error_is_read)
   {
       int ret = 0;
-    BlockCopyInFlightReq req;
+    bool found_dirty = false;
       /*
        * block_copy() user is responsible for keeping source and target in same
@@ -361,10 +401,8 @@ int coroutine_fn block_copy(BlockCopyState *s,
       assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
       assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
-    block_copy_wait_inflight_reqs(s, offset, bytes);
-    block_copy_inflight_req_begin(s, &req, offset, bytes);
-
       while (bytes) {
+        BlockCopyInFlightReq req;
           int64_t next_zero, cur_bytes, status_bytes;
           if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
@@ -374,6 +412,8 @@ int coroutine_fn block_copy(BlockCopyState *s,
               continue; /* already copied */
           }
+        found_dirty = true;
+
           cur_bytes = MIN(bytes, s->copy_size);
           next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, offset,
@@ -383,10 +423,12 @@ int coroutine_fn block_copy(BlockCopyState *s,
               assert(next_zero < offset + cur_bytes); /* no need to do MIN() */
               cur_bytes = next_zero - offset;
           }
+        block_copy_inflight_req_begin(s, &req, offset, cur_bytes);
           ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
+        block_copy_inflight_req_shrink(s, &req, status_bytes);
           if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
-            bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, status_bytes);
+            block_copy_inflight_req_end(s, &req, 0);
               s->progress_reset_callback(s->progress_opaque);
               trace_block_copy_skip_range(s, offset, status_bytes);
               offset += status_bytes;
@@ -398,15 +440,13 @@ int coroutine_fn block_copy(BlockCopyState *s,
           trace_block_copy_process(s, offset);
-        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
-
           co_get_from_shres(s->mem, cur_bytes);
           ret = block_copy_do_copy(s, offset, cur_bytes, ret & BDRV_BLOCK_ZERO,
                                    error_is_read);
           co_put_to_shres(s->mem, cur_bytes);
+        block_copy_inflight_req_end(s, &req, ret);
           if (ret < 0) {
-            bdrv_set_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
-            break;
+            return ret;
           }
           s->progress_bytes_callback(cur_bytes, s->progress_opaque);
@@ -414,7 +454,41 @@ int coroutine_fn block_copy(BlockCopyState *s,
           bytes -= cur_bytes;
       }
-    block_copy_inflight_req_end(&req);
+    return found_dirty;
+}
-    return ret;
+int coroutine_fn block_copy(BlockCopyState *s, int64_t start, uint64_t bytes,
+                            bool *error_is_read)
+{
+    while (true) {
+        int ret = block_copy_dirty_clusters(s, start, bytes, error_is_read);
+
+        if (ret < 0) {
+            /*
+             * IO operation failed, which means the whole block_copy request
+             * failed.
+             */
+            return ret;
+        }
+        if (ret) {
+            /*
+             * Something was copied, which means that there were yield points
+             * and some new dirty bits may have appeared (due to failed parallel
+             * block-copy requests).
+             */
+            continue;
+        }
+
+        /*
+         * Here ret == 0, which means that there is no dirty clusters in

there is no dirty cluster in

+         * requested region.
+         */
+
+        if (!block_copy_wait_one(s, start, bytes)) {
+            /* No dirty bits and nothing to wait: the whole request is done */
+            break;
+        }
+    }
+
+    return 0;
   }







--
Best regards,
Vladimir


