qemu-devel

Re: [PATCH 2/6] block: block-status cache for data regions


From: Max Reitz
Subject: Re: [PATCH 2/6] block: block-status cache for data regions
Date: Mon, 21 Jun 2021 11:37:22 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1

On 18.06.21 20:51, Eric Blake wrote:
On Thu, Jun 17, 2021 at 05:52:43PM +0200, Max Reitz wrote:
To address this, we want to cache data regions.  Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.
Here's hoping third time's the charm!

Indeed :)

(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine.  While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)

We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
  include/block/block_int.h | 19 ++++++++++
  block.c                   |  2 +
  block/io.c                | 80 +++++++++++++++++++++++++++++++++++++--
  3 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index a8f9598102..c09512556a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -832,6 +832,23 @@ struct BdrvChild {
      QLIST_ENTRY(BdrvChild) next_parent;
  };
+/*
+ * Allows bdrv_co_block_status() to cache one data region for a
+ * protocol node.
+ *
+ * @lock: Lock for accessing this object's fields
+ * @valid: Whether the cache is valid
+ * @data_start: Offset where we know (or strongly assume) is data
+ * @data_end: Offset where the data region ends (which is not necessarily
+ *            the start of a zeroed region)
+ */
+typedef struct BdrvBlockStatusCache {
+    CoMutex lock;
+    bool valid;
+    int64_t data_start;
+    int64_t data_end;
+} BdrvBlockStatusCache;
Looks like the right bits of information, and I'm glad you documented
the need to be prepared for protocols that report split data sections
rather than consolidated.

+++ b/block/io.c
@@ -35,6 +35,7 @@
  #include "qapi/error.h"
  #include "qemu/error-report.h"
  #include "qemu/main-loop.h"
+#include "qemu/range.h"
  #include "sysemu/replay.h"
  /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
@@ -1862,6 +1863,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
      bool need_flush = false;
      int head = 0;
      int tail = 0;
+    BdrvBlockStatusCache *bsc = &bs->block_status_cache;
      int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
      int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
@@ -1878,6 +1880,16 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
          return -ENOTSUP;
      }
+    /* Invalidate the cached block-status data range if this write overlaps */
+    qemu_co_mutex_lock(&bsc->lock);
Are we going to be suffering from a lot of lock contention performance
degradation?  Is there a way to take advantage of RCU access patterns
for any more performance without sacrificing correctness?

The critical section is so short that I considered it fine.  I wanted to use RW locks, but then realized that every RW lock operation is internally locked by another mutex, so it wouldn’t gain anything.

I’m not sure whether RCU is worth it here.

We could try something very crude, namely to just not take a lock and make `valid` an atomic.  After all, it doesn’t really matter whether `data_start` and `data_end` are consistent values, and resetting `valid` to false is always safe.

The worst that could happen is that a concurrent block-status call tries to set up an overlapping data area, which we thus fail to recognize here.  But if such a thing were to happen, it could just as well happen before said concurrent call took any lock on `bsc`.
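For illustration, the crude lock-free variant could look roughly like the sketch below. It is not code from this series: `BscSketch`, `bsc_init()`, `bsc_set()`, `bsc_invalidate_range()` and `overlaps()` are hypothetical names, and C11 `<stdatomic.h>` stands in for QEMU's own atomic helpers. The bounds are read with relaxed ordering, so they may be momentarily inconsistent, which is the trade-off described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BdrvBlockStatusCache with the CoMutex removed. */
typedef struct BscSketch {
    atomic_bool valid;
    _Atomic int64_t data_start;
    _Atomic int64_t data_end;
} BscSketch;

static void bsc_init(BscSketch *bsc)
{
    atomic_init(&bsc->valid, false);
    atomic_init(&bsc->data_start, 0);
    atomic_init(&bsc->data_end, 0);
}

/* Simplified overlap test for two ranges; lengths are assumed positive. */
static bool overlaps(int64_t first1, int64_t len1,
                     int64_t first2, int64_t len2)
{
    return first1 < first2 + len2 && first2 < first1 + len1;
}

/* Publish a new cached data region [start, end). */
static void bsc_set(BscSketch *bsc, int64_t start, int64_t end)
{
    assert(start < end);
    atomic_store(&bsc->valid, false);
    atomic_store_explicit(&bsc->data_start, start, memory_order_relaxed);
    atomic_store_explicit(&bsc->data_end, end, memory_order_relaxed);
    atomic_store(&bsc->valid, true);
}

/*
 * Called on zero writes and discards. Resetting 'valid' is always safe;
 * a racing bsc_set() may make us miss an overlap, as discussed above.
 */
static void bsc_invalidate_range(BscSketch *bsc, int64_t offset, int64_t bytes)
{
    int64_t start = atomic_load_explicit(&bsc->data_start,
                                         memory_order_relaxed);
    int64_t end = atomic_load_explicit(&bsc->data_end, memory_order_relaxed);

    if (atomic_load(&bsc->valid) &&
        overlaps(offset, bytes, start, end - start)) {
        atomic_store(&bsc->valid, false);
    }
}
```

A write that does not touch the cached region leaves `valid` set, while any overlapping write clears it.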

+    if (bsc->valid &&
+        ranges_overlap(offset, bytes, bsc->data_start,
+                       bsc->data_end - bsc->data_start))
+    {
+        bsc->valid = false;
+    }
Do we want to invalidate the entire bsc, or can we be smart and leave
the prefix intact (if offset > bsc->data_start, then set bsc->data_end
to offset)?

Perhaps we could be smart, but I don’t think it really makes a difference in practice, so I think keeping it simple is better.
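For comparison, the prefix-preserving alternative suggested above might be sketched as follows. This is only an illustration of the idea, not something the patch does: `DataRegion` and `invalidate_keep_prefix()` are hypothetical names, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, lock-free copy of the cache fields for illustration. */
typedef struct DataRegion {
    bool valid;
    int64_t data_start;
    int64_t data_end;
} DataRegion;

/*
 * A zero write or discard covers [offset, offset + bytes).
 * If it starts inside the cached region, keep the untouched prefix
 * [data_start, offset); otherwise drop the whole region.
 */
static void invalidate_keep_prefix(DataRegion *r, int64_t offset,
                                   int64_t bytes)
{
    assert(bytes > 0);

    if (!r->valid ||
        offset >= r->data_end || offset + bytes <= r->data_start) {
        return; /* nothing cached, or no overlap */
    }

    if (offset > r->data_start) {
        r->data_end = offset;   /* the prefix is still known to be data */
    } else {
        r->valid = false;       /* the write reaches the region's start */
    }
}
```

The extra branch is small, but for a start-to-end iteration the cached region is replaced soon anyway, which is why the simpler full invalidation is kept.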

+    qemu_co_mutex_unlock(&bsc->lock);
Worth using WITH_QEMU_LOCK_GUARD?

I knew I forgot something, right.  Will use!
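As a rough picture of what a scoped lock guard buys here, the sketch below imitates the pattern with the GCC/Clang `cleanup` attribute. It deliberately does not use QEMU's headers: `ToyLock`, `WITH_LOCK_GUARD_SKETCH`, `CacheSketch` and `invalidate_if_overlap()` are made-up names, and the "lock" is a plain counter standing in for a real mutex, so only the scoping behaviour is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy lock: a counter in place of a real mutex (assumption for the sketch). */
typedef struct ToyLock {
    int held;
} ToyLock;

static ToyLock *toy_lock(ToyLock *l)
{
    assert(l->held == 0);
    l->held = 1;
    return l;
}

/* Releases the lock once; safe to call again after the pointer is cleared. */
static void toy_unlock_ptr(ToyLock **p)
{
    if (*p) {
        (*p)->held = 0;
        *p = NULL;
    }
}

/* Runs the following block with the lock held; unlocks on every exit path. */
#define WITH_LOCK_GUARD_SKETCH(l)                                        \
    for (ToyLock *guard_ __attribute__((cleanup(toy_unlock_ptr))) =      \
             toy_lock(l);                                                \
         guard_; toy_unlock_ptr(&guard_))

typedef struct CacheSketch {
    ToyLock lock;
    bool valid;
    int64_t data_start;
    int64_t data_end;
} CacheSketch;

/* Returns true if the cached region was invalidated by this write. */
static bool invalidate_if_overlap(CacheSketch *c, int64_t offset,
                                  int64_t bytes)
{
    WITH_LOCK_GUARD_SKETCH(&c->lock) {
        if (!c->valid) {
            return false;   /* early return: cleanup drops the lock */
        }
        if (offset < c->data_end && c->data_start < offset + bytes) {
            c->valid = false;
            return true;
        }
    }
    return false;
}
```

The point of the guard is that no exit path from the block can forget the unlock, which an explicit lock/unlock pair cannot guarantee once early returns are added.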

Max



