Re: [PATCH v2 17/20] backup: move to block-copy

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 17/20] backup: move to block-copy

From:	Vladimir Sementsov-Ogievskiy
Subject:	Re: [PATCH v2 17/20] backup: move to block-copy
Date:	Mon, 21 Sep 2020 16:58:06 +0300
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.2.2

23.07.2020 12:47, Max Reitz wrote:

On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:

This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
  include/block/block-copy.h |   9 +--
  block/backup.c             | 145 +++++++++++++++++++------------------
  block/block-copy.c         |  21 ++----
  3 files changed, 83 insertions(+), 92 deletions(-)


This patch feels like it should be multiple ones.  I don’t see why a
patch that lets backup use block-copy (more) should have to modify the
block-copy code.

More specifically: I think that block_copy_set_progress_callback()
should be removed in a separate patch.  Also, moving
@cb_opaque/@progress_opaque from BlockCopyState into BlockCopyCallState
seems like a separate patch to me, too.

(I presume if the cb had had its own opaque object from patch 5 on,
there wouldn’t be this problem now.  We’d just stop using the progress
callback in backup, and remove it in one separate patch.)

[...]

diff --git a/block/backup.c b/block/backup.c
index ec2676abc2..59c00f5293 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
      BlockdevOnError on_source_error;
      BlockdevOnError on_target_error;
      uint64_t len;
-    uint64_t bytes_read;
      int64_t cluster_size;
      int max_workers;
      int64_t max_chunk;

BlockCopyState *bcs;

+
+    BlockCopyCallState *bcs_call;


Can this be more descriptive?  E.g. background_bcs?  bg_bcs_call?  bg_bcsc?

+    int ret;
+    bool error_is_read;
  } BackupBlockJob;
static const BlockJobDriver backup_job_driver;


[...]

  static int coroutine_fn backup_loop(BackupBlockJob *job)
  {
-    bool error_is_read;
-    int64_t offset;
-    BdrvDirtyBitmapIter *bdbi;
-    int ret = 0;
+    while (true) { /* retry loop */
+        assert(!job->bcs_call);
+        job->bcs_call = block_copy_async(job->bcs, 0,
+                                         QEMU_ALIGN_UP(job->len,
+                                                       job->cluster_size),
+                                         true, job->max_workers, 
job->max_chunk,
+                                         backup_block_copy_callback, job);

- bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));

-    while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-        do {
-            if (yield_and_check(job)) {
-                goto out;
+        while (job->bcs_call && !job->common.job.cancelled) {
+            /* wait and handle pauses */


Doesn’t someone need to reset BlockCopyCallState.cancelled when the job
is unpaused?  I can’t see anyone doing that.

Well, or even just reentering the block-copy operation after
backup_pause() has cancelled it.  Is there some magic I’m missing?

Does pausing (which leads to cancelling the block-copy operation) just
yield to the CB being invoked, so the copy operation is considered done,
and then we just enter the next iteration of the loop and try again?


Yes, that's my idea: we cancel on pause and just run new block_copy operation
on resume.

But cancelling the block-copy operation leads to it returning 0, AFAIR,
so...


Looks like it should be fixed. By returning ECANCELED or by checking the bitmap.

+
+            job_pause_point(&job->common.job);
+
+            if (job->bcs_call && !job->common.job.cancelled) {
+                job_yield(&job->common.job);
              }
-            ret = backup_do_cow(job, offset, job->cluster_size, 
&error_is_read);
-            if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
-                           BLOCK_ERROR_ACTION_REPORT)
-            {
-                goto out;
+        }
+
+        if (!job->bcs_call && job->ret == 0) {
+            /* Success */
+            return 0;


...I would assume we return here when the job is paused.

+        }
+
+        if (job->common.job.cancelled) {
+            if (job->bcs_call) {
+                block_copy_cancel(job->bcs_call);
              }
-        } while (ret < 0);
+            return 0;
+        }
+
+        if (!job->bcs_call && job->ret < 0 &&


Is it even possible for bcs_call to be non-NULL here?

+            (backup_error_action(job, job->error_is_read, -job->ret) ==
+             BLOCK_ERROR_ACTION_REPORT))
+        {
+            return job->ret;
+        }


So if we get an error, and the error action isn’t REPORT, we just try
the whole operation again?  That doesn’t sound very IGNORE-y to me.


Not the whole. We have the dirty bitmap: clusters that was already copied are 
not touched more.



--
Best regards,
Vladimir

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH v2 17/20] backup: move to block-copy, Vladimir Sementsov-Ogievskiy <=

Prev by Date: Re: Limiting per vCPU Usage
Next by Date: [Bug 1896263] Re: The bios-tables-test test causes QEMU to crash (Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed) on AMD processors
Previous by thread: Regression with latest SeaBIOS booting multi-disk root LVs?
Next by thread: [PATCH REPOST] hw: virtio-pmem: detach the element fromt the virtqueue when error occurs
Index(es):
- Date
- Thread