On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:
This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
include/block/block-copy.h | 9 +--
block/backup.c | 145 +++++++++++++++++++------------------
block/block-copy.c | 21 ++----
3 files changed, 83 insertions(+), 92 deletions(-)
This patch feels like it should be multiple ones. I don’t see why a
patch that lets backup use block-copy (more) should have to modify the
block-copy code.
More specifically: I think that block_copy_set_progress_callback()
should be removed in a separate patch. Also, moving
@cb_opaque/@progress_opaque from BlockCopyState into BlockCopyCallState
seems like a separate patch to me, too.
(I presume if the cb had had its own opaque object from patch 5 on,
there wouldn’t be this problem now. We’d just stop using the progress
callback in backup, and remove it in one separate patch.)
[...]
diff --git a/block/backup.c b/block/backup.c
index ec2676abc2..59c00f5293 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
BlockdevOnError on_source_error;
BlockdevOnError on_target_error;
uint64_t len;
- uint64_t bytes_read;
int64_t cluster_size;
int max_workers;
int64_t max_chunk;
BlockCopyState *bcs;
+
+ BlockCopyCallState *bcs_call;
Can this be more descriptive? E.g. background_bcs? bg_bcs_call? bg_bcsc?
+ int ret;
+ bool error_is_read;
} BackupBlockJob;
static const BlockJobDriver backup_job_driver;
[...]
static int coroutine_fn backup_loop(BackupBlockJob *job)
{
- bool error_is_read;
- int64_t offset;
- BdrvDirtyBitmapIter *bdbi;
- int ret = 0;
+ while (true) { /* retry loop */
+ assert(!job->bcs_call);
+ job->bcs_call = block_copy_async(job->bcs, 0,
+ QEMU_ALIGN_UP(job->len,
+ job->cluster_size),
+ true, job->max_workers,
job->max_chunk,
+ backup_block_copy_callback, job);
- bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
- while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
- do {
- if (yield_and_check(job)) {
- goto out;
+ while (job->bcs_call && !job->common.job.cancelled) {
+ /* wait and handle pauses */
Doesn’t someone need to reset BlockCopyCallState.cancelled when the job
is unpaused? I can’t see anyone doing that.
Well, or even just reentering the block-copy operation after
backup_pause() has cancelled it. Is there some magic I’m missing?
Does pausing (which leads to cancelling the block-copy operation) just
yield to the CB being invoked, so the copy operation is considered done,
and then we just enter the next iteration of the loop and try again?
+
+ job_pause_point(&job->common.job);
+
+ if (job->bcs_call && !job->common.job.cancelled) {
+ job_yield(&job->common.job);
}
- ret = backup_do_cow(job, offset, job->cluster_size,
&error_is_read);
- if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
- BLOCK_ERROR_ACTION_REPORT)
- {
- goto out;
+ }
+
+ if (!job->bcs_call && job->ret == 0) {
+ /* Success */
+ return 0;
...I would assume we return here when the job is paused.
+ }
+
+ if (job->common.job.cancelled) {
+ if (job->bcs_call) {
+ block_copy_cancel(job->bcs_call);
}
- } while (ret < 0);
+ return 0;
+ }
+
+ if (!job->bcs_call && job->ret < 0 &&
Is it even possible for bcs_call to be non-NULL here?
+ (backup_error_action(job, job->error_is_read, -job->ret) ==
+ BLOCK_ERROR_ACTION_REPORT))
+ {
+ return job->ret;
+ }
So if we get an error, and the error action isn’t REPORT, we just try
the whole operation again? That doesn’t sound very IGNORE-y to me.