
Re: [PATCH v2 17/20] backup: move to block-copy


From: Max Reitz
Subject: Re: [PATCH v2 17/20] backup: move to block-copy
Date: Thu, 23 Jul 2020 11:47:41 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:
> This brings async request handling and block-status driven chunk sizes
> to backup out of the box, which improves backup performance.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/block-copy.h |   9 +--
>  block/backup.c             | 145 +++++++++++++++++++------------------
>  block/block-copy.c         |  21 ++----
>  3 files changed, 83 insertions(+), 92 deletions(-)

This patch feels like it should be multiple ones.  I don’t see why a
patch that lets backup use block-copy (more) should have to modify the
block-copy code.

More specifically: I think that block_copy_set_progress_callback()
should be removed in a separate patch.  Also, moving
@cb_opaque/@progress_opaque from BlockCopyState into BlockCopyCallState
seems like a separate patch to me, too.

(I presume if the cb had had its own opaque object from patch 5 on,
there wouldn’t be this problem now.  We’d just stop using the progress
callback in backup, and remove it in one separate patch.)

[...]

> diff --git a/block/backup.c b/block/backup.c
> index ec2676abc2..59c00f5293 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
>      BlockdevOnError on_source_error;
>      BlockdevOnError on_target_error;
>      uint64_t len;
> -    uint64_t bytes_read;
>      int64_t cluster_size;
>      int max_workers;
>      int64_t max_chunk;
>  
>      BlockCopyState *bcs;
> +
> +    BlockCopyCallState *bcs_call;

Can this be more descriptive?  E.g. background_bcs?  bg_bcs_call?  bg_bcsc?

> +    int ret;
> +    bool error_is_read;
>  } BackupBlockJob;
>  
>  static const BlockJobDriver backup_job_driver;
>  

[...]

>  static int coroutine_fn backup_loop(BackupBlockJob *job)
>  {
> -    bool error_is_read;
> -    int64_t offset;
> -    BdrvDirtyBitmapIter *bdbi;
> -    int ret = 0;
> +    while (true) { /* retry loop */
> +        assert(!job->bcs_call);
> +        job->bcs_call = block_copy_async(job->bcs, 0,
> +                                         QEMU_ALIGN_UP(job->len,
> +                                                       job->cluster_size),
> +                                         true, job->max_workers, job->max_chunk,
> +                                         backup_block_copy_callback, job);
>  
> -    bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
> -    while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
> -        do {
> -            if (yield_and_check(job)) {
> -                goto out;
> +        while (job->bcs_call && !job->common.job.cancelled) {
> +            /* wait and handle pauses */

Doesn’t someone need to reset BlockCopyCallState.cancelled when the job
is unpaused?  I can’t see anyone doing that.

Nor can I see anyone reentering the block-copy operation after
backup_pause() has cancelled it.  Is there some magic I’m missing?

Does pausing (which leads to cancelling the block-copy operation) just
yield until the CB has been invoked, so the copy operation is considered
done, and then we just enter the next iteration of the loop and try again?
But cancelling the block-copy operation leads to it returning 0, AFAIR,
so...

> +
> +            job_pause_point(&job->common.job);
> +
> +            if (job->bcs_call && !job->common.job.cancelled) {
> +                job_yield(&job->common.job);
>              }
> -            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
> -            if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
> -                           BLOCK_ERROR_ACTION_REPORT)
> -            {
> -                goto out;
> +        }
> +
> +        if (!job->bcs_call && job->ret == 0) {
> +            /* Success */
> +            return 0;

...I would assume we return here when the job is paused.
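
To spell out my reading of the pause path (just how I read the code,
assuming backup_block_copy_callback() merely clears job->bcs_call and
stores the result in job->ret):

    pause request
      -> backup_pause()
           -> block_copy_cancel(s->bcs_call)   /* sets BlockCopyCallState.cancelled */
      -> the block-copy call returns (with 0, AFAIR)
           -> backup_block_copy_callback()     /* job->bcs_call = NULL, job->ret = 0 */
      -> backup_loop() leaves the inner while loop, sees
         !job->bcs_call && job->ret == 0, and returns 0,
         i.e. reports success instead of resuming the copy after unpause.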

> +        }
> +
> +        if (job->common.job.cancelled) {
> +            if (job->bcs_call) {
> +                block_copy_cancel(job->bcs_call);
>              }
> -        } while (ret < 0);
> +            return 0;
> +        }
> +
> +        if (!job->bcs_call && job->ret < 0 &&

Is it even possible for bcs_call to be non-NULL here?

> +            (backup_error_action(job, job->error_is_read, -job->ret) ==
> +             BLOCK_ERROR_ACTION_REPORT))
> +        {
> +            return job->ret;
> +        }

So if we get an error, and the error action isn’t REPORT, we just try
the whole operation again?  That doesn’t sound very IGNORE-y to me.
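
Just to illustrate what I would have expected for a non-REPORT action
(purely a sketch, not a suggestion to take verbatim; names as in the
patch):

    if (!job->bcs_call && job->ret < 0) {
        BlockErrorAction action =
            backup_error_action(job, job->error_is_read, -job->ret);

        if (action == BLOCK_ERROR_ACTION_REPORT) {
            return job->ret;
        }
        if (action == BLOCK_ERROR_ACTION_IGNORE) {
            /* Skip past the failed range somehow instead of retrying it;
             * restarting the whole operation is not what I'd call IGNORE. */
        }
        /* For STOP, only retry once the job has been resumed. */
    }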

>      }
>  
> - out:
> -    bdrv_dirty_iter_free(bdbi);
> -    return ret;
> +    g_assert_not_reached();
>  }
>  
>  static void backup_init_bcs_bitmap(BackupBlockJob *job)
> @@ -246,9 +227,14 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>          int64_t count;
>  
>          for (offset = 0; offset < s->len; ) {
> -            if (yield_and_check(s)) {
> -                ret = -ECANCELED;
> -                goto out;
> +            if (job_is_cancelled(job)) {
> +                return -ECANCELED;

I’d either drop the out label altogether, or use it here.  It’s a bit
weird to use it sometimes, but not all the time.
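
I.e. either consistently (just to illustrate):

            if (job_is_cancelled(job)) {
                ret = -ECANCELED;
                goto out;
            }

or a plain "return -ECANCELED;" here and no out label in the function
at all.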

> +            }
> +
> +            job_pause_point(job);
> +
> +            if (job_is_cancelled(job)) {
> +                return -ECANCELED;
>              }
>  
>              ret = block_copy_reset_unallocated(s->bcs, offset, &count);
> @@ -281,6 +267,25 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>      return ret;
>  }
>  
> +static void coroutine_fn backup_pause(Job *job)
> +{
> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> +
> +    if (s->bcs_call) {
> +        block_copy_cancel(s->bcs_call);
> +    }
> +}
> +
> +static void coroutine_fn backup_set_speed(BlockJob *job, int64_t speed)
> +{
> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> +
> +    if (s->bcs) {
> +        /* In block_job_create we don't have bcs yet */

Shouldn’t hurt to make it conditional, but how can we get here in
block_job_create()?

> +        block_copy_set_speed(s->bcs, s->bcs_call, speed);
> +    }
> +}
> +
>  static const BlockJobDriver backup_job_driver = {
>      .job_driver = {
>          .instance_size          = sizeof(BackupBlockJob),
