Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic

From:	John Snow
Subject:	Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel
Date:	Mon, 5 Feb 2018 14:28:55 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2


On 01/31/2018 09:19 PM, Liang Li wrote:
> On Tue, Jan 30, 2018 at 03:18:31PM -0500, John Snow wrote:
>>
>>
>> On 01/30/2018 03:38 AM, Liang Li wrote:
>>> When doing drive mirror to a low speed shared storage, if there was heavy
>>> BLK IO write workload in VM after the 'ready' event, drive mirror block job
>>> can't be canceled immediately, it would keep running until the heavy BLK IO
>>> workload stopped in the VM.
>>>
>>> Because libvirt depends on block-job-cancel for block live migration, the
>>> current block-job-cancel has the semantic to make sure data is in sync after
>>> the 'ready' event.  This semantic can't meet some requirement, for example,
>>> people may use drive mirror for realtime backup while need the ability of
>>> block live migration. If drive mirror can't not be cancelled immediately,
>>> it means block live migration need to wait, because libvirt make use drive
>>> mirror to implement block live migration and only one drive mirror block
>>> job is allowed at the same time for a give block dev.
>>>
>>> We need a new interface for 'force cancel', which could quit block job
>>> immediately if don't care about whether data is in sync or not.
>>>
>>> 'force' is not used by libvirt currently, to make things simple, change
>>> it's semantic slightly, hope it will not break some use case which need its
>>> original semantic.
>>>
>>> Cc: Paolo Bonzini <address@hidden>
>>> Cc: Jeff Cody <address@hidden>
>>> Cc: Kevin Wolf <address@hidden>
>>> Cc: Max Reitz <address@hidden>
>>> Cc: Eric Blake <address@hidden>
>>> Cc: John Snow <address@hidden>
>>> Reported-by: Huaitong Han <address@hidden>
>>> Signed-off-by: Huaitong Han <address@hidden>
>>> Signed-off-by: Liang Li <address@hidden>
>>> ---
>>> block/mirror.c            |  9 +++------
>>> blockdev.c                |  4 ++--
>>> blockjob.c                | 11 ++++++-----
>>> hmp-commands.hx           |  3 ++-
>>> include/block/blockjob.h  |  9 ++++++++-
>>> qapi/block-core.json      |  6 ++++--
>>> tests/test-blockjob-txn.c |  8 ++++----
>>> 7 files changed, 29 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index c9badc1..c22dff9 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque)
>>>
>>>         ret = 0;
>>>         trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
>>> -        if (!s->synced) {
>>> -            block_job_sleep_ns(&s->common, delay_ns);
>>> -            if (block_job_is_cancelled(&s->common)) {
>>> -                break;
>>> -            }
>>> +        if (block_job_is_cancelled(&s->common) && s->common.force) {
>>> +            break;
>>
>> what's the justification for removing the sleep in the case that
>> !s->synced && !block_job_is_cancelled(...) ?
>>
> if !block_job_is_cancelled() satisfied, the code in 'if (!should_complete) {}'
> will execute, there is a block_job_sleep_ns there.
> 
> block_job_sleep_ns is for rate throttling, if there is no more data to sync, 
> sleep is not needed, right?
> 
>>>         } else if (!should_complete) {
>>>             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
>>>             block_job_sleep_ns(&s->common, delay_ns);
>>> @@ -887,7 +884,7 @@ immediate_exit:
>>>          * or it was cancelled prematurely so that we do not guarantee that
>>>          * the target is a copy of the source.
>>>          */
>>> -        assert(ret < 0 || (!s->synced && 
>>> block_job_is_cancelled(&s->common)));
>>> +        assert(ret < 0 || block_job_is_cancelled(&s->common));
>>
>> This assertion gets weaker in the case where force isn't provided, is
>> that desired?
>>
> yes. if force quit is used, the following condition can be true
> 
> (ret >= 0) && (s->synced) && (block_job_is_cancelled(&s->common)) 
> 
> so the above assert should be changed, or it will be failed.
> 

What I mean is that when we *don't* use force quit, the assertion here
is weakened. You can work the force conditional in here:

ret < 0 || (block_job_is_cancelled(job) && (job->force || !s->synced))

which preserves the stricter assertion when force isn't provided.

>>>         assert(need_drain);
>>>         mirror_wait_for_all_io(s);
>>>     }
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 8e977ee..039f156 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -145,7 +145,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>>>         aio_context_acquire(aio_context);
>>>
>>>         if (bs->job) {
>>> -            block_job_cancel(bs->job);
>>> +            block_job_cancel(bs->job, false);
>>>         }
>>>
>>>         aio_context_release(aio_context);
>>> @@ -3802,7 +3802,7 @@ void qmp_block_job_cancel(const char *device,
>>>     }
>>>
>>>     trace_qmp_block_job_cancel(job);
>>> -    block_job_cancel(job);
>>> +    block_job_cancel(job, force);
>>> out:
>>>     aio_context_release(aio_context);
>>> }
>>> diff --git a/blockjob.c b/blockjob.c
>>> index f5cea84..0aacb50 100644
>>> --- a/blockjob.c
>>> +++ b/blockjob.c
>>> @@ -365,7 +365,7 @@ static void block_job_completed_single(BlockJob *job)
>>>     block_job_unref(job);
>>> }
>>>
>>> -static void block_job_cancel_async(BlockJob *job)
>>> +static void block_job_cancel_async(BlockJob *job, bool force)
>>> {
>>>     if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) {
>>>         block_job_iostatus_reset(job);
>>> @@ -376,6 +376,7 @@ static void block_job_cancel_async(BlockJob *job)
>>>         job->pause_count--;
>>>     }
>>>     job->cancelled = true;
>>> +    job->force = force;
>>> }
>>>
>>
>> I suppose this is okay; we're setting a permanent property of the job as
>> part of a limited operation.
>>
>> Granted, the job should disappear afterwards, so it should never
>> conflict, but it made me wonder for a moment.
>>
>> What happens if we cancel with force = true and then immediately cancel
>> again with force = false? What happens? Can it cause issues?
>>
> 
> Indeed. It can be fixed by:
> 
> if (!job->force)
>    job->force = force
> 
> it's that ok ?
> 

I think so, yes. or job->force |= force, or your preferred equivalent
for only allowing false-->true here.

>>> static int block_job_finish_sync(BlockJob *job,
>>> @@ -437,7 +438,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
>>>      * on the caller, so leave it. */
>>>     QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
>>>         if (other_job != job) {
>>> -            block_job_cancel_async(other_job);
>>> +            block_job_cancel_async(other_job, true);
>>
>> What's the rationale for deciding that transactional cancellations are
>> always force=true?
>>
> 
> no particular reason, just try to speed up the abort process. :)
> 

I'd prefer we not change the semantics of how transactions fail without
a stronger justification than wanting to make it faster!

>> Granted, this doesn't matter currently because mirror isn't a
>> transactional job, but that makes the decision all the more curious.
>>
>>>         }
>>>     }
> 
> Thanks for your comments.
> 
> Liang
>

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel, John Snow <=
- Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel, Liang Li, 2018/02/05

Prev by Date: Re: [Qemu-block] [Qemu-devel] [PATCH] qemu-img: Make resize error message more general
Next by Date: [Qemu-block] [PATCH 0/2] block/ssh: Implement .bdrv_refresh_filename()
Previous by thread: [Qemu-block] [PATCH] qemu-img: Make resize error message more general
Next by thread: Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel
Index(es):
- Date
- Thread