qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH v4 04/15] block/commit: refactor commit to use j


From: Max Reitz
Subject: Re: [Qemu-block] [PATCH v4 04/15] block/commit: refactor commit to use job callbacks
Date: Wed, 5 Sep 2018 13:37:56 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 2018-09-05 12:49, Kevin Wolf wrote:
> Am 05.09.2018 um 12:27 hat Max Reitz geschrieben:
>> On 2018-09-04 22:32, John Snow wrote:
>>>
>>>
>>> On 09/04/2018 02:46 PM, Jeff Cody wrote:
>>>> On Tue, Sep 04, 2018 at 01:09:19PM -0400, John Snow wrote:
>>>>> Use the component callbacks; prepare, abort, and clean.
>>>>>
>>>>> NB: prepare is only called when the job has not yet failed;
>>>>> and abort can be called after prepare.
>>>>>
>>>>> complete -> prepare -> abort -> clean
>>>>> complete -> abort -> clean
>>>>>
>>>>> Signed-off-by: John Snow <address@hidden>
>>>>> Reviewed-by: Max Reitz <address@hidden>
>>>>> ---
>>>>>  block/commit.c | 90 
>>>>> ++++++++++++++++++++++++++++++++--------------------------
>>>>>  1 file changed, 49 insertions(+), 41 deletions(-)
>>>>>
>>>>> diff --git a/block/commit.c b/block/commit.c
>>>>> index b6e8969877..eb3941e545 100644
>>>>> --- a/block/commit.c
>>>>> +++ b/block/commit.c
>>>>> @@ -36,6 +36,7 @@ typedef struct CommitBlockJob {
>>>>>      BlockDriverState *commit_top_bs;
>>>>>      BlockBackend *top;
>>>>>      BlockBackend *base;
>>>>> +    BlockDriverState *base_bs;
>>>>>      BlockdevOnError on_error;
>>>>>      int base_flags;
>>>>>      char *backing_file_str;
>>>>> @@ -68,61 +69,65 @@ static int coroutine_fn commit_populate(BlockBackend 
>>>>> *bs, BlockBackend *base,
>>>>>      return 0;
>>>>>  }
>>>>>  
>>>>> -static void commit_exit(Job *job)
>>>>> +static int commit_prepare(Job *job)
>>>>>  {
>>>>>      CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
>>>>> -    BlockJob *bjob = &s->common;
>>>>> -    BlockDriverState *top = blk_bs(s->top);
>>>>> -    BlockDriverState *base = blk_bs(s->base);
>>>>> -    BlockDriverState *commit_top_bs = s->commit_top_bs;
>>>>> -    bool remove_commit_top_bs = false;
>>>>> -
>>>>> -    /* Make sure commit_top_bs and top stay around until 
>>>>> bdrv_replace_node() */
>>>>> -    bdrv_ref(top);
>>>>> -    bdrv_ref(commit_top_bs);
>>>>>  
>>>>>      /* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE 
>>>>> before
>>>>>       * the normal backing chain can be restored. */
>>>>>      blk_unref(s->base);
>>>>> +    s->base = NULL;
>>>>>  
>>>>> -    if (!job_is_cancelled(job) && job->ret == 0) {
>>>>> -        /* success */
>>>>> -        job->ret = bdrv_drop_intermediate(s->commit_top_bs, base,
>>>>> -                                          s->backing_file_str);
>>>>> -    } else {
>>>>> -        /* XXX Can (or should) we somehow keep 'consistent read' blocked 
>>>>> even
>>>>> -         * after the failed/cancelled commit job is gone? If we already 
>>>>> wrote
>>>>> -         * something to base, the intermediate images aren't valid any 
>>>>> more. */
>>>>> -        remove_commit_top_bs = true;
>>>>> +    return bdrv_drop_intermediate(s->commit_top_bs, s->base_bs,
>>>>> +                                  s->backing_file_str);
>>>>> +}
>>>>
>>>> If we can go from prepare->abort->clean, then that means to me that every
>>>> failure case of .prepare() can be resolved without permanent changes / data
>>>> loss.  Is this necessarily the case?
>>>>
>>>
>>> That'd be a requisite to make the job a transaction, but commit, mirror
>>> and stream are not currently transactionable.
>>
>> Is that already documented anywhere?
>>
>> (Otherwise I'd be afraid of us forgetting in like a year, asking "Why
>> isn't this a transaction already?", just making it one, and then
>> remembering half a year later.)
>>
>>> The way commit already works, for example, can leave the base and
>>> intermediate images as unusable as standalone images. This refactoring
>>> will not change that alone.
>>>
>>> So it's not necessarily a problem, but it's something that would need to
>>> be fixed if we ever wanted transaction support.
>>>
>>> However, in talking on IRC we did realize that this patch does change
>>> behavior...
>>>
>>> Before:
>>>
>>> If bdrv_drop_intermediate fails, we store the retcode but continue
>>> cleaning up as if it didn't fail. i.e., we don't remove the commit job's
>>> installed top_bs node.
>>>
>>> After:
>>>
>>> if bdrv_drop_intermediate fails, we return the failure retcode and
>>> .abort gets called as a result, i.e. we will remove the commit job's
>>> installed top_bs node in favor of the original top_bs node.
>>>
>>> I think this behavior is an improvement,
>>
>> I agree.
>>
>>> however it raises a question
>>> about the nature of failures in bdrv_drop_intermediate.
>>>
>>> If this function fails without making any changes, the new commit
>>> behavior is good. If it succeeds, we're also good. The problem is with
>>> intermediate or partial successes.
>>>
>>> If top has multiple parents (I think under normal circumstances it
>>> won't, but I'm not absolutely sure) and it fails to update their backing
>>> file references, it might partially succeed.
>>>
>>> I think commit's usage here is correct, but I think we might need to
>>> update bdrv_drop_intermediate to make it roll back changes if it
>>> experiences a partial failure to give all-or-nothing semantics.
>>
>> Sure, that would be good.
>>
>>> Thoughts?
>>
>> We could start by calling bdrv_check_update_perm() on all parents before
>> doing any changes.  Then the roll back would consist only of invoking
>> bdrv_abort_perm_update() and in theory reverting the
>> c->update_filename() changes.
>>
>> In practice...  How do we want to revert c->update_filename()?  There
>> currently is no way of getting the old value.  (And just using the old
>> child's filename may well be wrong, because the old child might not be
>> the one referenced by the image header.)
>>
>> I have three ideas:
>> 1) We could introduce a way of getting the old filename the parent has,
>> so we can restore it.
>>
>> 2) We could make .update_filename() kind of transactionable (seems like
>> overkill, but it would be easier in practice, I think).
>>
>> 3) We basically ignore .update_filename() errors.  We'd still return
>> them, but we don't abort the graph change operation.  So after
>> bdrv_drop_intermediate() is done, the graph has been changed
>> succesfully, or it hasn't changed at all -- whether the filename updates
>> all went through, that's a different story.
>>
>> #3 would be the simplest solution.  It's a bit stupid, but it would work
>> for most problems, I think; at least the callers would know that the
>> graph is in exactly one of two well-defined states.
> 
> Option 2 sounds nice in theory, but how do you make things
> transactionable when they require writing to an image? Either you make
> the change or you don't. If you make it and later notice that you
> shouldn't have done so, you can try to write the old value back, but
> that can fail, too.

Yes, that's what I mean by "kind of".  It's just #1 with the code moved
somewhere else.

> So making .update_filename() transactionable is probably not feasible.
> The choice that is left is whether we update it in .prepare (and
> continue with .abort if it fails) or in .commit (and ignore errors).

My idea was "if we could update it one time, we can probably update it a
second time", so reverting in .abort() would usually work.  I know that
a "usually" is not a "definitely", so that's the "kind of" again.

So the abort could still throw an error which we would have to handle
like in #3 anyway, I suppose.

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]