qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH v3 1/9] block: add .bdrv_need_rw_file_child_duri


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-block] [PATCH v3 1/9] block: add .bdrv_need_rw_file_child_during_reopen_rw handler
Date: Thu, 1 Aug 2019 14:02:08 +0000

31.07.2019 15:09, Max Reitz wrote:
> On 25.07.19 11:18, Vladimir Sementsov-Ogievskiy wrote:
>> On reopen to rw parent may need rw access to child in .prepare, for
>> example qcow2 needs to write IN_USE flags into stored bitmaps
>> (currently it is done in a hacky way after commit and don't work).
>> So, let's introduce such logic.
>>
>> The drawback is that in worst case bdrv_reopen_set_read_only may finish
>> with error and in some intermediate state: some nodes reopened RW and
>> some are not. But this is a way to fix bug around reopening qcow2
>> bitmaps in the following commits.
> 
> This commit message doesn’t really explain what this patch does.
> 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>> ---
>>   include/block/block_int.h |   2 +
>>   block.c                   | 144 ++++++++++++++++++++++++++++++++++----
>>   2 files changed, 133 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 3aa1e832a8..7bd6fd68dd 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -531,6 +531,8 @@ struct BlockDriver {
>>                                uint64_t parent_perm, uint64_t parent_shared,
>>                                uint64_t *nperm, uint64_t *nshared);
>>   
>> +     bool (*bdrv_need_rw_file_child_during_reopen_rw)(BlockDriverState *bs);
>> +
>>       /**
>>        * Bitmaps should be marked as 'IN_USE' in the image on reopening image
>>        * as rw. This handler should realize it. It also should unset readonly
>> diff --git a/block.c b/block.c
>> index cbd8da5f3b..3c8e1c59b4 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1715,10 +1715,12 @@ static void 
>> bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
>>                                        uint64_t *shared_perm);
>>   
>>   typedef struct BlockReopenQueueEntry {
>> -     bool prepared;
>> -     bool perms_checked;
>> -     BDRVReopenState state;
>> -     QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
>> +    bool reopened_file_child_rw;
>> +    bool changed_file_child_perm_rw;
>> +    bool prepared;
>> +    bool perms_checked;
>> +    BDRVReopenState state;
>> +    QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
>>   } BlockReopenQueueEntry;
>>   
>>   /*
>> @@ -3421,6 +3423,105 @@ BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue 
>> *bs_queue,
>>                                      keep_old_opts);
>>   }
>>   
>> +static int bdrv_reopen_set_read_only_drained(BlockDriverState *bs,
>> +                                             bool read_only,
>> +                                             Error **errp)
>> +{
>> +    BlockReopenQueue *queue;
>> +    QDict *opts = qdict_new();
>> +
>> +    qdict_put_bool(opts, BDRV_OPT_READ_ONLY, read_only);
>> +
>> +    queue = bdrv_reopen_queue(NULL, bs, opts, true);
>> +
>> +    return bdrv_reopen_multiple(queue, errp);
>> +}
>> +
>> +/*
>> + * handle_recursive_reopens
>> + *
>> + * On fail it needs rollback_recursive_reopens to be called.
> 
> It would be nice if this description actually said anything about what
> the function is supposed to do.
> 
>> + */
>> +static int handle_recursive_reopens(BlockReopenQueueEntry *current,
>> +                                    Error **errp)
>> +{
>> +    int ret;
>> +    BlockDriverState *bs = current->state.bs;
>> +
>> +    /*
>> +     * We use the fact that in reopen-queue children are always following
>> +     * parents.
>> +     * TODO: Switch BlockReopenQueue to be QTAILQ and use
>> +     *       QTAILQ_FOREACH_REVERSE.
> 
> Why don’t you do that first?  It would make the code more obvious at
> least to me.
> 
>> +     */
>> +    if (QSIMPLEQ_NEXT(current, entry)) {
>> +        ret = handle_recursive_reopens(QSIMPLEQ_NEXT(current, entry), errp);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    if ((current->state.flags & BDRV_O_RDWR) && bs->file && bs->drv &&
>> +        bs->drv->bdrv_need_rw_file_child_during_reopen_rw &&
>> +        bs->drv->bdrv_need_rw_file_child_during_reopen_rw(bs))
>> +    {
>> +        if (!bdrv_is_writable(bs->file->bs)) {
>> +            ret = bdrv_reopen_set_read_only_drained(bs->file->bs, false, 
>> errp);
> 
> Hm.  Sorry, I find this all a bit hard to understand.  (No comments and
> all.)
> 
> I understand that this is for an RO -> RW transition?  Everything is
> still RO, but the parent will need an RW child before it transitions to
> RW itself.
> 
> 
> I’m going to be honest up front, I don’t like this very much.  But I
> think it may be a reasonable solution for now.
> 
> As I remember, the problem was that when reopening a qcow2 node from RO
> to RW, we need to write something in .prepare() (because it can fail),
> but naturally no .prepare() is called after any .commit(), so no matter
> the order of nodes in the ReopenQueue, the child node will never be RW
> by this point.


Exactly.

> 
> Hm.  To me that mostly means that making the whole reopen process a
> transaction was just a dream that turns out not to work.
> 
> OK, so what would be the real, proper, all-encompassing fix?  I suppose
> we’d need a way to express reopen dependency relationships.  So if a
> node depends on one or more of its children to be reopened before/after
> it can be reopened itself, we’d need to pull them out of the ReopenQueue
> (along with their dependencies) and do a separate bdrv_reopen_multiple()
> on them.  So we’d want to split the ReopenQueue into as few subqueues as
> possible, so that all dependencies are satisfied.


Hmm. But actully, considering our case as example, reopening qcow2 RO->RW,
in general not dependent on reopening file child fist, but on just RW file
child.. As well as reopening qcow2 RW->RO depends on file child to be RW.

But I agree, that we'd better not doing reopens different from what we have
in the queue.

> 
> One such dependency is if you change a node from RO to RW, and that
> change requires an RW child.

Yes.

> 
> The reverse dependency occurs is if you change a node from RW to RO, and
> the nodes wants to write something to its child, so it needs to remain
> RW until then.

And yes, but actually this is the same dependency, but it should be resolved
in other way.

> 
> Currently, the former is just broken (hence this patch).  The latter is
> kind of addressed by virtue of “Writing happens in .prepare(), and
> parents come before children in the ReopenQueue”.
> 
> 
> OK, so you address one specific case of a dependency, namely of a node
> on its bs->file when it comes to writableness.  Not too bad, supporting
> bs->file as the only dependency makes sense.  Everything else would
> become complicated.
> 
> 
> Besides being more specific than the general solution I tried to sketch
> above, there is one more difference, though: In that description, I said
> we should remove the node and its dependencies from the ReopenQueue, and
> commit it earlier.  That has three implicatons:
> 
> First, this patch reopens the file if it is not writable, but the parent
> needs it to be writable.  I think that’s wrong.  We should take the
> file’s entry of the ReopenQueue and reopen it accordingly, not blindly
> reopen it RW.  (If the user didn’t specify the file to be reopened RW,
> that should be an error.)
> 
> Second, you need to take the dependencies into account.  (I don’t know
> whether this one is a problem in practice.)  If the file node itself has
> a child that needs to be RW, then you need to take that from the
> ReopenQueue, too, and repoen it RW, too.
> 
> Third, the reopen may require some other options on the file.
> Temporarily reopening it RW without changing those options may not be
> what the user wants.  (So another reason to take the existing
> ReopenQueue entry.)
> 
> 
> So -- without having tried, of course -- I think a better design would
> be to look for bs->file->bs in the ReopenQueue, recursively all of its
> children, and move all of those entries into a new queue, and then
> invoke bdrv_reopen_multiple() on that one first.

Why all children recursively? Shouldn't we instead only follow defined
dependencies?
Or it just seems bad to put not full subtree into bdrv_reopen multiple() ?


> 
> The question then becomes how to roll back those changes...  I don’t
> know whether just having bdrv_reopen() partially succeed is so bad.
> Otherwise, we’d need a function to translate an existing node's state
> into a BdrvReopenQueueEntry so we can thus return to the old state.

And this rollback may fail too and we are still in partial success state.

But if we roll-back simple ro->rw reopen it's safe enough: in worst case
file will be rw, but marked ro, so Qemu may have more access rights than
needed but will not use them, this is why I was concentrating around
only ro->rw case..

So, what about go similar way to this patch, i.e. only reopen ro->rw children
which we need to be rw, not touching other flags, but check, that in reopen
queue we have this child, and it is going to be reopened RW, and if not,
return error immediately?

> 
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            current->reopened_file_child_rw = true;
>> +        }
>> +
>> +        if (!(bs->file->perm & BLK_PERM_WRITE)) {
>> +            ret = bdrv_child_try_set_perm(bs->file,
>> +                                          bs->file->perm | BLK_PERM_WRITE,
>> +                                          bs->file->shared_perm, errp);
> 
> bdrv_child_try_set_perm() is dangerous, because its effect will be
> undone whenever something happens that causes the permissions to be
> refreshed.  (Hence the comment in block_int.h which says to avoid it.)
> Generally, bdrv_child_refresh_perms() should be enough.  If it isn’t,
> I’d say something’s off.
> 
> Max
> 
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            current->changed_file_child_perm_rw = true;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
> 


-- 
Best regards,
Vladimir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]