[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration
From: |
Wen Congyang |
Subject: |
Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration |
Date: |
Mon, 26 Oct 2015 10:04:31 +0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 |
On 10/21/2015 04:27 PM, Markus Armbruster wrote:
> Sorry for my slow reply.
>
> Kevin Wolf <address@hidden> writes:
>
>> Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
>>> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
>>>> Max Reitz <address@hidden> writes:
>>>>> E.g. you may have a block filter in the future where you want to
>>>>> exchange its child BDS. This exchange should be an atomic operation, so
>>>>> we cannot use this interface there anyway. For quorum, such an exchange
>>>>> does not need to be atomic, since you can just add the new child first
>>>>> and remove the old one afterwards.
>>>>>
>>>>> So maybe in the future we get some block driver other than quorum for
>>>>> which adding and removing children (as opposed to atomically exchanging
>>>>> them) makes sense, but for now I can only see quorum. Therefore, that
>>>>> this works for quorum only is in my opinion not a reason to make it
>>>>> experimental. I think we actually want to keep it that way.
>>>>
>>>> Are you telling us the existing interface is insufficiently general?
>>>> That the general interface neeeds to support atomic replacement?
>>>>
>>>> If yes, why isn't the general interface is what we should do for quorum?
>>>> Delete is atomic replacement by nothing, add is atomic replacement of
>>>> nothing.
>>>
>>> The general thing is what we used to call "dynamic reconfiguration". If
>>> we want a single command to enable it (which would be great if we
>>> could), we need to some more design work first. Atomic replacement might
>>> be the operation we're looking for, but we have to be sure.
>>>
>>> So far we haven't thought about dynamic reconfiguation enough that we
>>> would know the right solution, but enough that we know it's hard.
>
> Yes.
>
>>> That
>>> would be an argument for me that makes adding an x-* command now
>>> acceptable. On the other hand, the fact that we want a single command in
>>> the end makes me want to keep it experimental.
>
> We have a bit more time to think until the release calcifies interfaces.
>
>>> What types of dynamic reconfiguration do we need to support? I'll start
>>> with a small list, feel free to extend it:
>>>
>>> * Replace a child node with another node. This works pretty much
>>> everywhere in the tree - including the root, i.e. BlockBackend!
>>> Working just on BDSes doesn't seem to be enough.
>
> Two ways to replace a node:
>
> 1. Operation on a node, i.e. replace node O by node N *everywhere*.
>
> 2. Operation on an edge, i.e. replace an edge to node O by an edge to
> node N.
>
> Less abstract:
>
> 1. Node replacement takes a BDS *O. It either replaces in place
> (something we've worked hard to get rid of), or needs to find all
> variables containing the value O.
>
> 2. Edge replacement takes a BDS **O. Naturally, whoever contains that
> thing may need to be notified, so the actual interface may well be a
> bit more involved.
>
>>> * Adding a child to a node that can take additional children (e.g.
>>> quorum can take an arbitrary number; but also COW image formats have
>>> an option child, so you could add a backing file to a node originally
>>> opened with BDRV_O_NO_BACKING)
>>>
>>> Same as atomically replacing nothing by a node.
>
> Yes.
>
>>> * Removing an optional child from a node that remains valid with that
>>> child removed. The same examples apply.
>>>
>>> Same as atomically replacing a child by nothing.
>
> Yes.
>
>>> * Add a new node between two existing nodes. An example is taking a live
>>> snapshot, which inserts a new node between the BlockBackend and the
>>> first BDS. Or it could be used to insert a filter somewhere in the
>>> graph.
>>>
>>> Same as creating the new node pointing to node B (or dynamically
>>> adding it) and then atomically replacing the reference of node A that
>>> pointed to B with a reference to the new node.
>
> Footnote: between creation of the new node and the atomic replace, the
> old node has another parent. Requires support for a BDS to have
> multiple parents, but that's a given, I think.
>
>>> * Add a new node between multiple existing nodes. This is one of the
>>> examples we always used to discuss with dynamic reconfiguration:
>>>
>>> base <- sn1 <- sn2 <--+-- virtio-blk
>>> |
>>> +-- NBD server
>>>
>>> Adding a filter could result in this:
>>>
>>> base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>>> |
>>> +-- NBD server
>>>
>>> Or this:
>>>
>>> base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>>> |
>>> +-- NBD server
>>>
>>> Or this:
>>>
>>> base <- sn1 <- sn2 <--+-- virtio-blk
>>> |
>>> +-- throttling <- NBD server
>>>
>>> All of these are different kinds of "adding a filter", and all of
>>> them are valid operations that a user could want to perform.
>>>
>>> Case 2 and 3 are really just "add a new node between two existing
>>> nodes", as explained above.
>
> Yes.
>
>>> Case 1 is new: We still create the throttling node so that it already
>>> points to sn2, but now we have to atomically change the children of
>>> two things (the BlockBackends of virtio-blk and the NBD server). Not a
>>> problem per se because we can just do that, but it raises the question
>>> whether the atomic replacement operation needs to be transactionable.
>
> Where "transactionable" means you can bind it together with other
> transactionable operations, and the resulting transaction either
> succeeds or fails completely. Right?
>
> In increasing flexibility order:
>
> a. atomic single replacement, not transactionable
>
> b. atomic multiple replacements, not transactionable
>
> c. single replacement, transactionable
>
> In your throttling example, having to insert the throttle in two
> separate operations (either of which can fail) seems tolerable.
>
> Can we think of an example where a similar multi-replacement should
> either succeed or fail completely? I.e. one that needs at least b.
>
> Can we think of an example where we need c?
>
>>> * Remove a node between two (or more) other nodes.
>>>
>>> Same as atomically replacing a child by a grandchild. For more than
>>> two involved nodes, again a transactional version might be needed.
>>>
>>> So at the first sight, this operation seems to work as the general
>>> interface for dynamic reconfiguration.
>
> Yes, we seem to be onto something here. Finally!
>
>> And actually, after re-reading the other branch of this email thread
>> where I replied to Berto that he wants to use bdrv_reopen() to change
>> the voting threshold for quorum, it occurred to me that bdrv_reopen()
>> should possibly also be this atomatic replacement function.
>>
>> After all, the references to other nodes are blockdev-add options, and
>> if we implement bdrv_reopen() fully so that it can change any options
>> that can be given to blockdev-add, that means that we must be able to
>> replace children.
>
> You got a point. Do you think implementing it will be hard?
>
>>
>> Kevin
>>
>>> One problem we discussed earlier that I'm not sure whether it's related
>>> is filter nodes inserted automatically once we change e.g. the I/O
>>> throttling QMP commands to add a throttling filter BDS to the graph.
>>>
>>> If the user creates nodes A and B, but sets throttling options, they
>>> might end up with a chain like this:
>>>
>>> A <- throttling <- B
>
> Is "sets throttling options" == "uses the (then) legacy interface for
> throttling instead of the more powerful 'insert throttling node'
> interface"?
>
>>> Now imagine that they want to add another filter between A and B, let's
>>> say blkdebug. They would need to know that they have to insert the new
>>> node between A and throttling or B and throttling, but not between A and
>>> B. If they tried to insert it between A and B, the algorithm above says
>>> that they would let blkdebug point to A, and replace B's child with
>>> blkdebug, the resulting tree wouldn't be throttled any more!
>>>
>>> A <- blkdebug <- B
>>>
>>> throttling (ref = 0 -> delete)
>
> Can we phrase the operation differently? Instead of "insert between A
> and B (silently replacing everything that is now between A and B)",
> say one of
>
> 1a. Replace node A by A <- blkdebug
>
> 1b. Replace node B by blkdebug <- B
>
> 2a. Replace edge A <- B by <- blkdebug <-
> Impossible in the current state, because there is no such edge.
What does 'edge' mean?
Thanks
Wen Congyang
>
> 2b. Replace edge A <- by <- blkdebug <-
>
> 2c. Replace edge <- B by <- blkdebug <-
>
> See also node vs. edge replacement above.
>
>>> Even if they knew that they have to consider the throttling node, it
>>> currently wouldn't have a node-name, and with Jeff's autogenerated names
>>> it wouldn't be predictable.
>>>
>>> Maybe the dynamic reconfiguration interface does need to be a bit
>>> cleverer.
>>>
>>>
>>> Anyway, after writing all of this, I'm almost convinced now that an
>>> experimental interface is the right thing to do in this patch series.
>
> Let's do experimental now, and reconsider before the release.
>
> .
>
Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child, Alberto Garcia, 2015/10/09
Re: [Qemu-devel] [Qemu-block] [PATCH v5 3/4] qmp: add monitor command to add/remove a child, Max Reitz, 2015/10/09