qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration


From: Markus Armbruster
Subject: Re: [Qemu-devel] [Qemu-block] Dynamic reconfiguration
Date: Wed, 21 Oct 2015 10:27:26 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Sorry for my slow reply.

Kevin Wolf <address@hidden> writes:

> Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
>> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
>> > Max Reitz <address@hidden> writes:
>> > > E.g. you may have a block filter in the future where you want to
>> > > exchange its child BDS. This exchange should be an atomic operation, so
>> > > we cannot use this interface there anyway. For quorum, such an exchange
>> > > does not need to be atomic, since you can just add the new child first
>> > > and remove the old one afterwards.
>> > >
>> > > So maybe in the future we get some block driver other than quorum for
>> > > which adding and removing children (as opposed to atomically exchanging
>> > > them) makes sense, but for now I can only see quorum. Therefore, that
>> > > this works for quorum only is in my opinion not a reason to make it
>> > > experimental. I think we actually want to keep it that way.
>> > 
>> > Are you telling us the existing interface is insufficiently general?
>> > That the general interface neeeds to support atomic replacement?
>> > 
>> > If yes, why isn't the general interface is what we should do for quorum?
>> > Delete is atomic replacement by nothing, add is atomic replacement of
>> > nothing.
>> 
>> The general thing is what we used to call "dynamic reconfiguration". If
>> we want a single command to enable it (which would be great if we
>> could), we need to some more design work first. Atomic replacement might
>> be the operation we're looking for, but we have to be sure.
>> 
>> So far we haven't thought about dynamic reconfiguation enough that we
>> would know the right solution, but enough that we know it's hard.

Yes.

>>                                                                   That
>> would be an argument for me that makes adding an x-* command now
>> acceptable. On the other hand, the fact that we want a single command in
>> the end makes me want to keep it experimental.

We have a bit more time to think until the release calcifies interfaces.

>> What types of dynamic reconfiguration do we need to support? I'll start
>> with a small list, feel free to extend it:
>> 
>> * Replace a child node with another node. This works pretty much
>>   everywhere in the tree - including the root, i.e. BlockBackend!
>>   Working just on BDSes doesn't seem to be enough.

Two ways to replace a node:

1. Operation on a node, i.e. replace node O by node N *everywhere*.

2. Operation on an edge, i.e. replace an edge to node O by an edge to
   node N.

Less abstract:

1. Node replacement takes a BDS *O.  It either replaces in place
   (something we've worked hard to get rid of), or needs to find all
   variables containing the value O.

2. Edge replacement takes a BDS **O.  Naturally, whoever contains that
   thing may need to be notified, so the actual interface may well be a
   bit more involved.

>> * Adding a child to a node that can take additional children (e.g.
>>   quorum can take an arbitrary number; but also COW image formats have
>>   an option child, so you could add a backing file to a node originally
>>   opened with BDRV_O_NO_BACKING)
>> 
>>   Same as atomically replacing nothing by a node.

Yes.

>> * Removing an optional child from a node that remains valid with that
>>   child removed. The same examples apply.
>> 
>>   Same as atomically replacing a child by nothing.

Yes.

>> * Add a new node between two existing nodes. An example is taking a live
>>   snapshot, which inserts a new node between the BlockBackend and the
>>   first BDS. Or it could be used to insert a filter somewhere in the
>>   graph.
>> 
>>   Same as creating the new node pointing to node B (or dynamically
>>   adding it) and then atomically replacing the reference of node A that
>>   pointed to B with a reference to the new node.

Footnote: between creation of the new node and the atomic replace, the
old node has another parent.  Requires support for a BDS to have
multiple parents, but that's a given, I think.

>> * Add a new node between multiple existing nodes. This is one of the
>>   examples we always used to discuss with dynamic reconfiguration:
>> 
>>       base <- sn1 <- sn2 <--+-- virtio-blk
>>                             |
>>                             +-- NBD server
>> 
>>   Adding a filter could result in this:
>> 
>>         base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>>                                           |
>>                                           +-- NBD server
>> 
>>   Or this:
>> 
>>         base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>>                               |
>>                               +-- NBD server
>> 
>>   Or this:
>> 
>>         base <- sn1 <- sn2 <--+-- virtio-blk
>>                               |
>>                               +-- throttling <- NBD server
>> 
>>   All of these are different kinds of "adding a filter", and all of
>>   them are valid operations that a user could want to perform.
>> 
>>   Case 2 and 3 are really just "add a new node between two existing
>>   nodes", as explained above.

Yes.

>>   Case 1 is new: We still create the throttling node so that it already
>>   points to sn2, but now we have to atomically change the children of
>>   two things (the BlockBackends of virtio-blk and the NBD server). Not a
>>   problem per se because we can just do that, but it raises the question
>>   whether the atomic replacement operation needs to be transactionable.

Where "transactionable" means you can bind it together with other
transactionable operations, and the resulting transaction either
succeeds or fails completely.  Right?

In increasing flexibility order:

a. atomic single replacement, not transactionable

b. atomic multiple replacements, not transactionable

c. single replacement, transactionable

In your throttling example, having to insert the throttle in two
separate operations (either of which can fail) seems tolerable.

Can we think of an example where a similar multi-replacement should
either succeed or fail completely?  I.e. one that needs at least b.

Can we think of an example where we need c?

>> * Remove a node between two (or more) other nodes.
>> 
>>   Same as atomically replacing a child by a grandchild. For more than
>>   two involved nodes, again a transactional version might be needed.
>> 
>> So at the first sight, this operation seems to work as the general
>> interface for dynamic reconfiguration.

Yes, we seem to be onto something here.  Finally!

> And actually, after re-reading the other branch of this email thread
> where I replied to Berto that he wants to use bdrv_reopen() to change
> the voting threshold for quorum, it occurred to me that bdrv_reopen()
> should possibly also be this atomatic replacement function.
>
> After all, the references to other nodes are blockdev-add options, and
> if we implement bdrv_reopen() fully so that it can change any options
> that can be given to blockdev-add, that means that we must be able to
> replace children.

You got a point.  Do you think implementing it will be hard?

>
> Kevin
>
>> One problem we discussed earlier that I'm not sure whether it's related
>> is filter nodes inserted automatically once we change e.g. the I/O
>> throttling QMP commands to add a throttling filter BDS to the graph.
>> 
>> If the user creates nodes A and B, but sets throttling options, they
>> might end up with a chain like this:
>> 
>>     A <- throttling <- B

Is "sets throttling options" == "uses the (then) legacy interface for
throttling instead of the more powerful 'insert throttling node'
interface"?

>> Now imagine that they want to add another filter between A and B, let's
>> say blkdebug. They would need to know that they have to insert the new
>> node between A and throttling or B and throttling, but not between A and
>> B. If they tried to insert it between A and B, the algorithm above says
>> that they would let blkdebug point to A, and replace B's child with
>> blkdebug, the resulting tree wouldn't be throttled any more!
>> 
>>     A <- blkdebug <- B
>> 
>>          throttling (ref = 0 -> delete)

Can we phrase the operation differently?  Instead of "insert between A
and B (silently replacing everything that is now between A and B)",
say one of

1a. Replace node A by A <- blkdebug

1b. Replace node B by blkdebug <- B

2a. Replace edge A <- B by <- blkdebug <-
    Impossible in the current state, because there is no such edge.

2b. Replace edge A <- by <- blkdebug <-

2c. Replace edge <- B by <- blkdebug <-

See also node vs. edge replacement above.

>> Even if they knew that they have to consider the throttling node, it
>> currently wouldn't have a node-name, and with Jeff's autogenerated names
>> it wouldn't be predictable.
>> 
>> Maybe the dynamic reconfiguration interface does need to be a bit
>> cleverer.
>> 
>> 
>> Anyway, after writing all of this, I'm almost convinced now that an
>> experimental interface is the right thing to do in this patch series.

Let's do experimental now, and reconsider before the release.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]