Re: [Qemu-devel] Block Filters


From: Fam Zheng
Subject: Re: [Qemu-devel] Block Filters
Date: Fri, 6 Sep 2013 15:56:06 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, 09/03 18:24, Benoît Canet wrote:
> 
> Hello list,
> 
> I am thinking about QEMU block filters lately.
> 
> I am not a block.c/blockdev.c expert, so tell me what you think of the
> following.
> 
> The use cases I see would be:
> 
> -$user wants to have some real cryptography on top of qcow2/qed or another
>  format.
>  Snapshots and other block features should continue to work.
> 
> -$user wants to use a RAID-like feature such as QUORUM in QEMU.
>  Other features should continue to work.
> 
> -$user wants to use the future SSD deduplication implementation, with
>  metadata on SSD and data on spinning disks.
>  Other features should continue to work.
> 
> -$user wants to I/O throttle one drive of their VM.
> 
> -$user wants to do Copy On Read.
> 
> -$user wants to do a combination of the above.
> 
> -$developer wants to take the minimum of required steps to keep changes small.
> 
> -$developer wants to keep user interface changes for later.
> 
> Let's take the example of a user wanting to do I/O throttled, encrypted
> QUORUM on top of QCOW2.
> 
> Assuming we want to implement throttling and encryption as something
> resembling a block filter, this makes a pretty complex BlockDriverState tree.
> 
> The tree would look like the following:
> 
>                     I/O throttling BlockDriverState (bs)
>                                |
>                                |
>                                |
>                                |
>                     Encryption BlockDriverState (bs)
>                                |
>                                |
>                                |
>                                |
>                     Quorum BlockDriverState (bs)
>                    /           |           \
>                   /            |            \
>                  /             |             \
>                 /              |              \
>             QCOW2 bs       QCOW2 bs        QCOW2 bs
>                |               |               |
>                |               |               |
>                |               |               |
>                |               |               |
>             RAW bs         RAW bs           RAW bs
> 
> An external snapshot should result in a tree like the following.
>                     I/O throttling BlockDriverState (bs)
>                                |
>                                |
>                                |
>                                |
>                     Encryption BlockDriverState (bs)
>                                |
>                                |
>                                |
>                                |
>                     Quorum BlockDriverState (bs)
>                    /           |           \
>                   /            |            \
>                  /             |             \
>                 /              |              \
>             QCOW2 bs       QCOW2 bs         QCOW2 bs
>                |               |               |
>                |               |               |
>                |               |               |
>                |               |               |
>             QCOW2 bs       QCOW2 bs         QCOW2 bs
>                |               |               |
>                |               |               |
>                |               |               |
>                |               |               |
>             RAW bs         RAW bs           RAW bs
> 
> In the current state of QEMU we can code some block drivers to implement this
> tree.
> 
> However, when doing operations like snapshots, blockdev.c would have no real
> idea of what should be snapshotted and how. (The 3 top bs should be kept on
> top.)
> 
> Moreover, it would have no easy way to manipulate this tree of
> BlockDriverStates, as each one is encapsulated in its parent.
> 
> Also, there is no generic way to tell the block layer that two or more
> BlockDriverStates are siblings.
> 
> This mail proposes some additional structures in order to cope with these
> problems.
> 
> The overall strategy of the proposed structures is to move the relationships
> between BlockDriverStates out of each BlockDriverState.
> 
> The idea is that it would make it easier for the block layer to manipulate a
> well-known structure instead of being forced to deal with the specifics of
> each BlockDriverState.
> 
> The first structure is the BlockStackNode.
> 
> The BlockStackNode would be used to represent the relationships between the
> various BlockDriverStates.
> 
> struct BlockStackNode {
>     BlockDriverState *bs;  /* the BlockDriverState held by this node */
> 
>     /* this doubly linked list entry points to the child node and the parent
>      * node
>      */
>     QLIST_ENTRY(BlockStackNode) down;
> 
>     /* this doubly linked list entry points to the siblings of this node
>      */
>     QLIST_ENTRY(BlockStackNode) siblings;
> 
>     /* a hash or an array of the siblings of this node for fast access;
>      * should be recomputed when updating the tree (pseudocode) */
>     QHASH_ENTRY<BlockStackNode, index> siblings_hash;
> };
> 
> The BlockBackend would be the structure used to hold the "drive" the guest
> uses.
> 
> struct BlockBackend {
>     /* the following doubly linked list header points to the top
>      * BlockStackNode; in our case it's the one containing the I/O
>      * throttling bs
>      */
>     QLIST_HEAD(, BlockStackNode) block_stack_head;
>     /* this is a pointer to the topmost node below the block filter chain;
>      * in our case the first QCOW2 sibling
>      */
>     BlockStackNode *top_node_below_filter;
> };
> 
> 
> Updated diagram:
> 
> (Here bsn means BlockStackNode)
> 
>     ------------------------BlockBackend
>     |                             |
>     |                          block_stack_head
>     |                             |
>     |                             |
>     |                       I/O throttling BlockStackNode (contains its bs)
>     |                             |
>     |                            down
>     |                             |
>     |                             |
> top_node_below_filter     Encryption BlockStackNode (contains its bs)
>     |                             |
>     |                            down
>     |                             |
>     |                             |
>     |                Quorum BlockStackNode (contains its bs)
>     |               /
>     |             down
>     |             /
>     |            /     S              S
>     ------  QCOW2 bsn--i---QCOW2 bsn--i------ QCOW2 bsn (each bsn contains a bs)
>                |       b       |      b         |
>              down      l      down    l        down
>                |       i       |      i         |
>                |       n       |      n         |
>                |       g       |      g         |
>                |       s       |      s         |
>                |               |                |
>             RAW bsn         RAW bsn           RAW bsn  (each bsn contains a bs)
> 
> 
> Block driver point of view:
> 
> To construct the tree, each BlockDriver would have some utility functions
> looking like:
> 
> bdrv_register_child_bs(bs, child_bs, int index);
> 
> Multiple calls to this function could be made to register multiple sibling
> children, identified by their index.
> 
> This way something like quorum could register multiple QCOW2 instances.
> 
> Drivers would have a
> BlockDriverState *bdrv_access_child(bs, int index);
> 
> to access their children.
> 
> These functions can be implemented without the driver knowing about
> BlockStackNodes using container_of.
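
A rough sketch of how the two helpers described above might look. Everything here is an assumption made for illustration: the stand-in BlockDriverState type, the fixed MAX_CHILDREN array, the int return value of bdrv_register_child_bs(), and in particular embedding the bs inside the node (rather than holding a pointer to it) so that container_of can recover the node from the bs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8  /* hypothetical fixed limit, for this sketch only */

/* Stand-in for QEMU's BlockDriverState; only what the sketch needs. */
typedef struct BlockDriverState {
    const char *name;
} BlockDriverState;

/* Hypothetical node layout: the bs is embedded (not pointed to) so that
 * container_of() can map a bs back to the node that holds it. */
typedef struct BlockStackNode {
    BlockDriverState bs;
    struct BlockStackNode *children[MAX_CHILDREN];
} BlockStackNode;

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Register child_bs as child number 'index' of bs.
 * Returns 0 on success, -1 if the index is out of range. */
static int bdrv_register_child_bs(BlockDriverState *bs,
                                  BlockDriverState *child_bs, int index)
{
    assert(bs != NULL && child_bs != NULL);
    if (index < 0 || index >= MAX_CHILDREN) {
        return -1;
    }
    BlockStackNode *node = container_of(bs, BlockStackNode, bs);
    node->children[index] = container_of(child_bs, BlockStackNode, bs);
    return 0;
}

/* Return child number 'index' of bs, or NULL if none is registered. */
static BlockDriverState *bdrv_access_child(BlockDriverState *bs, int index)
{
    assert(bs != NULL);
    if (index < 0 || index >= MAX_CHILDREN) {
        return NULL;
    }
    BlockStackNode *node = container_of(bs, BlockStackNode, bs);
    BlockStackNode *child = node->children[index];
    return child ? &child->bs : NULL;
}
```

With this layout a quorum-like driver would call bdrv_register_child_bs() once per QCOW2 instance and only ever handle BlockDriverState pointers; the node bookkeeping stays in the block layer.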
> 
> blockdev point of view: (here I need your help)
> 
> When doing a snapshot, blockdev.c would access
> BlockBackend->top_node_below_filter and make a snapshot of the bs contained in
> this node and its siblings.
> 
Since BlockDriver.bdrv_snapshot_create() is an optional operation, blockdev.c
can navigate down the tree from the top node until it hits a layer where the op
is implemented (the QCOW2 bs), so we can get rid of this top_node_below_filter
pointer.

Is this the only use case of top_node_below_filter?
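
To make the idea concrete, here is a minimal sketch of that walk. The Node and Driver types, the child/next_sibling pointers, and demo_snapshot_create() are hypothetical stand-ins, not the real block.c structures; the only thing taken from the discussion is that a NULL snapshot op means "this layer does not implement it".

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical stand-in for BlockDriver: a NULL snapshot_create means the
 * optional operation is not implemented by this driver. */
typedef struct Driver {
    const char *format_name;
    int (*snapshot_create)(Node *node);
} Driver;

/* Hypothetical stand-in for a node of the stack. */
struct Node {
    const Driver *drv;
    Node *child;         /* first child ("down") */
    Node *next_sibling;  /* next node on the same level */
};

/* Demo stub only: pretends the snapshot always succeeds. */
static int demo_snapshot_create(Node *node)
{
    (void)node;
    return 0;
}

/* Walk down from the top node and return the first node whose driver
 * implements snapshot_create, or NULL if no layer does. */
static Node *find_snapshot_layer(Node *top)
{
    for (Node *n = top; n != NULL; n = n->child) {
        assert(n->drv != NULL);
        if (n->drv->snapshot_create != NULL) {
            return n;
        }
    }
    return NULL;
}

/* Snapshot the node found above and all of its siblings.
 * Returns the number of nodes snapshotted, or -1 on failure. */
static int snapshot_layer_and_siblings(Node *top)
{
    Node *first = find_snapshot_layer(top);
    int count = 0;

    if (first == NULL) {
        return -1;
    }
    for (Node *n = first; n != NULL; n = n->next_sibling) {
        if (n->drv->snapshot_create == NULL ||
            n->drv->snapshot_create(n) != 0) {
            return -1;
        }
        count++;
    }
    return count;
}
```

In the example tree, the walk would skip the throttling, encryption and quorum layers and stop at the first QCOW2 node, which is the node top_node_below_filter would have pointed to.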

Fam

> After each individual snapshot, the linked lists and the hash/arrays would be
> updated to point to the new top bsn.
> The snapshot operation can be done without disturbing any of the top block
> filter BlockDriverStates.
> 
> What do you think of this idea?
> How would this fit in block.c/blockdev.c?
> 
> Best regards
> 
> Benoît


