[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 03/12] block: Filtered children access functi
From: |
Eric Blake |
Subject: |
Re: [Qemu-devel] [PATCH v3 03/12] block: Filtered children access functions |
Date: |
Wed, 13 Feb 2019 20:25:27 -0600 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 |
On 2/13/19 4:53 PM, Max Reitz wrote:
> What bs->file and bs->backing mean depends on the node. For filter
> nodes, both signify a node that will eventually receive all R/W
> accesses. For format nodes, bs->file contains metadata and data, and
> bs->backing will not receive writes -- instead, writes are COWed to
> bs->file. Usually.
>
> In any case, it is not trivial to guess what a child means exactly with
> our currently limited form of expression. It is better to introduce
> some functions that actually guarantee a meaning:
>
> - bdrv_filtered_cow_child() will return the child that receives requests
> filtered through COW. That is, reads may or may not be forwarded
> (depending on the overlay's allocation status), but writes never go to
> this child.
>
> - bdrv_filtered_rw_child() will return the child that receives requests
> filtered through some very plain process. Reads and writes issued to
> the parent will go to the child as well (although timing, etc. may be
> modified).
>
> - All drivers but quorum (but quorum is pretty opaque to the general
> block layer anyway) always only have one of these children: All read
> requests must be served from the filtered_rw_child (if it exists), so
> if there was a filtered_cow_child in addition, it would not receive
> any requests at all.
> (The closest here is mirror, where all requests are passed on to the
> source, but with write-blocking, write requests are "COWed" to the
> target. But that just means that the target is a special child that
> cannot be introspected by the generic block layer functions, and that
> source is a filtered_rw_child.)
> Therefore, we can also add bdrv_filtered_child() which returns that
> one child (or NULL, if there is no filtered child).
>
> Also, many places in the current block layer should be skipping filters
> (all filters or just the ones added implicitly, it depends) when going
> through a block node chain. They do not do that currently, but this
> patch makes them.
>
> One example for this is qemu-img map, which should skip filters and only
> look at the COW elements in the graph. The change to iotest 204's
> reference output shows how using blkdebug on top of a COW node used to
> make qemu-img map disregard the rest of the backing chain, but with this
> patch, the allocation in the base image is reported correctly.
>
> Furthermore, a note should be made that sometimes we do want to access
> bs->backing directly. This is whenever the operation in question is not
> about accessing the COW child, but the "backing" child, be it COW or
> not. This is the case in functions such as bdrv_open_backing_file() or
> whenever we have to deal with the special behavior of @backing as a
> blockdev option, which is that it does not default to null like all
> other child references do.
>
> Finally, the query functions (query-block and query-named-block-nodes)
> are modified to return any filtered child under "backing", not just
> bs->backing or COW children. This is so that filters do not interrupt
> the reported backing chain. This changes the output of iotest 184, as
> the throttled node now appears as a backing child.
>
> Signed-off-by: Max Reitz <address@hidden>
> ---
> +++ b/qapi/block-core.json
> @@ -2417,6 +2417,10 @@
> # On successful completion the image file is updated to drop the backing file
> # and the BLOCK_JOB_COMPLETED event is emitted.
> #
> +# In case @device is a filter node, block-stream modifies the first
> non-filter
> +# overlay node below it to point to base's backing node (or NULL if @base was
> +# not specified) instead of modifying @device itself.
Maybe s/In case/If/
> +/*
> + * For a backing chain, return the first non-filter backing image.
> + */
> +BlockDriverState *bdrv_backing_chain_next(BlockDriverState *bs)
> +{
> + return
> bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)));
Quite a mouthful, but looks correct.
> +++ b/block/io.c
> @@ -118,8 +118,17 @@ static void bdrv_merge_limits(BlockLimits *dst, const
> BlockLimits *src)
> void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
> {
> BlockDriver *drv = bs->drv;
> + BlockDriverState *storage_bs;
> + BlockDriverState *cow_bs = bdrv_filtered_cow_bs(bs);
If the backing file is filtered by a blkdebug layer that intentionally
is trying to advertise alternative block sizes...
> Error *local_err = NULL;
>
> + /*
> + * FIXME: There should be a function for this, and in fact there
> + * will be as of a follow-up patch.
> + */
> + storage_bs =
> + child_bs(bs->file) ?: bdrv_filtered_rw_bs(bs);
> +
> memset(&bs->bl, 0, sizeof(bs->bl));
>
> if (!drv) {
> @@ -131,13 +140,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error
> **errp)
> drv->bdrv_aio_preadv) ? 1 : 512;
>
> /* Take some limits from the children as a default */
> - if (bs->file) {
> - bdrv_refresh_limits(bs->file->bs, &local_err);
> + if (storage_bs) {
> + bdrv_refresh_limits(storage_bs, &local_err);
> if (local_err) {
> error_propagate(errp, local_err);
> return;
> }
> - bdrv_merge_limits(&bs->bl, &bs->file->bs->bl);
> + bdrv_merge_limits(&bs->bl, &storage_bs->bl);
> } else {
> bs->bl.min_mem_alignment = 512;
> bs->bl.opt_mem_alignment = getpagesize();
> @@ -146,13 +155,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error
> **errp)
> bs->bl.max_iov = IOV_MAX;
> }
>
> - if (bs->backing) {
> - bdrv_refresh_limits(bs->backing->bs, &local_err);
> + if (cow_bs) {
> + bdrv_refresh_limits(cow_bs, &local_err);
...then this change means that the active layer no longer picks up the
blkdebug block sizes, but the original COW layer sizes. Is that
intentional? I don't think it is fatal to the patch (as blkdebug is not
used in production, but only in testing), but it may cause some
head-scratching when trying to test behaviors of a COW child with
different block sizes than the active layer by using blkdebug on top of
the COW child. I guess I'll find out soon enough (on my todo list is
fixing NBD to never split an NBD_CMD_READ or NBD_CMD_BLOCK_STATUS reply
below the granularity advertised at the active layer, even if the
backing file has a smaller granularity - and using blkdebug to force the
backing file granularity was my original plan of attack - but it is not
until this patch is applied that NBD can even locate the bitmap in a
backing file when a filter is interposed)
Reviewed-by: Eric Blake <address@hidden>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature
- [Qemu-devel] [PATCH v3 00/12] block: Deal with filters, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 08/12] block: Leave BDS.backing_file constant, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 07/12] iotests: Add tests for mirror @replaces loops, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 06/12] block: Fix check_to_replace_node(), Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 04/12] block: Storage child access function, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 05/12] block: Inline bdrv_co_block_status_from_*(), Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 03/12] block: Filtered children access functions, Max Reitz, 2019/02/13
- Re: [Qemu-devel] [PATCH v3 03/12] block: Filtered children access functions,
Eric Blake <=
- [Qemu-devel] [PATCH v3 02/12] blockdev: Check @replaces in blockdev_mirror_common, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 01/12] block: Mark commit and mirror as filter drivers, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 12/12] iotests: Test committing to overridden backing, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 10/12] iotests: Add filter mirror test cases, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 09/12] iotests: Add filter commit test cases, Max Reitz, 2019/02/13
- [Qemu-devel] [PATCH v3 11/12] iotests: Add test for commit in sub directory, Max Reitz, 2019/02/13