[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functi
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions |
Date: |
Fri, 24 May 2019 08:12:50 +0000 |
23.05.2019 20:27, Max Reitz wrote:
> On 17.05.19 16:50, Vladimir Sementsov-Ogievskiy wrote:
>> 10.04.2019 23:20, Max Reitz wrote:
>>> What bs->file and bs->backing mean depends on the node. For filter
>>> nodes, both signify a node that will eventually receive all R/W
>>> accesses. For format nodes, bs->file contains metadata and data, and
>>> bs->backing will not receive writes -- instead, writes are COWed to
>>> bs->file. Usually.
>>>
>>> In any case, it is not trivial to guess what a child means exactly with
>>> our currently limited form of expression. It is better to introduce
>>> some functions that actually guarantee a meaning:
>>>
>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>> filtered through COW. That is, reads may or may not be forwarded
>>> (depending on the overlay's allocation status), but writes never go to
>>> this child.
>>>
>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>> filtered through some very plain process. Reads and writes issued to
>>> the parent will go to the child as well (although timing, etc. may be
>>> modified).
>>>
>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>> block layer anyway) always only have one of these children: All read
>>> requests must be served from the filtered_rw_child (if it exists), so
>>> if there was a filtered_cow_child in addition, it would not receive
>>> any requests at all.
>>> (The closest here is mirror, where all requests are passed on to the
>>> source, but with write-blocking, write requests are "COWed" to the
>>> target. But that just means that the target is a special child that
>>> cannot be introspected by the generic block layer functions, and that
>>> source is a filtered_rw_child.)
>>> Therefore, we can also add bdrv_filtered_child() which returns that
>>> one child (or NULL, if there is no filtered child).
>>>
>>> Also, many places in the current block layer should be skipping filters
>>> (all filters or just the ones added implicitly, it depends) when going
>>> through a block node chain. They do not do that currently, but this
>>> patch makes them.
>>>
>>> One example for this is qemu-img map, which should skip filters and only
>>> look at the COW elements in the graph. The change to iotest 204's
>>> reference output shows how using blkdebug on top of a COW node used to
>>> make qemu-img map disregard the rest of the backing chain, but with this
>>> patch, the allocation in the base image is reported correctly.
>>>
>>> Furthermore, a note should be made that sometimes we do want to access
>>> bs->backing directly. This is whenever the operation in question is not
>>> about accessing the COW child, but the "backing" child, be it COW or
>>> not. This is the case in functions such as bdrv_open_backing_file() or
>>> whenever we have to deal with the special behavior of @backing as a
>>> blockdev option, which is that it does not default to null like all
>>> other child references do.
>>>
>>> Finally, the query functions (query-block and query-named-block-nodes)
>>> are modified to return any filtered child under "backing", not just
>>> bs->backing or COW children. This is so that filters do not interrupt
>>> the reported backing chain. This changes the output of iotest 184, as
>>> the throttled node now appears as a backing child.
>>>
>>> Signed-off-by: Max Reitz <address@hidden>
>>> ---
>>
>> [..]
>>
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>
> [...]
>
>>> @@ -1650,7 +1651,9 @@ static void mirror_start_job(const char *job_id,
>>> BlockDriverState *bs,
>>> * any jobs in them must be blocked */
>>> if (target_is_backing) {
>>> BlockDriverState *iter;
>>> - for (iter = backing_bs(bs); iter != target; iter =
>>> backing_bs(iter)) {
>>> + for (iter = bdrv_filtered_bs(bs); iter != target;
>>
>> should it be filtered_target too?
>
> Hmm... The comment says that all nodes that disappear must be blocked.
> I don’t even know by heart which nodes I let disappear. :-/
>
> I suppose we should start at the first explicit node, filter or not...?
Hm, I thought about where should we stop. But I don't think we want to remove
nodes
under target, so it should be OK as is..
>
>>> + iter = bdrv_filtered_bs(iter))
>>> + {
>>> /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
>>> * ourselves at s->base (if writes are blocked for a node,
>>> they are
>>> * also blocked for its backing file). The other options
>>> would be a
>
> [...]
>
>>> @@ -1707,14 +1710,14 @@ void mirror_start(const char *job_id,
>>> BlockDriverState *bs,
>>> MirrorCopyMode copy_mode, Error **errp)
>>> {
>>> bool is_none_mode;
>>> - BlockDriverState *base;
>>> + BlockDriverState *base = NULL;
>>
>> dead assignment
>
> Now I wonder why I even have that. Probably an artifact from some
> intermediate point.
>
>>>
>>> if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>>> error_setg(errp, "Sync mode 'incremental' not supported");
>>> return;
>>> }
>>> is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
>>> - base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
>>> + base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) :
>>> NULL;
>>> mirror_start_job(job_id, bs, creation_flags, target, replaces,
>>> speed, granularity, buf_size, backing_mode,
>>> on_source_error, on_target_error, unmap, NULL, NULL,
>>> diff --git a/block/qapi.c b/block/qapi.c
>>> index 110d05dc57..478c6f5e0d 100644
>>> --- a/block/qapi.c
>>> +++ b/block/qapi.c
>
> [...]
>
>>> @@ -535,9 +538,10 @@ static BlockStats
>>> *bdrv_query_bds_stats(BlockDriverState *bs,
>>> s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
>>> }
>>>
>>> - if (blk_level && bs->backing) {
>>> + cow_bs = bdrv_filtered_cow_bs(bs);
>>
>> So, if we at blk_level and top bs is explicit filter, you don't want to show
>> it's
>> child?
>
> I do. It’s in s->parent. I thought it makes sense to change the
> existing bs->file vs. bs->backing to storage vs. COW.
Hmm. When I reviewed this I didn't consider the following patch. Actually, in
this patch
you break showing backing child for filter, and in following - fix it. Not very
good,
but not a reason to merge these two patches.. Ok for me.
>
>> Hmm, at least, we can't show it if it is file-child, as qapi filed already
>> called
>> backing. So, if we can't show for file-child-based filters, it may be better
>> to not
>> show filter children here at all.
>>
>>> + if (blk_level && cow_bs) {
>>> s->has_backing = true;
>>> - s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
>>> + s->backing = bdrv_query_bds_stats(cow_bs, blk_level);
>>> }
>>>
>>> return s;
>>> diff --git a/block/stream.c b/block/stream.c
>>> index bfaebb861a..23d5c890e0 100644
>>> --- a/block/stream.c
>>> +++ b/block/stream.c
>>> @@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
>>> StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>>> BlockJob *bjob = &s->common;
>>> BlockDriverState *bs = blk_bs(bjob->blk);
>>> + BlockDriverState *unfiltered = bdrv_skip_rw_filters(bs);
>>
>> Aha, I'd call it filtered, but unfiltered is correct too, it's amazing
>
> Haha :-)
>
> I think it’s all rather insane than amazing, but, well, insanity never
> ceases to amaze, does it.
>
>>> BlockDriverState *base = s->base;
>>> Error *local_err = NULL;
>>> int ret = 0;
>>> @@ -72,7 +73,7 @@ static int stream_prepare(Job *job)
>
> [...]
>
>>> @@ -121,7 +122,7 @@ static int coroutine_fn stream_run(Job *job, Error
>>> **errp)
>>> int64_t n = 0; /* bytes */
>>> void *buf;
>>>
>>> - if (!bs->backing) {
>>> + if (!bdrv_filtered_child(bs)) {
>>> goto out;
>>> }
>>
>> this condition checks that there is nothing to stream, so, I thing it's
>> better to check
>> if (!bdrv_backing_chain_next(bs)) {
>> goto out;
>> }
>
> Ah, sure.
>
>>> @@ -162,7 +163,7 @@ static int coroutine_fn stream_run(Job *job, Error
>>> **errp)
>>> } else if (ret >= 0) {
>>> /* Copy if allocated in the intermediate images. Limit to
>>> the
>>> * known-unallocated area [offset,
>>> offset+n*BDRV_SECTOR_SIZE). */
>>> - ret = bdrv_is_allocated_above(backing_bs(bs), base,
>>> + ret = bdrv_is_allocated_above(bdrv_filtered_bs(bs), base,
>>> offset, n, &n);
>>
>> Hmm, if we trying to support bs to be filter, and actually operate on
>> first-non-filter,
>> as you write in qapi spec, this is wrong. Again it should be
>> bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs))..
>
> Would bdrv_backing_chain_next() fulfill the same purpose? It can’t be
> allocated in a filter node, after all.
It's OK too.
>
>> Or, may be better, we at stream start should calculate reald top bs to
>> operate on, and
>> forget about all filters above.. i.e., do bs = bdrv_skip_rw_filters(bs) at
>> the very
>> beginning, when creating a job.
>
> Sounds reasonable. We can ignore all the filters on top of the
> (un)filtered top anyway.
>
>>> /* Finish early if end of backing file has been reached */
>>> @@ -268,7 +269,9 @@ void stream_start(const char *job_id, BlockDriverState
>>> *bs,
>>> * disappear from the chain after this operation. The streaming job
>>> reads
>>> * every block only once, assuming that it doesn't change, so block
>>> writes
>>> * and resizes. */
>>> - for (iter = backing_bs(bs); iter && iter != base; iter =
>>> backing_bs(iter)) {
>>> + for (iter = bdrv_filtered_bs(bs); iter && iter != base;
>>> + iter = bdrv_filtered_bs(iter))
>>> + {
>>> block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
>>> BLK_PERM_CONSISTENT_READ |
>>> BLK_PERM_WRITE_UNCHANGED,
>>> &error_abort);
>>> diff --git a/blockdev.c b/blockdev.c
>>> index 4775a07d93..bb71b8368d 100644
>>> --- a/blockdev.c
>>> +++ b/blockdev.c
>>> @@ -1094,7 +1094,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
>>> return;
>>> }
>>>
>>> - bs = blk_bs(blk);
>>> + bs = bdrv_skip_implicit_filters(blk_bs(blk));
>>> aio_context = bdrv_get_aio_context(bs);
>>> aio_context_acquire(aio_context);
>>>
>>> @@ -1663,7 +1663,7 @@ static void external_snapshot_prepare(BlkActionState
>>> *common,
>>> goto out;
>>> }
>>>
>>> - if (state->new_bs->backing != NULL) {
>>> + if (bdrv_filtered_cow_child(state->new_bs)) {
>>
>> Do we allow to create filter snapshot? We should either restrict it
>> explicitly or
>> check bdrv_filtered_child here.. And we can't allow file-based-filters
>> anyway..
>
> Hm, yes, we should probably check both (separately to give better error
> messages).
>
> In theory it might be possible to allow filters on top, but there isn’t
> really any point. If someone wants to add filters on top of the
> snapshot, they should use reopen.
>
>> [skipped up to the end of blockdev.c, I'm tired o_O]
>
> I can very much relate. :-)
>
> Your review definitely is much appreciated.
>
>>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>>> index d1bb863cb6..f99f753fba 100644
>>> --- a/migration/block-dirty-bitmap.c
>>> +++ b/migration/block-dirty-bitmap.c
>>> @@ -285,9 +285,7 @@ static int init_dirty_bitmap_migration(void)
>>> const char *drive_name = bdrv_get_device_or_node_name(bs);
>>>
>>> /* skip automatically inserted nodes */
>>> - while (bs && bs->drv && bs->implicit) {
>>> - bs = backing_bs(bs);
>>> - }
>>> + bs = bdrv_skip_implicit_filters(bs);
>>
>> this intersects with Jonh's patch
>> [PATCH v2] migration/dirty-bitmaps: change bitmap enumeration method
>> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03340.html
>
> Well. I’m not really considerate of other patches with this series.
> Rebasing is always such a pain that I just write it for the current
> master. I won’t incorporate unmerged series because doing so may cause
> me to have to rebase more than once.
>
> And I can’t get this series merged soon enough because it’s just wrong
> that I (and you) have to the one(s) thinking about how to treat filters
> everywhere. It should be the people that introduce the code.
I think it's just impossible to think over all filter use-cases now. Better
to stop at some point, and then fix wrong things while covering real use-cases
by io-tests.
>
>>> for (bitmap = bdrv_dirty_bitmap_next(bs, NULL); bitmap;
>>> bitmap = bdrv_dirty_bitmap_next(bs, bitmap))
>>> diff --git a/nbd/server.c b/nbd/server.c
>>> index e21bd501dc..e41ae89dbe 100644
>>> --- a/nbd/server.c
>>> +++ b/nbd/server.c
>>> @@ -1506,13 +1506,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs,
>>> uint64_t dev_offset,
>>> if (bitmap) {
>>> BdrvDirtyBitmap *bm = NULL;
>>>
>>> - while (true) {
>>> + while (bs) {
>>> bm = bdrv_find_dirty_bitmap(bs, bitmap);
>>> - if (bm != NULL || bs->backing == NULL) {
>>> + if (bm != NULL) {
>>> break;
>>> }
>>>
>>> - bs = bs->backing->bs;
>>> + bs = bdrv_filtered_bs(bs);
>>> }
>>
>> Check in documentation: "@bitmap: Also export the dirty bitmap reachable
>> from @device".
>>
>> "Reachable" is not bad, but we may want to clarify that extended backing
>> chain is meant
>
> Hm... Isn’t that just a problem with the current documentation?
>
Yes. Anyway, it's not necessary to fix it in theses series.
> I think this change in code better fits what I’d guess from “reachable”
> than what it currently means.
>
>>> if (bm == NULL) {
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index aa6f81f1ea..bcfbb743fc 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>
> [...]
>
>>> @@ -2434,7 +2433,8 @@ static int img_convert(int argc, char **argv)
>>> * s.target_backing_sectors has to be negative, which it will
>>> * be automatically). The backing file length is used only
>>> * for optimizations, so such a case is not fatal. */
>>> - s.target_backing_sectors = bdrv_nb_sectors(out_bs->backing->bs);
>>> + s.target_backing_sectors =
>>> + bdrv_nb_sectors(bdrv_filtered_cow_bs(out_bs));
>>
>> can't out_bs be filter itself?
>
> why would you do that
>
> More serious, well, perhaps, in theory. In practice I really cannot
> imagine why it would be.
Throttling? But it has file child and will not work with backing anyway. And
we don't have public backing-based filters anyway. So, I don't care, it's OK.
>
>>
>>> } else {
>>> s.target_backing_sectors = -1;
>>> }
>>> @@ -2797,6 +2797,7 @@ static int get_block_status(BlockDriverState *bs,
>>> int64_t offset,
>>>
>>> depth = 0;
>>> for (;;) {
>>> + bs = bdrv_skip_rw_filters(bs);
>>
>> Why? Filters may have own implementation of block_status, why to skip it?
>>
>> Or, thay cannot? Really, may be disallow filters have block_status, we may
>> solve
>> inefficient block_status_above we talked about before.
>
> As said in the other subthread, I think ignoring filters here is fine.
>
> Max
>
>>> ret = bdrv_block_status(bs, offset, bytes, &bytes, &map, &file);
>>> if (ret < 0) {
>>> return ret;
>
--
Best regards,
Vladimir
- Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions, (continued)
Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions, Vladimir Sementsov-Ogievskiy, 2019/05/17
Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions, Max Reitz, 2019/05/31