|
From: | Vladimir Sementsov-Ogievskiy |
Subject: | Re: [Qemu-block] [PATCH v2 2/3] block/fleecing-filter: new filter driver for fleecing |
Date: | Mon, 2 Jul 2018 14:57:48 +0300 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 |
29.06.2018 20:30, John Snow wrote:
On 06/29/2018 11:15 AM, Vladimir Sementsov-Ogievskiy wrote:We need to synchronize backup job with reading from fleecing image like it was done in block/replication.c. Otherwise, the following situation is theoretically possible: 1. client start reading 2. client understand, that there is no corresponding cluster in fleecing imageI don't think the client refocuses the read, but QEMU does. (the client has no idea if it is reading from the fleecing node or the backing file.) ... but understood:3. client is going to read from backing file (i.e. active image) 4. guest writes to active imageMy question here is if QEMU will allow reads and writes to interleave in general. If the client is reading from the backing file, the active image, will QEMU allow a write to modify that data before we're done getting that data? If I understand correctly: yes. Physical drives allows intersecting operations too and there no guarantee on result. We have serializing requests, but in general only unaligned requests are marked serializing. 5. this write is stopped by backup(sync=none) and cluster is copied to fleecing image 6. guest write continues... 7. and client reads _new_ (or partly new) date from active imageI can't answer this for myself one way or the other right now, but I imagine you observed a failure in practice in your test setups, which motivated this patch? No. I just found this code in Qemu: block/replication already use the same scheme with backup(sync=none). And it uses the synchronization as in this patch. I've discussed it on list a bit: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg01807.html https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg03025.html A test or some proof would help justification for this patch. It would also help prove that it solves what it claims to! Agree, a reproducer is better than any theoretical reflections. I'm thinking about it, but don't see simple solution... So, this fleecing-filter should be above fleecing image, the whole picture of fleecing looks like this: +-------+ +------------+ | | | | | guest | | NBD client +<------+ | | | | | ++-----++ +------------+ |only read | ^ | | IO | | v | +-----+------+ ++-----+---------+ | | | | | internal | | active image +----+ | NBD server | | | | | | +-+--------------+ |backup +-+----------+ ^ |sync=none ^ |backing | |only read | | | +-+--------------+ | +------+----------+ | | | | | | fleecing image +<---+ | fleecing filter | | | | | +--------+-------+ +-----+-----------+ ^ | | | +--------------------------+ file Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden> --- qapi/block-core.json | 6 ++-- block/fleecing-filter.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++ block/Makefile.objs | 1 + 3 files changed, 85 insertions(+), 2 deletions(-) create mode 100644 block/fleecing-filter.c diff --git a/qapi/block-core.json b/qapi/block-core.json index 577ce5e999..43872c3d79 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2542,7 +2542,8 @@ 'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh', - 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] } + 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs', + 'fleecing-filter' ] } ## # @BlockdevOptionsFile: @@ -3594,7 +3595,8 @@ 'vmdk': 'BlockdevOptionsGenericCOWFormat', 'vpc': 'BlockdevOptionsGenericFormat', 'vvfat': 'BlockdevOptionsVVFAT', - 'vxhs': 'BlockdevOptionsVxHS' + 'vxhs': 'BlockdevOptionsVxHS', + 'fleecing-filter': 'BlockdevOptionsGenericFormat' } } ## diff --git a/block/fleecing-filter.c b/block/fleecing-filter.c new file mode 100644 index 0000000000..b501887c10 --- /dev/null +++ b/block/fleecing-filter.c @@ -0,0 +1,80 @@ +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "block/blockjob.h" +#include "block/block_int.h" +#include "block/block_backup.h" + +static int64_t fleecing_getlength(BlockDriverState *bs) +{ + return bdrv_getlength(bs->file->bs); +} + +static coroutine_fn int fleecing_co_preadv(BlockDriverState *bs, + uint64_t offset, uint64_t bytes, + QEMUIOVector *qiov, int flags) +{ + int ret; + BlockJob *job = bs->file->bs->backing->bs->job; + CowRequest req; + + backup_wait_for_overlapping_requests(job, offset, bytes); + backup_cow_request_begin(&req, job, offset, bytes); + + ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags); + + backup_cow_request_end(&req); + + return ret; +} + +static coroutine_fn int fleecing_co_pwritev(BlockDriverState *bs, + uint64_t offset, uint64_t bytes, + QEMUIOVector *qiov, int flags) +{ + return -EINVAL; +} + +static bool fleecing_recurse_is_first_non_filter(BlockDriverState *bs, + BlockDriverState *candidate) +{ + return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate); +} + +static int fleecing_open(BlockDriverState *bs, QDict *options, + int flags, Error **errp) +{ + bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, false, + errp); + + return bs->file ? 0 : -EINVAL; +} + +static void fleecing_close(BlockDriverState *bs) +{ + /* Do nothing, we have to add .bdrv_close, because bdrv_close() don't check + * it, just call. */ +} + +BlockDriver bdrv_fleecing_filter = { + .format_name = "fleecing-filter", + .protocol_name = "fleecing-filter", + .instance_size = 0, + + .bdrv_open = fleecing_open, + .bdrv_close = fleecing_close, + + .bdrv_getlength = fleecing_getlength, + .bdrv_co_preadv = fleecing_co_preadv, + .bdrv_co_pwritev = fleecing_co_pwritev, + + .is_filter = true, + .bdrv_recurse_is_first_non_filter = fleecing_recurse_is_first_non_filter, + .bdrv_child_perm = bdrv_filter_default_perms, +}; + +static void bdrv_fleecing_init(void) +{ + bdrv_register(&bdrv_fleecing_filter); +} + +block_init(bdrv_fleecing_init); diff --git a/block/Makefile.objs b/block/Makefile.objs index 899bfb5e2c..aa0a6dd971 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -27,6 +27,7 @@ block-obj-y += write-threshold.o block-obj-y += backup.o block-obj-$(CONFIG_REPLICATION) += replication.o block-obj-y += throttle.o copy-on-read.o +block-obj-y += fleecing-filter.o block-obj-y += crypto.o -- Best regards, Vladimir |
[Prev in Thread] | Current Thread | [Next in Thread] |