[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster siz
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity |
Date: |
Fri, 18 Jan 2013 17:22:48 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 |
Il 18/01/2013 16:13, Kevin Wolf ha scritto:
> Am 16.01.2013 18:31, schrieb Paolo Bonzini:
>> When mirroring runs, the backing files for the target may not yet be
>> ready. However, this means that a copy-on-write operation on the target
>> would fill the missing sectors with zeros. Copy-on-write only happens
>> if the granularity of the dirty bitmap is smaller than the cluster size
>> (and only for clusters that are allocated in the source after the job
>> has started copying). So far, the granularity was fixed to 1MB; to avoid
>> the problem we detected the situation and required the backing files to
>> be available in that case only.
>>
>> However, we want to lower the granularity for efficiency, so we need
>> a better solution. The solution is to always copy a whole cluster the
>> first time it is touched. The code keeps a bitmap of clusters that
>> have already been allocated by the mirroring job, and only does "manual"
>> copy-on-write if the chunk being copied is zero in the bitmap.
>>
>> Signed-off-by: Paolo Bonzini <address@hidden>
>> ---
>> v1->v2: rebased for moved include files
>>
>> block/mirror.c | 60
>> +++++++++++++++++++++++++++++++++++++------
>> blockdev.c | 15 ++---------
>> tests/qemu-iotests/041 | 21 +++++++++++++++
>> tests/qemu-iotests/041.out | 4 +-
>> trace-events | 1 +
>> 5 files changed, 78 insertions(+), 23 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 20cb1e7..ee45e2e 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -15,6 +15,7 @@
>> #include "block/blockjob.h"
>> #include "block/block_int.h"
>> #include "qemu/ratelimit.h"
>> +#include "qemu/bitmap.h"
>>
>> enum {
>> /*
>> @@ -36,6 +37,8 @@ typedef struct MirrorBlockJob {
>> bool synced;
>> bool should_complete;
>> int64_t sector_num;
>> + size_t buf_size;
>> + unsigned long *cow_bitmap;
>> HBitmapIter hbi;
>> uint8_t *buf;
>> } MirrorBlockJob;
>> @@ -60,7 +63,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s,
>> BlockDriverState *target = s->target;
>> QEMUIOVector qiov;
>> int ret, nb_sectors;
>> - int64_t end;
>> + int64_t end, sector_num, cluster_num;
>> struct iovec iov;
>>
>> s->sector_num = hbitmap_iter_next(&s->hbi);
>> @@ -71,22 +74,41 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob
>> *s,
>> assert(s->sector_num >= 0);
>> }
>>
>> + /* If we have no backing file yet in the destination, and the cluster
>> size
>> + * is very large, we need to do COW ourselves. The first time a
>> cluster is
>> + * copied, copy it entirely.
>> + *
>> + * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are
>> + * powers of two, the number of sectors to copy cannot exceed one
>> cluster.
>> + */
>> + sector_num = s->sector_num;
>> + nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>> + cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
>> + if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) {
>> + trace_mirror_cow(s, sector_num);
>> + bdrv_round_to_clusters(s->target,
>> + sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK,
>> + §or_num, &nb_sectors);
>> + bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK,
>> + nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK);
>
> Here the bit in the cow_bitmap is set before the COW has actually been
> performed. It could still fail.
>
>> + }
>> +
>> end = s->common.len >> BDRV_SECTOR_BITS;
>> - nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num);
>> - bdrv_reset_dirty(source, s->sector_num, nb_sectors);
>> + nb_sectors = MIN(nb_sectors, end - sector_num);
>> + bdrv_reset_dirty(source, sector_num, nb_sectors);
>>
>> /* Copy the dirty cluster. */
>> iov.iov_base = s->buf;
>> iov.iov_len = nb_sectors * 512;
>> qemu_iovec_init_external(&qiov, &iov, 1);
>>
>> - trace_mirror_one_iteration(s, s->sector_num, nb_sectors);
>> - ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov);
>> + trace_mirror_one_iteration(s, sector_num, nb_sectors);
>> + ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov);
>> if (ret < 0) {
>> *p_action = mirror_error_action(s, true, -ret);
>> goto fail;
>> }
>> - ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov);
>> + ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov);
>> if (ret < 0) {
>> *p_action = mirror_error_action(s, false, -ret);
>> s->synced = false;
>> @@ -96,7 +118,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob
>> *s,
>>
>> fail:
>> /* Try again later. */
>> - bdrv_set_dirty(source, s->sector_num, nb_sectors);
>> + bdrv_set_dirty(source, sector_num, nb_sectors);
>
> If it does, we mark the whole cluster dirty now, but in the cow_bitmap
> it's still marked at present on the target. When restarting the job,
> wouldn't it copy only the start of the cluster next time and corrupt the
> rest of it?
Yes, very good catch.
I think this should fix it.
diff --git a/block/mirror.c b/block/mirror.c
index 82abc2f..0fc140a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -87,6 +87,9 @@ static void mirror_iteration_done(MirrorOp *op)
cluster_num = op->sector_num / s->granularity;
nb_chunks = op->nb_sectors / s->granularity;
bitmap_clear(s->in_flight_bitmap, cluster_num, nb_chunks);
+ if (s->cow_bitmap) {
+ bitmap_set(s->cow_bitmap, cluster_num, nb_chunks);
+ }
trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors);
g_slice_free(MirrorOp, op);
@@ -217,9 +220,6 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
/* We have enough free space to copy these sectors. */
bitmap_set(s->in_flight_bitmap, next_cluster, added_chunks);
- if (s->cow_bitmap) {
- bitmap_set(s->cow_bitmap, next_cluster, added_chunks);
- }
nb_sectors += added_sectors;
nb_chunks += added_chunks;
I haven't written a testcase for it, it's tricky but should be doable.
Do you want me to respin, or can it be done as a followup?
I would prefer a followup also because it will give a better pointer when
we backport this fix to the RHEL6 code.
Paolo
- [Qemu-devel] [PATCH v2 00/12] Drive mirroring performance improvements, Paolo Bonzini, 2013/01/16
- [Qemu-devel] [PATCH v2 01/12] host-utils: add ffsl, Paolo Bonzini, 2013/01/16
- [Qemu-devel] [PATCH v2 03/12] block: implement dirty bitmap using HBitmap, Paolo Bonzini, 2013/01/16
- [Qemu-devel] [PATCH v2 02/12] add hierarchical bitmap data type and test cases, Paolo Bonzini, 2013/01/16
- [Qemu-devel] [PATCH v2 04/12] block: make round_to_clusters public, Paolo Bonzini, 2013/01/16
- [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Paolo Bonzini, 2013/01/16
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Kevin Wolf, 2013/01/18
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity,
Paolo Bonzini <=
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Kevin Wolf, 2013/01/18
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Paolo Bonzini, 2013/01/18
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Kevin Wolf, 2013/01/21
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Paolo Bonzini, 2013/01/21
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Stefan Hajnoczi, 2013/01/22
- Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity, Paolo Bonzini, 2013/01/21
[Qemu-devel] [PATCH v2 07/12] block: allow customizing the granularity of the dirty bitmap, Paolo Bonzini, 2013/01/16
[Qemu-devel] [PATCH v2 08/12] mirror: allow customizing the granularity, Paolo Bonzini, 2013/01/16