Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
From: Markus Armbruster
Subject: Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Date: Fri, 04 Aug 2023 20:04:38 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Andrei Gudkov <gudkov.andrei@huawei.com> writes:
> Introduces alternative argument calc-time-ms, which is the
> same as calc-time but accepts a millisecond value.
> Millisecond granularity makes it possible to predict whether
> migration will succeed or not. To do this, calculate dirty
> rate with calc-time-ms set to max allowed downtime, convert
> measured rate into volume of dirtied memory, and divide by
> network throughput. If the value is lower than max allowed
> downtime, then migration will converge.
>
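The prediction procedure quoted above can be sketched in Python as follows. This is an illustration only and not part of the patch: the function names, the choice of MiB/s for network throughput, and the example throughput of 1192 MiB/s (roughly 10 Gbit/s) are assumptions.

```python
def transfer_time_s(dirty_rate_mib_s, max_downtime_ms, throughput_mib_s):
    """Time needed to send the memory dirtied during one downtime window."""
    downtime_s = max_downtime_ms / 1000.0
    # Convert the measured rate into a volume of dirtied memory (MiB).
    dirtied_mib = dirty_rate_mib_s * downtime_s
    # Divide by network throughput to get the transfer time (seconds).
    return dirtied_mib / throughput_mib_s


def predicts_convergence(dirty_rate_mib_s, max_downtime_ms, throughput_mib_s):
    """True if the dirtied volume can be sent within the allowed downtime.

    dirty_rate_mib_s is expected to be the rate measured with
    calc-time-ms set to max_downtime_ms.
    """
    needed_s = transfer_time_s(dirty_rate_mib_s, max_downtime_ms, throughput_mib_s)
    return needed_s < max_downtime_ms / 1000.0


# Assumed link throughput in MiB/s (about 10 Gbit/s); not from the patch.
LINK_MIB_S = 1192

# Dirty rates from the 1GiB dirty-bitmap measurements in this message.
print(predicts_convergence(2371, 300, LINK_MIB_S))
print(predicts_convergence(513, 2000, LINK_MIB_S))
```

The comparison reduces to "dirty rate lower than throughput"; the point of the millisecond period is that the rate is measured over the same window as the allowed downtime.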
> Measurement results for single thread randomly writing to
> a 1/4/24GiB memory region:
>
> +--------------+-----------------------------------------------+
> | calc-time-ms |               dirty rate MiB/s                |
> |              +----------------+---------------+--------------+
> |              | theoretical    | page-sampling | dirty-bitmap |
> |              | (at 3M wr/sec) |               |              |
> +--------------+----------------+---------------+--------------+
> |                             1GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |           6996 |          7100 |         3192 |
> |          200 |           4606 |          4660 |         2655 |
> |          300 |           3305 |          3280 |         2371 |
> |          400 |           2534 |          2525 |         2154 |
> |          500 |           2041 |          2044 |         1871 |
> |          750 |           1365 |          1341 |         1358 |
> |         1000 |           1024 |          1052 |         1025 |
> |         1500 |            683 |           678 |          684 |
> |         2000 |            512 |           507 |          513 |
> +--------------+----------------+---------------+--------------+
> |                             4GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |          10232 |          8880 |         4070 |
> |          200 |           8954 |          8049 |         3195 |
> |          300 |           7889 |          7193 |         2881 |
> |          400 |           6996 |          6530 |         2700 |
> |          500 |           6245 |          5772 |         2312 |
> |          750 |           4829 |          4586 |         2465 |
> |         1000 |           3865 |          3780 |         2178 |
> |         1500 |           2694 |          2633 |         2004 |
> |         2000 |           2041 |          2031 |         1789 |
> +--------------+----------------+---------------+--------------+
> |                            24GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |          11495 |          8640 |         5597 |
> |          200 |          11226 |          8616 |         3527 |
> |          300 |          10965 |          8386 |         2355 |
> |          400 |          10713 |          8370 |         2179 |
> |          500 |          10469 |          8196 |         2098 |
> |          750 |           9890 |          7885 |         2556 |
> |         1000 |           9354 |          7506 |         2084 |
> |         1500 |           8397 |          6944 |         2075 |
> |         2000 |           7574 |          6402 |         2062 |
> +--------------+----------------+---------------+--------------+
>
> Theoretical values are computed according to the following formula:
> size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> where size is in bytes, time is in seconds, and wps is number of
> writes per second.
>
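The quoted formula can be evaluated with a short Python sketch. The function name is hypothetical, and wps is assumed here to be exactly 3,000,000. The table's theoretical column was presumably computed with the author's own write rate, so small differences at short periods are to be expected.

```python
PAGE_SIZE = 4096  # bytes, as in the quoted formula
MIB = 2 ** 20


def theoretical_dirty_rate(size_bytes, time_s, wps):
    """Expected dirty rate in MiB/s for uniformly random page writes.

    size_bytes: size of the memory region in bytes
    time_s:     measurement period in seconds
    wps:        number of writes per second
    """
    writes = time_s * wps
    # Probability that a given page is hit by at least one of the writes.
    dirtied_fraction = 1.0 - (1.0 - PAGE_SIZE / size_bytes) ** writes
    return size_bytes * dirtied_fraction / (time_s * MIB)


GIB = 2 ** 30
for ms in (1000, 1500, 2000):
    rate = theoretical_dirty_rate(1 * GIB, ms / 1000.0, 3e6)
    print(ms, round(rate))
```

Because a page counts as dirty only once per period, the expected rate falls as the period grows and the region saturates, which is the trend visible in the table.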
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> ---
> qapi/migration.json | 14 ++++++--
> migration/dirtyrate.h | 12 ++++---
> migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> 3 files changed, 67 insertions(+), 40 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8843e74b59..82493d6a57 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1849,7 +1849,11 @@
> # @start-time: start time in units of second for calculation
> #
> # @calc-time: time period for which dirty page rate was measured
> -# (in seconds)
> +# (rounded down to seconds).
> +#
> +# @calc-time-ms: actual time period for which dirty page rate was
> +# measured (in milliseconds). Value may be larger than requested
> +# time period due to measurement overhead.
> #
> # @sample-pages: number of sampled pages per GiB of guest memory.
> # Valid only in page-sampling mode (Since 6.1)
> @@ -1866,6 +1870,7 @@
> 'status': 'DirtyRateStatus',
> 'start-time': 'int64',
> 'calc-time': 'int64',
> + 'calc-time-ms': 'int64',
> 'sample-pages': 'uint64',
> 'mode': 'DirtyRateMeasureMode',
> '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> @@ -1908,6 +1913,10 @@
> # dirty during @calc-time period, further writes to this page will
> # not increase dirty page rate anymore.
> #
> +# @calc-time-ms: the same as @calc-time but in milliseconds. These
> +# two arguments are mutually exclusive. Exactly one of them must
> +# be specified. (Since 8.1)
> +#
> # @sample-pages: number of sampled pages per each GiB of guest memory.
> # Default value is 512. For 4KiB guest pages this corresponds to
> # sampling ratio of 0.2%. This argument is used only in page
> @@ -1925,7 +1934,8 @@
> # 'sample-pages': 512} }
> # <- { "return": {} }
> ##
> -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> + '*calc-time-ms': 'int64',
> '*sample-pages': 'int',
> '*mode': 'DirtyRateMeasureMode'} }
>
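For illustration, a client following the schema proposed in this hunk would have to send exactly one of the two period arguments. A minimal Python sketch of that rule is shown below. The helper name is hypothetical, and the argument names are those of the patch under review, not necessarily of any released QEMU.

```python
import json


def build_calc_dirty_rate(calc_time=None, calc_time_ms=None,
                          sample_pages=None, mode=None):
    """Build a QMP calc-dirty-rate command as a dict (hypothetical helper).

    Mirrors the proposed rule: calc-time and calc-time-ms are mutually
    exclusive, and exactly one of them must be specified.
    """
    if (calc_time is None) == (calc_time_ms is None):
        raise ValueError(
            "exactly one of calc-time and calc-time-ms must be specified")
    args = {}
    if calc_time is not None:
        args["calc-time"] = calc_time
    else:
        args["calc-time-ms"] = calc_time_ms
    # Optional members are sent only when the caller provides them.
    if sample_pages is not None:
        args["sample-pages"] = sample_pages
    if mode is not None:
        args["mode"] = mode
    return {"execute": "calc-dirty-rate", "arguments": args}


print(json.dumps(build_calc_dirty_rate(calc_time_ms=300, mode="dirty-bitmap")))
```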
Having both @calc-time and @calc-time-ms is ugly.
Can we deprecate @calc-time?
I don't like the name @calc-time-ms. We don't put units in names
elsewhere.
Differently ugly: new member containing the fractional part, i.e. time
in seconds = calc-time + fractional-part / 1000. With a better name, of
course.
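To make the fractional-part alternative concrete, here is a small Python sketch of the arithmetic. The name fractional_ms is a placeholder chosen for this example only; no actual member name is proposed here.

```python
def split_period(total_ms):
    """Split a period in milliseconds into (calc_time, fractional_ms).

    calc_time is whole seconds; fractional_ms is the remainder in
    the range 0..999.
    """
    calc_time, fractional_ms = divmod(total_ms, 1000)
    return calc_time, fractional_ms


def period_in_seconds(calc_time, fractional_ms):
    """Time in seconds = calc-time + fractional-part / 1000."""
    return calc_time + fractional_ms / 1000


calc_time, fractional_ms = split_period(1500)
print(calc_time, fractional_ms, period_in_seconds(calc_time, fractional_ms))
```

With this encoding, a client that sends only calc-time keeps its current meaning, since an absent fractional part would be treated as zero.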
[...]