From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH v1 0/6] A migration performance testing framework
Date: Thu, 5 May 2016 16:39:45 +0100
User-agent: Mutt/1.6.0 (2016-04-01)

* Daniel P. Berrange (address@hidden) wrote:
> This series of patches provides a framework for testing migration performance
> characteristics. The motivating factor for this is planning that is underway
> in OpenStack wrt making use of QEMU migration features such as compression,
> auto-converge and post-copy. The primary aim for OpenStack is to have Nova
> autonomously manage migration features & tunables to maximise chances that
> migration will complete. The problem faced is figuring out just which QEMU
> migration features are "best" suited to our needs. This means we want data
> on how well they are able to ensure completion of a migration, against the
> host resources used and the impact on the guest workload performance.
> The test framework produced here takes a pathological guest workload (every
> CPU just burning 100% of time xor'ing every byte of guest memory with random
> data). This is quite a pessimistic test because most guest workloads are not
> going to be this heavy on memory writes, and their data won't be uniformly
> random and so will be able to compress better than this test does.
> With this worst case guest, I have produced a set of tests using UNIX socket,
> TCP localhost, TCP remote and RDMA remote socket transports, with both a
> 1 GB RAM + 1 CPU guest and an 8 GB RAM + 4 CPU guest.
> The TCP/RDMA remote host tests were run over a 10-GiG-E network interface.
> I have put the results online to view here:
>   https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/
> The charts here are showing two core sets of data:
>  - The guest CPU performance. The left axis is showing the time in
>    milliseconds required to xor 1 GB of memory. This is shown per guest CPU
>    and combined across all CPUs.
>  - The host CPU utilization. The right axis is showing the overall QEMU
>    process CPU utilization, and the per-VCPU utilization.
> Note that the charts are interactive - you can turn on/off each plot line and
> zoom in by selecting regions on the chart.
> Some interesting things that I have observed with this:
>  - At the start of each iteration of migration there is a distinct drop in
>    guest CPU performance as shown by a spike in the guest CPU time lines.
>    Performance would drop from 200ms/GB to 400ms/GB. Presumably this is
>    related to QEMU recalculating the dirty bitmap for the guest RAM. See
>    the spikes in the green line in:
> https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/post-copy-bandwidth/post-copy-bw-1gbs.html

Yeh, that doesn't surprise me too much.

>  - For the larger sized guests, the auto-converge code has to throttle the
>    guest to as much as 90% or more before it is able to meet the 500ms max
>    downtime value
> https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/auto-converge-bandwidth/auto-converge-bw-1gbs.html
>    Even then I often saw tests aborting as they hit the max number of
>    iterations I permitted (30 iters max)
> https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/auto-converge-bandwidth/auto-converge-bw-10gbs.html

It doesn't take much CPU to dirty memory, so you need an awful lot of throttling,
and the throttling is indiscriminate, so it throttles threads that are dirtying
memory a lot as well as those that aren't.
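
To put rough numbers on this: the figures quoted above (about 200ms to xor
1 GB on one vCPU, and a 10 Gb/s link) give an idea of how hard the guest has to
be throttled before it dirties memory no faster than the link can carry it.
The sketch below is a back-of-the-envelope model only, not QEMU's auto-converge
algorithm. The function name is hypothetical, it assumes throttling scales the
dirty rate linearly, and it ignores protocol overhead.

```python
def min_throttle(dirty_rate_gbs: float, bandwidth_gbs: float) -> float:
    """Smallest throttle fraction at which the dirty rate no longer exceeds bandwidth.

    Toy model (an assumption, not QEMU's behaviour): throttling by a
    fraction t scales the guest dirty rate to dirty_rate * (1 - t).
    """
    if dirty_rate_gbs <= bandwidth_gbs:
        return 0.0
    return 1.0 - bandwidth_gbs / dirty_rate_gbs


# Figures taken from the quoted report; everything else is an assumption.
per_vcpu_dirty_gbs = 1.0 / 0.200   # 200 ms per GB -> 5 GB/s per vCPU
link_gbs = 10 / 8                  # 10 Gb/s link  -> 1.25 GB/s

print(min_throttle(1 * per_vcpu_dirty_gbs, link_gbs))  # 1 vCPU guest  -> 0.75
print(min_throttle(4 * per_vcpu_dirty_gbs, link_gbs))  # 4 vCPU guest  -> 0.9375
```

Under these assumptions the 4 vCPU guest needs throttling in the same range as
the "90% or more" reported above just to break even, before any progress is
made towards the max downtime target.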

>  - MT compression is actively harmful to chances of successful migration when
>    the guest RAM is not compression friendly. My work load is worst case since
>    it is splattering RAM with totally random bytes. The MT compression is
>    dramatically increasing the time for each iteration as we bottleneck on CPU
>    compression speed, leaving the network largely idle. This causes migration
>    which would have completed without compression, to fail. It also burns huge
>    amounts of host CPU time
> https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-mt/compr-mt-threads-4.html

Yes; I think the hope is that this will work well with compression accelerator
hardware.  I look forward to the vendors of that hardware using your scripts
to produce comparisons.
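
The effect of the data itself is easy to see in isolation. The sketch below
uses Python's zlib module purely as a stand-in compressor (an assumption, it
does not exercise QEMU's migration code) to compare a page of random bytes,
like the stress workload writes, against a page of repetitive content. The
page size and helper name are illustrative.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size for the illustration


def compression_ratio(page: bytes, level: int = 1) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(page, level)) / len(page)


random_page = os.urandom(PAGE_SIZE)                        # like the stress workload
repetitive_page = (b"mostly static data " * 256)[:PAGE_SIZE]  # compression friendly

print("random page ratio:    ", compression_ratio(random_page))
print("repetitive page ratio:", compression_ratio(repetitive_page))
```

The random page does not shrink at all, so any CPU time spent compressing it
is pure overhead, whereas the repetitive page shrinks to a small fraction of
its size.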

>  - XBZRLE compression did not have as much of a CPU performance penalty on the
>    host as MT compression, but also did not help migration to actually
>    complete. Again this is largely due to the workload being the worst case
>    scenario with random bytes. The downside is obviously the potentially
>    significant memory overhead on the host due to the cache sizing
> https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-xbzrle/compr-xbzrle-cache-50.html
>  - Post-copy, by its very nature, obviously ensured that the migration would
>    complete. While post-copy was running in pre-copy mode there was a somewhat
>    chaotic small impact on guest CPU performance, causing performance to
>    periodically oscillate between 400ms/GB and 800ms/GB. This is less than
>    the impact at the start of each migration iteration which was 1000ms/GB
>    in this test. There was also a massive penalty at time of switchover from
>    pre to post copy, as to be expected. The migration completed in post-copy
>    phase quite quickly though. For this workload, number of iterations in
>    pre-copy mode before switching to post-copy did not have much impact. I
>    expect a less extreme workload would have shown more interesting results
>    wrt number of iterations of pre-copy:
> https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html

Hmm; I hadn't actually expected that much performance difference during the
precopy phase (there used to be one in earlier postcopy versions, but the later
versions should have got simpler).  The number of iterations wouldn't make that
much difference for your workload - because you're changing all of memory,
we're going to have to resend it; if you had a workload where some of the memory
was mostly static and some was rapidly changing, then one or two passes to
transfer the mostly static data would show a benefit.
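
A toy model makes the difference concrete. In the sketch below (hypothetical
function, and a deliberately crude assumption that the guest rewrites its whole
hot set during every pass) each precopy pass sends everything currently dirty,
and the guest then re-dirties its hot pages. What is left dirty is what
postcopy would still have to fetch after switchover.

```python
def precopy_remaining(num_pages: int, hot_pages: int, passes: int) -> int:
    """Pages still dirty after `passes` precopy passes in a toy model.

    Assumption: pages 0..hot_pages-1 are rewritten by the guest during
    every pass; all other pages are static once sent.
    """
    dirty = set(range(num_pages))   # everything must be sent at first
    hot = set(range(hot_pages))     # pages the guest keeps rewriting
    for _ in range(passes):
        dirty.clear()               # a pass sends every page currently dirty
        dirty |= hot                # meanwhile the hot set is dirtied again
    return len(dirty)


# Worst-case workload: all of memory is hot, so passes do not help.
print([precopy_remaining(1000, 1000, n) for n in range(4)])

# Mixed workload: 10% hot, so a single pass removes the static 90%.
print([precopy_remaining(1000, 100, n) for n in range(4)])
```

In this model the first pass gives all of the benefit for the mixed workload
and further passes add nothing, while the all-hot workload sees no benefit at
any pass count.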

> Overall, if we're looking for a solution that can guarantee completion under
> the most extreme guest workload, then only post-copy & autoconverge appear
> up to the job.
> The MT compression is seriously harmful to migration and has severe CPU
> overhead. The XBZRLE compression is moderately harmful to migration and has
> potentially severe memory overhead for large cache sizes to make it useful.
> While auto-converge can ensure that guest migration completes, it has a
> pretty significant long term impact on guest CPU performance to achieve
> this. ie the guest spends a long time in pre-copy mode with its CPUs very
> dramatically throttled down. The level of throttling required makes one
> wonder whether it is worth using, against simply pausing the guest workload.
> The latter has a hard blackout period, but over a quite short time frame
> if network speed is fast.
> The post-copy code does have an impact on guest performance while in pre
> copy mode, vs a plain migration. It also has a fairly high spike when in
> post-copy mode, but this lasts for a pretty short time. As compared to
> auto-converge, it is able to ensure the migration completes in a finite
> time without having a prolonged impact on guest CPU performance. The
> penalty during the post-copy phase is on a par with the penalty imposed
> by auto-converge when it has to throttle to 90%+.

One advantage of postcopy here is that it should be more selective on
real workloads than auto-converge.  If some threads have the data they
need on the destination already, they can run without much performance impact.
(Indeed that's actually threads, not even vCPUs, since the asynchronous
page faults let the guest keep the vCPU busy even if one
thread is waiting).

> Overall, in the context of a worst case guest workload, it appears that
> post-copy is the clear winning strategy ensuring completion of migration
> without imposing a long duration penalty on guest performance. If the
> risk of failure from post-copy is unacceptable then auto-converge is a
> good fallback option, if the long duration guest CPU penalty can be
> accepted.

Excellent :-) And we have a GSoC student looking at recovery after network
failure during postcopy - but as you see the actual postcopy phase is
pretty short anyway, so the risk window is short.

> The compression options are only worth using if the host has free CPU
> resources, and the guest RAM is believed to be compression friendly,
> as they steal significant CPU time away from guests in order to run
> compression, often with a negative impact on migration completion
> chances.
> Looking at migration in general, even with a 10-GiG-E NIC and RDMA
> transport it is possible for a single guest to provide a workload that
> will saturate the network during migration & thus prevent completion.
> Based on this, there is little point in attempting to run migrations
> in parallel on the same host, unless multiple NICs are available,
> as parallel migrations would reduce the chances of either one ever
> completing. Better reliability & faster overall completion would
> likely be achieved by fully serializing migration operations per
> host.

Probably true; however:
   a) 10Gb is a bit slow these days - you can get dual port 100Gb cards;
      even so you'll never keep up with a CPU bus going flat out.

   b) If your bandwidth is limited by a single CPU core (e.g. encryption or
      just TCP speed) then two migrations might make sense if you
      still have spare cores and bandwidth.

   c) If the destinations of the two migrations are different hosts then
      hmm the discussion probably gets more complex :-)
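
Point (b) can be sketched with simple arithmetic. The model below is an
assumption for illustration only: the link is shared equally between
migrations, and each migration stream is additionally capped by what a single
core can push. The function name and the 3 GB/s per-stream cap are
hypothetical, not measured values.

```python
def per_migration_bandwidth(link_gbs: float, per_stream_cap_gbs: float,
                            n_migrations: int) -> float:
    """Bandwidth each migration gets under equal sharing and a per-stream cap."""
    return min(per_stream_cap_gbs, link_gbs / n_migrations)


cap = 3.0  # hypothetical single-core limit in GB/s

# 10 Gb/s link (1.25 GB/s): the link is the bottleneck, so a second
# migration halves what each one gets.
print(per_migration_bandwidth(1.25, cap, 1), per_migration_bandwidth(1.25, cap, 2))

# 100 Gb/s link (12.5 GB/s): the per-stream cap is the bottleneck, so a
# second migration costs the first one nothing.
print(per_migration_bandwidth(12.5, cap, 1), per_migration_bandwidth(12.5, cap, 2))
```

Under this model, serializing migrations only helps when the link, rather than
the per-stream cap, is the limiting factor.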

> There is clearly scope for more investigation here, in particular
>  - Produce some alternative guest workloads that try to present
>    a more "average" scenario workload, instead of the worst-case.
>    These would likely allow compression to have some positive
>    impact.
>  - Try various combinations of strategies. For example, combining
>    post-copy and auto-converge at the same time, or compression
>    combined with either post-copy or auto-converge.
>  - Investigate block migration performance too, with NBD migration
>    server.
>  - Investigate effect of dynamically changing max downtime value
>    during migration, rather than using a fixed 500ms value.

Yes, and probably worth trying to figure out what's happening during
the precopy phase of postcopy.


> Daniel P. Berrange (6):
>   scripts: add __init__.py file to scripts/qmp/
>   scripts: add a 'debug' parameter to QEMUMonitorProtocol
>   scripts: refactor the VM class in iotests for reuse
>   scripts: set timeout when waiting for qemu monitor connection
>   scripts: ensure monitor socket has SO_REUSEADDR set
>   tests: introduce a framework for testing migration performance
>  configure                               |   2 +
>  scripts/qemu.py                         | 202 +++++++++++
>  scripts/qmp/__init__.py                 |   0
>  scripts/qmp/qmp.py                      |  15 +-
>  scripts/qtest.py                        |  34 ++
>  tests/Makefile                          |  12 +
>  tests/migration/.gitignore              |   2 +
>  tests/migration/guestperf-batch.py      |  26 ++
>  tests/migration/guestperf-plot.py       |  26 ++
>  tests/migration/guestperf.py            |  27 ++
>  tests/migration/guestperf/__init__.py   |   0
>  tests/migration/guestperf/comparison.py | 124 +++++++
>  tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
>  tests/migration/guestperf/hardware.py   |  62 ++++
>  tests/migration/guestperf/plot.py       | 623 ++++++++++++++++++++++++++++++++
>  tests/migration/guestperf/progress.py   | 117 ++++++
>  tests/migration/guestperf/report.py     |  98 +++++
>  tests/migration/guestperf/scenario.py   |  95 +++++
>  tests/migration/guestperf/shell.py      | 255 +++++++++++++
>  tests/migration/guestperf/timings.py    |  55 +++
>  tests/migration/stress.c                | 367 +++++++++++++++++++
>  tests/qemu-iotests/iotests.py           | 135 +------
>  22 files changed, 2583 insertions(+), 133 deletions(-)
>  create mode 100644 scripts/qemu.py
>  create mode 100644 scripts/qmp/__init__.py
>  create mode 100644 tests/migration/.gitignore
>  create mode 100755 tests/migration/guestperf-batch.py
>  create mode 100755 tests/migration/guestperf-plot.py
>  create mode 100755 tests/migration/guestperf.py
>  create mode 100644 tests/migration/guestperf/__init__.py
>  create mode 100644 tests/migration/guestperf/comparison.py
>  create mode 100644 tests/migration/guestperf/engine.py
>  create mode 100644 tests/migration/guestperf/hardware.py
>  create mode 100644 tests/migration/guestperf/plot.py
>  create mode 100644 tests/migration/guestperf/progress.py
>  create mode 100644 tests/migration/guestperf/report.py
>  create mode 100644 tests/migration/guestperf/scenario.py
>  create mode 100644 tests/migration/guestperf/shell.py
>  create mode 100644 tests/migration/guestperf/timings.py
>  create mode 100644 tests/migration/stress.c
> -- 
> 2.5.5
Dr. David Alan Gilbert / address@hidden / Manchester, UK
