From: Claudio Fontana
Subject: Re: starting to look at qemu savevm performance, a first regression detected
Date: Mon, 7 Mar 2022 13:07:14 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0

On 3/7/22 12:31 PM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 3/7/22 11:32 AM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana@suse.de) wrote:
>>>> On 3/5/22 2:20 PM, Claudio Fontana wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> I have been looking at some reports of bad qemu savevm performance in
>>>>> large VMs (around 20+ GB),
>>>>> when used in libvirt commands like:
>>>>>
>>>>>
>>>>> virsh save domain /dev/null
>>>>>
>>>>>
>>>>>
>>>>> I have written a simple test to run in a Linux centos7-minimal-2009
>>>>> guest, which allocates and touches 20 GB of memory.
>>>>>
>>>>> With any qemu version since around 2020, I am not seeing more than 580
>>>>> MB/s even in the most ideal of situations.
>>>>>
>>>>> This drops to around 122 MB/s after commit:
>>>>> cbde7be900d2a2279cbc4becb91d1ddd6a014def .
>>>>>
>>>>> Here is the bisection for this particular drop in throughput:
>>>>>
>>>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad)
>>>>> Author: Daniel P. Berrangé <berrange@redhat.com>
>>>>> Date: Fri Feb 19 18:40:12 2021 +0000
>>>>>
>>>>> migrate: remove QMP/HMP commands for speed, downtime and cache size
>>>>>
>>>>> The generic 'migrate_set_parameters' command handle all types of
>>>>> param.
>>>>>
>>>>> Only the QMP commands were documented in the deprecations page, but
>>>>> the
>>>>> rationale for deprecating applies equally to HMP, and the replacements
>>>>> exist. Furthermore the HMP commands are just shims to the QMP
>>>>> commands,
>>>>> so removing the latter breaks the former unless they get
>>>>> re-implemented.
>>>>>
>>>>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>>
>>>>>
>>>>> git bisect start
>>>>> # bad: [5c8463886d50eeb0337bd121ab877cf692731e36] Merge remote-tracking
>>>>> branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging
>>>>> git bisect bad 5c8463886d50eeb0337bd121ab877cf692731e36
>>>>> # good: [6cdf8c4efa073eac7d5f9894329e2d07743c2955] Update version for
>>>>> 4.2.1 release
>>>>> git bisect good 6cdf8c4efa073eac7d5f9894329e2d07743c2955
>>>>> # good: [b0ca999a43a22b38158a222233d3f5881648bb4f] Update version for
>>>>> v4.2.0 release
>>>>> git bisect good b0ca999a43a22b38158a222233d3f5881648bb4f
>>>>> # skip: [e2665f314d80d7edbfe7f8275abed7e2c93c0ddc] target/mips: Alias MSA
>>>>> vector registers on FPU scalar registers
>>>>> git bisect skip e2665f314d80d7edbfe7f8275abed7e2c93c0ddc
>>>>> # good: [4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3] tests/docker: Install
>>>>> static libc package in CentOS 7
>>>>> git bisect good 4762c82cbda22b1036ce9dd2c5e951ac0ed0a7d3
>>>>> # bad: [d4127349e316b5c78645f95dba5922196ac4cc23] Merge remote-tracking
>>>>> branch 'remotes/berrange-gitlab/tags/crypto-and-more-pull-request' into
>>>>> staging
>>>>> git bisect bad d4127349e316b5c78645f95dba5922196ac4cc23
>>>>> # bad: [d90f154867ec0ec22fd719164b88716e8fd48672] Merge remote-tracking
>>>>> branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210504' into staging
>>>>> git bisect bad d90f154867ec0ec22fd719164b88716e8fd48672
>>>>> # good: [dd5af6ece9b101d29895851a7441d848b7ccdbff] tests/docker: add a
>>>>> test-tcg for building then running check-tcg
>>>>> git bisect good dd5af6ece9b101d29895851a7441d848b7ccdbff
>>>>> # bad: [90ec1cff768fcbe1fa2870d2018f378376f4f744] target/riscv: Adjust
>>>>> privilege level for HLV(X)/HSV instructions
>>>>> git bisect bad 90ec1cff768fcbe1fa2870d2018f378376f4f744
>>>>> # good: [373969507a3dc7de2d291da7e1bd03acf46ec643] migration: Replaced
>>>>> qemu_mutex_lock calls with QEMU_LOCK_GUARD
>>>>> git bisect good 373969507a3dc7de2d291da7e1bd03acf46ec643
>>>>> # good: [4083904bc9fe5da580f7ca397b1e828fbc322732] Merge remote-tracking
>>>>> branch 'remotes/rth-gitlab/tags/pull-tcg-20210317' into staging
>>>>> git bisect good 4083904bc9fe5da580f7ca397b1e828fbc322732
>>>>> # bad: [009ff89328b1da3ea8ba316bf2be2125bc9937c5] vl: allow passing JSON
>>>>> to -object
>>>>> git bisect bad 009ff89328b1da3ea8ba316bf2be2125bc9937c5
>>>>> # bad: [50243407457a9fb0ed17b9a9ba9fc9aee09495b1] qapi/qom: Drop
>>>>> deprecated 'props' from object-add
>>>>> git bisect bad 50243407457a9fb0ed17b9a9ba9fc9aee09495b1
>>>>> # bad: [1b507e55f8199eaad99744613823f6929e4d57c6] Merge remote-tracking
>>>>> branch 'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging
>>>>> git bisect bad 1b507e55f8199eaad99744613823f6929e4d57c6
>>>>> # bad: [24e13a4dc1eb1630eceffc7ab334145d902e763d] chardev: reject use of
>>>>> 'wait' flag for socket client chardevs
>>>>> git bisect bad 24e13a4dc1eb1630eceffc7ab334145d902e763d
>>>>> # good: [8becb36063fb14df1e3ae4916215667e2cb65fa2] monitor: remove
>>>>> 'query-events' QMP command
>>>>> git bisect good 8becb36063fb14df1e3ae4916215667e2cb65fa2
>>>>> # bad: [8af54b9172ff3b9bbdbb3191ed84994d275a0d81] machine: remove
>>>>> 'query-cpus' QMP command
>>>>> git bisect bad 8af54b9172ff3b9bbdbb3191ed84994d275a0d81
>>>>> # bad: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate: remove QMP/HMP
>>>>> commands for speed, downtime and cache size
>>>>> git bisect bad cbde7be900d2a2279cbc4becb91d1ddd6a014def
>>>>> # first bad commit: [cbde7be900d2a2279cbc4becb91d1ddd6a014def] migrate:
>>>>> remove QMP/HMP commands for speed, downtime and cache size
>>>>>
>>>>>
>>>>> Are there some obvious settings / options I am missing to regain the
>>>>> savevm performance after this commit?
>>>>
>>>> Answering myself:
>>>
>>> <oops we seem to have split this thread into two>
>>>
>>>> this seems to be due to a resulting different default xbzrle cache size
>>>> (probably interactions between libvirt/qemu versions?).
>>>>
>>>> When forcing the xbzrle cache size to a larger value, the performance is
>>>> back.
>>>
>>> That's weird that 'virsh save' is ending up using xbzrle.
>>
>> virsh save (or qemu savevm..) seems to me like it uses a subset of the
>> migration code and migration parameters but not all..
>>
>>>
>>>>>
>>>>> I have seen projects attempting to improve other aspects of performance
>>>>> (snapshot performance, etc), is there something going on to improve the
>>>>> transfer of RAM in savevm too?
>>>>
>>>>
>>>> Still I would think that we should be able to do better than 600ish MB/s.
>>>> Any ideas, or prior work on this,
>>>> to improve savevm performance, especially looking at RAM regions transfer
>>>> speed?
>>>
>>> My normal feeling is ~10Gbps for a live migrate over the wire; I rarely
>>> try virsh save though.
>>> If you're using xbzrle that might explain it; it's known to eat cpu -
>>> but I'd never expect it to have been used with 'virsh save'.
>>
>> some valgrind shows it among the top cpu eaters;
Well, I was confused.
The xbzrle usage that shows up there is only the constant calling of
migrate_use_xbzrle(), XBZRLE_cache_lock() and XBZRLE_cache_unlock(), as well as
some calls to xbzrle_cache_zero_page(),
which likely do not do anything useful: ->ram_bulk_stage is not changed by
anything, so it should stay true.
>>
>> I wonder why we are able to do more than 2x better for actual live
>> migration, compared with virsh save /dev/null ...
>
> What speed do you get if you force xbzrle off?
no substantial difference.
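
For reference, a minimal sketch of how the two knobs discussed above (forcing
xbzrle off, and forcing a larger xbzrle cache size) could be expressed as QMP
commands. It only builds the JSON text; the helper names are hypothetical, and
the 1 GiB cache size is an arbitrary illustrative value, not the one used in
the tests above.

```python
import json


def qmp_set_xbzrle_capability(enabled: bool) -> str:
    """Build a QMP 'migrate-set-capabilities' command toggling xbzrle."""
    cmd = {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [{"capability": "xbzrle", "state": enabled}]
        },
    }
    return json.dumps(cmd)


def qmp_set_xbzrle_cache_size(size_bytes: int) -> str:
    """Build a QMP 'migrate-set-parameters' command setting the xbzrle cache size.

    QEMU documents the cache size as needing to be a power of two, so that
    is checked here before building the command.
    """
    if size_bytes <= 0 or (size_bytes & (size_bytes - 1)) != 0:
        raise ValueError("xbzrle cache size must be a positive power of two")
    cmd = {
        "execute": "migrate-set-parameters",
        "arguments": {"xbzrle-cache-size": size_bytes},
    }
    return json.dumps(cmd)


if __name__ == "__main__":
    # Force xbzrle off (the experiment asked about above).
    print(qmp_set_xbzrle_capability(False))
    # Force a larger cache; 1 GiB is an illustrative value only.
    print(qmp_set_xbzrle_cache_size(1 << 30))
```

With libvirt, the resulting JSON could presumably be passed to the running
guest with something like virsh qemu-monitor-command <domain> '<json>'.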
>
> Dave
>
>> Thanks,
>>
>> Claudio
>>