
Re: TCG performance on PPC64


From: Matheus K. Ferst
Subject: Re: TCG performance on PPC64
Date: Thu, 26 May 2022 08:07:07 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0

On 19/05/2022 01:13, David Gibson wrote:
>> What would be different in aarch64 emulation that yields a better
>> performance on our POWER9?
>> - I suppose that aarch64 has more instructions with GVec implementations
>> than PPC64 and s390x, so maybe aarch64 guests can better use host-vector
>> instructions?
>
> As with Richard, I think it's pretty unlikely that this would make
> such a difference.  With a pure number crunching vector workload in
> the guest, maybe, with kernel & userspace boot, not really.  It might
> be interesting to configure a guest CPU without vector support to
> double check if it makes any difference though.
>
>>  - Looking at the flame graphs of each test (attached), I can see that
>> tb_gen_code takes proportionally less time of aarch64 emulation than PPC64
>> and s390x, so it might be that decodetree is faster?
>> - There is more than TCG at play, so perhaps the differences can be better
>> explained by VirtIO performance or something else?
>
> Also seems unlikely to me; I don't really see how this would differ
> enough based on guest type to make the difference we see here.
>
>> Currently, Leandro Lupori is working to improve TLB invalidation[7],
>> Victor Colombo is working to enable hardfpu in some scenarios, and I'm
>> reviewing some older helpers that can use GVec or be easily implemented
>> inline. We're also planning to add some Power ISA v3.1 instructions to
>> the TCG backend, but it's probably better to test on hardware whether our
>> changes are doing any good, and we don't have access to a POWER10 yet.
>>
>> Are there any other known performance problems for TCG on PPC64 that we
>> should investigate?
>
> Known?  I don't think so.  The TCG code is pretty old and clunky
> though, so there could be all manner of problems lurking in there.
>
>
> A couple of thoughts:
>
>  * I wonder how much emulation of guest side synchronization
>    instructions might be a factor here.  That's one of the few things
>    I can think of where the matchup between host and guest models
>    might make a difference.

That's an interesting suggestion; we'll be looking into this. It seems similar to Nicholas Piggin's recent work, and there is probably more to be done in this area.

>  It might be interesting to try these
>    tests with single core guests.  Likewise it might be interesting to
>    get results with multi-core guests, but MTTCG explicitly disabled.
>

With 50 runs:

+---------+--------------------------------+
|         |              Host              |
| Options +---------------+----------------+
|         |     PPC64     |     x86_64     |
+---------+---------------+----------------+
| -smp 2  | 427.41 ± 7.89 |  350.89 ± 7.62 |
| -smp 1  | 574.01 ± 4.18 | 411.27 ± 17.14 |
| No MTTCG| 588.84 ± 8.50 | 445.30 ± 21.66 |
+---------+---------------+----------------+
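For anyone reproducing the "No MTTCG" row: QEMU's documented way to keep a multi-core guest on a single host TCG thread is the `thread=single` accelerator property. A sketch of such an invocation (the machine, CPU, and image options below are placeholders, not the exact command used for these runs):

```shell
# 'thread=single' disables MTTCG, so the 2 guest vCPUs are emulated
# round-robin on one host thread. Kernel/disk arguments are placeholders.
qemu-system-aarch64 \
    -machine virt -cpu cortex-a57 -smp 2 \
    -accel tcg,thread=single \
    -kernel Image -append "console=ttyAMA0" -nographic
```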

The gap with x86 widened in the two new cases, but I'm not sure I can draw anything from this result. Maybe it's just SMT vs. Hyper-Threading that benefited the POWER9 in the initial test, or the Xeon is better at boosting a single core when QEMU uses only one thread.
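The ± figures in the table above are mean ± sample standard deviation over the 50 runs; a small stdlib-only helper along these lines (hypothetical, not the script actually used) reduces raw timings to that form:

```python
# Reduce raw boot timings (seconds, one value per run) to the
# "mean ± sample standard deviation" form used in the table above.
from statistics import mean, stdev

def summarize(times):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(times), stdev(times)

runs = [427.1, 430.2, 419.8, 435.5, 424.9]  # made-up example timings
m, s = summarize(runs)
print(f"{m:.2f} ± {s:.2f}")
```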

>  * It might also be interesting to get CPU time results as well as
>    elapsed time.  That might indicate whether qemu is doing more
>    actual work in the slow cases, or if it's blocking for some
>    non-obvious reason.

The results above and in my first email were wall clock time, but I also have user and system times on a GitHub wiki page: https://github.com/PPC64/qemu/wiki/TCG-Performance-on-PPC64
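For completeness, wall-clock and CPU time of a child process can be captured in one place with only the standard library; a sketch (the command below is a placeholder for the qemu-system-* invocation):

```python
# Measure wall-clock vs. CPU time consumed by a child process.
# os.times() exposes accumulated children_user/children_system times,
# which subprocess.run() includes once the child has been waited on.
import os
import subprocess
import time

cmd = ["true"]  # placeholder; replace with the actual qemu command line
start = time.perf_counter()
before = os.times()
subprocess.run(cmd, check=True)
after = os.times()
wall = time.perf_counter() - start
user = after.children_user - before.children_user
sys_ = after.children_system - before.children_system
print(f"wall={wall:.2f}s user={user:.2f}s sys={sys_:.2f}s")
```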

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
