
Re: [Qemu-ppc] [Qemu-devel] Profiling results


From: Peter Maydell
Subject: Re: [Qemu-ppc] [Qemu-devel] Profiling results
Date: Tue, 17 Jul 2018 22:53:51 +0100

On 17 July 2018 at 21:46, BALATON Zoltan <address@hidden> wrote:
> On Tue, 17 Jul 2018, Mark Cave-Ayland wrote:
>> Good question. A quick grep for 'asidx_from_attrs' shows that
>> cc->asidx_from_attrs() isn't set for PPC targets, so as a quick test does
>> replacing the inline function cpu_asidx_from_attrs() in include/qom/cpu.h
>> with a simple "return 0" change the profile at all?
>
>
> It does seem to lessen its impact but it's still higher than I expected:

It may be worth special-casing the CPU method lookups (or at
least that one) if we can, then...
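
(For reference, the quick test Mark suggested amounts to replacing the
whole body of the inline in include/qom/cpu.h with a bare "return 0" --
a throwaway profiling hack rather than a real patch, something like:

static inline int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
{
    /* Profiling hack: skip the CPU_GET_CLASS() lookup and the
     * cc->asidx_from_attrs hook entirely and always use address
     * space 0, which is what targets without the hook get anyway.
     */
    return 0;
}

That takes the QOM class lookup out of the memory access path, so the
difference between the two profiles is roughly the cost of that lookup.)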

> %        cum. %     linenr info                 symbol name
> 10.7949  10.7949    exec-all.h:410              helper_lookup_tb_ptr
>  7.8663  18.6612    cputlb.c:793                io_readx
>  6.0265  24.6878    cputlb.c:114                tlb_flush_nocheck
>  4.0671  28.7548    sm501_template.h:62         draw_line16_32
>  4.0559  32.8107    object.c:765                object_class_dynamic_cast_assert
>  3.3780  36.1887    memory.c:1350               memory_region_access_valid
>  2.8920  39.0808    qemu-thread-posix.c:61      qemu_mutex_lock_impl
>  2.7187  41.7995    memory.c:1415               memory_region_dispatch_read
>  2.6011  44.4006    qht.c:487                   qht_lookup_custom
>  2.5356  46.9362    softmmu_template.h:112      helper_ret_ldub_mmu
>
> Maybe it's called from somewhere else too? I know draw_line16_32, but I
> wonder where helper_lookup_tb_ptr and the TLB flushes could be coming
> from? Those seem significant. And io_readx itself also seems too high on
> the list.

helper_lookup_tb_ptr is part of TCG -- it's where we look for
the next TB to go to. Any non-computed branch to a different page
will result in our calling this. So it's high on the profile
because we do it a lot, I think, but that's not necessarily a
problem as such.
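
(For the curious, the helper itself is tiny -- roughly this, simplified
from accel/tcg/tcg-runtime.c with the logging dropped, so the exact
names may differ a little in your tree:

void *HELPER(lookup_tb_ptr)(CPUArchState *env)
{
    CPUState *cpu = ENV_GET_CPU(env);
    TranslationBlock *tb;
    target_ulong cs_base, pc;
    uint32_t flags;

    /* Check the per-CPU tb_jmp_cache and, on a miss, the global QHT
     * hash table (which is where the qht_lookup_custom time in the
     * profile comes from). */
    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, curr_cflags());
    if (tb == NULL) {
        /* No cached TB: return to the main loop to translate one. */
        return tcg_ctx->code_gen_epilogue;
    }
    return tb->tc.ptr;
}

So most of its per-call cost is the hash lookup plus the indirect branch
to the pointer it returns.)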

io_readx is the slow path for guest memory accesses -- any
guest access to something that's not RAM will have to go through
here. My first guess (given the other things in the profile,
especially helper_ret_ldub_mmu, memory_region_dispatch_read
and memory_region_access_valid) is that the guest is in a tight
loop doing a read on a device register a lot of the time.
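
(A condensed sketch of that path, modelled loosely on io_readx() in
accel/tcg/cputlb.c -- the real function takes more arguments and also
handles icount and other corner cases -- looks something like:

static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
                         target_ulong addr, size_t size)
{
    CPUState *cpu = ENV_GET_CPU(env);
    hwaddr physaddr = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
    MemoryRegion *mr = iotlb_to_region(cpu, iotlbentry->addr,
                                       iotlbentry->attrs);
    uint64_t val;
    bool locked = false;

    /* MMIO dispatch normally runs under the big QEMU lock, which is
     * where the qemu_mutex_lock_impl time in the profile comes from. */
    if (mr->global_locking && !qemu_mutex_iothread_locked()) {
        qemu_mutex_lock_iothread();
        locked = true;
    }
    /* This checks memory_region_access_valid() and then calls the
     * device model's read callback. */
    memory_region_dispatch_read(mr, physaddr, &val, size,
                                iotlbentry->attrs);
    if (locked) {
        qemu_mutex_unlock_iothread();
    }
    return val;
}

so helper_ret_ldub_mmu, io_readx, memory_region_access_valid,
memory_region_dispatch_read and qemu_mutex_lock_impl in that profile are
most likely all the same guest device read, seen at different levels of
the stack.)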

> I wonder if it may have something to do with the background task
> trying to read non-implemented i2c stuff frequently (as discussed in point
> 2. in http://zero.eik.bme.hu/~balaton/qemu/amiga/#morphos).

Could be, or some similar thing. If you suspect the i2c you
could try putting in an unimplemented-device stub in the
right place and see how often -d unimp yells about reads to it.
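
(Concretely, the stub I mean is hw/misc/unimp.c; wiring one up from the
board code is a one-liner along the lines of the sketch below, where the
name, base and size are made-up placeholders for wherever the i2c
controller's registers are supposed to live:

#include "hw/misc/unimp.h"

static void add_i2c_stub(void)
{
    /* Placeholder base/size: cover the register range the guest's
     * background task is expected to poll. */
    create_unimplemented_device("i2c", 0x10000000, 0x100);
}

Call that from the machine init function and every guest read or write
to that range gets logged when you run with -d unimp, so you can see how
often it's being hit.)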

So overall I'd be a little wary of optimizing based on this
profile, because I suspect it's atypical -- the guest is sat
in a tight polling loop and the profile says "all the functions
in the code path for doing device access are really hot".
The fix is to improve our model so the guest doesn't get
stuck like that, not to try to slightly improve the speed
of device accesses (we call it the "slow path" for a reason :-))

(But places like asidx_from_attrs are likely to be on hot
paths in general, so having the QOM class lookup there be
overly heavyweight is maybe worth fixing anyhow.)

thanks
-- PMM


