From: Aurelien Jarno
Subject: Re: [Qemu-devel] Debian 7.8.0 SPARC64 on qemu - anything i can do to speedup the emulation?
Date: Wed, 29 Jul 2015 17:13:09 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

On 2015-07-29 15:45, Karel Gardas wrote:
> On Wed, Jul 29, 2015 at 12:20 PM, Dennis Luehring <address@hidden> wrote:
> > On 29.07.2015 at 11:17, Karel Gardas wrote:
> >>
> >> If
> >> anybody is interested I can dig up those old emails.
> >
> >
> > would be nice
> 
> Here is the speed comparison:
> https://lists.debian.org/debian-sparc/2015/02/msg00001.html but the whole
> thread started in January here:
> https://lists.debian.org/debian-sparc/2015/01/msg00000.html
> 
> Mark then asked for profiles. I see I sent them privately because of the
> attachments; the email was:
> 
> off-list as I'm attaching files which may be too big for the list. Also
> I'm not sure if this is still relevant to debian-sparc@
> 
> Anyway, the difference in IO is negligible. When I compiled on SPARC on
> tmpfs it still took 6m40s. The SPARC guest uses -drive, while the AArch64
> guest probably uses all the virtio optimizations.
> 
> Anyway, with gprof you've hit the point. Attached are two files (text
> output from gprof). One is a reference profile: just
> boot/login/su root/poweroff/kill qemu. The other is the same, plus ~5
> hours of compiling nbench2 in a shell loop.
> 
> reference shows:
>    %  cumulative    self              self    total
>  time   seconds   seconds    calls  ms/call  ms/call name
>  42.9     145.84   145.84                            cpu_sparc_exec [1]
>   7.8     172.44    26.60                            tcg_optimize [2]

tcg_optimize should be improved by the patchset I posted.

>   4.4     187.38    14.94                            tcg_reg_alloc_op [3]
>   4.4     202.20    14.82                            get_physical_address_data [4]
>   3.8     215.26    13.06                            tcg_liveness_analysis [5]
> 
> 
> while compile loop shows:
>    %  cumulative    self              self    total
>  time   seconds   seconds    calls  ms/call  ms/call name
>  21.2    1008.09  1008.09                            tlb_flush_page [1]
>  15.2    1731.09   723.00                            cpu_sparc_exec [2]
>  13.6    2374.79   643.70                            tb_flush_jmp_cache [3]
>   9.5    2823.86   449.07                            tcg_optimize [4]
>   4.2    3024.26   200.40                            tcg_liveness_analysis [5]
> 
> 
> That's indeed a difference. I assume cpu_sparc_exec is what does the
> actual work here...

Depending on how your profiling is done, it might not be. It might be
that the time spent in cpu_sparc_exec is just the time needed to look
up the translated code in the TB cache.
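
To make that concrete, here is a rough sketch of why lookup cost can land
in the exec loop. It is a toy model in Python, not QEMU's actual code: the
class, the function names and the cache size are all made up for
illustration. It assumes a small direct-mapped cache indexed by guest PC in
front of a slower table of translated blocks, and it assumes that a page
flush also drops the fast cache, which is what the tlb_flush_page and
tb_flush_jmp_cache entries in the second profile suggest.

```python
# Toy model of a two-level translated-block (TB) lookup.
# All names and sizes here are hypothetical; this is not QEMU's code.

JMP_CACHE_BITS = 4
JMP_CACHE_SIZE = 1 << JMP_CACHE_BITS


class ToyCpu:
    def __init__(self):
        self.jmp_cache = [None] * JMP_CACHE_SIZE  # fast, direct-mapped cache
        self.tb_table = {}                        # slower table: pc -> block
        self.fast_hits = 0
        self.slow_hits = 0
        self.translations = 0

    def _slot(self, pc):
        # Index the fast cache with a few low bits of the guest PC.
        return (pc >> 2) & (JMP_CACHE_SIZE - 1)

    def find_tb(self, pc):
        """Return the translated block for pc, translating it if needed."""
        slot = self._slot(pc)
        entry = self.jmp_cache[slot]
        if entry is not None and entry[0] == pc:
            self.fast_hits += 1
            return entry[1]
        block = self.tb_table.get(pc)
        if block is None:
            block = f"host code for {pc:#x}"  # stand-in for real translation
            self.tb_table[pc] = block
            self.translations += 1
        else:
            self.slow_hits += 1
        self.jmp_cache[slot] = (pc, block)
        return block

    def flush_jmp_cache(self):
        """Drop the fast cache, as a page flush does in this toy model."""
        self.jmp_cache = [None] * JMP_CACHE_SIZE


def run(cpu, pcs, rounds, flush_every=None):
    """Execute the same guest PCs repeatedly, optionally flushing."""
    for i in range(rounds):
        if flush_every and i % flush_every == 0:
            cpu.flush_jmp_cache()
        for pc in pcs:
            cpu.find_tb(pc)
```

Running the same four PCs for ten rounds without flushes gives only fast
hits after the first round. With a flush before every round, every lookup
after the first round takes the slow path, even though nothing is
retranslated. In a profile, that extra lookup time would be charged to
whichever function contains the lookup.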

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
address@hidden                 http://www.aurel32.net


