Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers


From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
Date: Sun, 29 Mar 2009 16:57:49 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

On Sun, Mar 29, 2009 at 04:42:50PM +0200, Aurelien Jarno wrote:
> On Sun, Mar 29, 2009 at 03:34:53PM +0200, Aurelien Jarno wrote:
> > On Sat, Mar 28, 2009 at 05:18:34PM -0700, Nathan Froyd wrote:
> > > On Sat, Mar 28, 2009 at 11:54:43PM +0100, Aurelien Jarno wrote:
> > > > On Sat, Mar 28, 2009 at 02:30:13PM -0700, Nathan Froyd wrote:
> > > > > I am not a TCG expert, but there are several loops in TCG over all
> > > > > globals and it seems like those loops would go faster if they didn't
> > > > > have to consider registers that would never be touched.  If this patch
> > > > > series makes no difference in TCG's performance, then I'd be glad to
> > > > > have an explanation of why that's the case.
> > > > 
> > > > Have you actually run a benchmark with those changes? TCG is
> > > > sometimes a bit strange: some optimizations do not change the
> > > > execution speed, while others improve it a lot. It is very difficult to
> > > > predict what will give a gain or not.
> > > > 
> > > > Suggestions of benchmarks: gzip/bzip2 on a big file using user emulation
> > > > or a compilation in system emulation.
> > > 
> > > Benchmarking?  Pffft. ;)
> > > 
> > > A benchmarking session with qemu-ppc and bzip2/bunzip2 on ~400MB files
> > > and a 603e emulated CPU suggests that these changes are not terribly
> > > beneficial (maybe 1% improvement, if that).  I don't imagine that a
> > > similarly stressful benchmark in system emulation would be much
> > > different.  Consider the patch series withdrawn.
> > > 
> > 
> > I have done some profiling on qemu-system-ppc and qemu-system-mips. You
> > are actually right that looping over the TCG variable lists takes time.
> > This is mainly due to the call to save_globals() for TCG functions marked
> > as TCG_OPF_CALL_CLOBBER.
> > 
> > However, it looks like it would be better to address this comment first,
> > before trying to reduce the number of TCG variables:
> > 
> >             /* XXX: for load/store we could do that only for the slow path
> >                (i.e. when a memory callback is called) */
> > 
> 
> Thinking about it a bit more, I think we should avoid mapping FPU registers
> as global TCG variables. Those variables are mostly modified by helpers
> (except for move and load/store), and they will be written back to
> memory before the call to the helper. This means TCG can't delay the
> memory accesses, so there is very little (or no) difference in the
> generated code whether the FPU register is accessed through a global TCG
> variable or through tcg_gen_ld_tl().
> 
> I have done the test with qemu-system-mips, and I have found a gain
> of around 1% in speed.
> 

My measurements were wrong; the gain is around 9%.
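
As a rough illustration of the reasoning quoted above, here is a toy model
in C. It is not QEMU's actual save_globals() implementation, and every name
starting with toy_ or Toy is hypothetical. It only assumes what the thread
states: before a helper call, all globals are walked and the modified ones
are written back to memory. Under that assumption the walk gets longer with
every register mapped as a global, even when the number of stores is
unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: upper bound on the number of modelled globals. */
#define TOY_MAX_GLOBALS 64

typedef struct {
    int dirty;                      /* nonzero: value modified, not yet in memory */
} ToyGlobal;

typedef struct {
    ToyGlobal globals[TOY_MAX_GLOBALS];
    size_t nb_globals;              /* how many slots are actually in use */
    unsigned long slots_scanned;    /* loop iterations spent walking globals */
    unsigned long stores_emitted;   /* write-backs actually needed */
} ToyContext;

/* Start with nb_globals clean slots and zeroed counters. */
static void toy_init(ToyContext *ctx, size_t nb_globals)
{
    size_t i;

    assert(nb_globals <= TOY_MAX_GLOBALS);
    ctx->nb_globals = nb_globals;
    ctx->slots_scanned = 0;
    ctx->stores_emitted = 0;
    for (i = 0; i < TOY_MAX_GLOBALS; i++) {
        ctx->globals[i].dirty = 0;
    }
}

/* Model generated code modifying one global. */
static void toy_write_global(ToyContext *ctx, size_t idx)
{
    assert(idx < ctx->nb_globals);
    ctx->globals[idx].dirty = 1;
}

/* Model a helper call: walk every global, write back the dirty ones. */
static void toy_call_helper(ToyContext *ctx)
{
    size_t i;

    for (i = 0; i < ctx->nb_globals; i++) {
        ctx->slots_scanned++;
        if (ctx->globals[i].dirty) {
            ctx->stores_emitted++;
            ctx->globals[i].dirty = 0;
        }
    }
}
```

For example, with 64 slots (say 32 integer plus 32 FPU registers, an assumed
split) against 32 slots, one modified register and one helper call, the
model walks 64 slots against 32 while emitting a single store in both cases.
It says nothing about how large the real gain is; that still has to be
measured.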

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
address@hidden                 http://www.aurel32.net



