From: Fabrice Bellard
Subject: Re: [Qemu-devel] Re: [PATCH] Updated Sparc support
Date: Wed, 14 May 2003 13:48:32 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020828
Rusty Russell wrote:
> In message <address@hidden> you write:
>> I also plan to add direct block chaining. I will try to make it portable by using the 'goto *' gcc extension, but I don't know yet if it will work on every CPU. The direct block chaining will generate something like:
>>
>>     'goto *addr'
>>
>> at the end of some translated blocks to jump either to the CPU core or directly to the next translated block. 'addr' will be a global 'void *' variable. Since no code will be patched to change block chaining, it will simplify the instruction cache invalidation issues and the threading issues.
>
> Hmm, I had a more ambitious idea, and that was to keep simple stats on which block last followed each block: if it goes to the same block more than N times in a row, coalesce/chain them. As blocks get longer, you have more opportunities for register lifetime analysis, which could eliminate redundant stores to registers in particular. I haven't got actual code, so I haven't mentioned it before... Thoughts?
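The direct block chaining described in the quoted text can be illustrated with a small sketch using the gcc 'goto *' (labels as values) extension. This is not qemu code: the two toy blocks, the per-block `chain_slot` array (used here instead of a single global 'void *'), and the `run_chained` function are all assumptions made for illustration. Chaining is done by writing a pointer, so no generated code is patched.

```c
#include <assert.h>

/* Hypothetical toy setup: two "translated blocks" that alternate A -> B -> A. */
enum { BLOCK_A, BLOCK_B, NUM_BLOCKS };

/* Executes `steps` blocks and returns how many times control came back
   to the CPU core. Requires the gcc/clang labels-as-values extension. */
int run_chained(int steps, int enable_chaining)
{
    /* Entry point of each block, and the slot each block jumps through
       when it finishes. Every slot initially points back to the core. */
    void *block_entry[NUM_BLOCKS] = { &&block_a, &&block_b };
    void *chain_slot[NUM_BLOCKS]  = { &&cpu_core, &&cpu_core };
    int executed = 0;      /* number of blocks executed so far */
    int core_entries = 0;  /* number of trips through the core */
    int last = -1;         /* block that just finished, -1 at start */
    int next = BLOCK_A;    /* block the guest wants to run next */

    assert(steps >= 0);

cpu_core:
    core_entries++;
    if (executed >= steps)
        return core_entries;
    /* Chain the previous block to its successor by storing a pointer:
       data is modified, generated code is not. */
    if (enable_chaining && last >= 0)
        chain_slot[last] = block_entry[next];
    goto *block_entry[next];

block_a:
    executed++;
    last = BLOCK_A;
    next = BLOCK_B;
    if (executed >= steps)
        goto cpu_core;              /* forced exit back to the core */
    goto *chain_slot[BLOCK_A];      /* core, or directly block B once chained */

block_b:
    executed++;
    last = BLOCK_B;
    next = BLOCK_A;
    if (executed >= steps)
        goto cpu_core;
    goto *chain_slot[BLOCK_B];      /* core, or directly block A once chained */
}
```

With chaining disabled, every block returns to the core; with chaining enabled, the core is only entered until both slots have been filled, and once more at the end.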
It could be interesting to avoid some condition code computations. Currently it is not possible to do more because qemu has no generic IR, and I think I won't have the time to add one. Julian Seward (of the valgrind project) is thinking about adding a more generic IR to valgrind to allow cross debugging, so it might be interesting for valgrind.
BUT, I have a much simpler approach "a la FX!32", which needs very little modification in qemu:
You can launch your executable a first time to record statistics. Then you launch a special tool 'qemuopt' which statically generates a dynamic library with gcc containing the host cpu code of the most used basic block chains.
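As a rough sketch of the statistics step, the first run could count how often each block follows another, and a later pass could walk the most frequent successors to pick a chain worth compiling. Everything here is hypothetical (the names `profile_record`, `hottest_successor`, `build_hot_chain`, the fixed `MAX_BLOCKS` table and the threshold); the message does not specify what 'qemuopt' records or how.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOCKS 16  /* hypothetical upper bound on profiled blocks */

/* edge_count[from][to] = how many times block `to` ran right after `from`. */
static unsigned edge_count[MAX_BLOCKS][MAX_BLOCKS];

void profile_reset(void)
{
    memset(edge_count, 0, sizeof edge_count);
}

/* Called during the profiling run each time `to` follows `from`. */
void profile_record(int from, int to)
{
    assert(from >= 0 && from < MAX_BLOCKS);
    assert(to >= 0 && to < MAX_BLOCKS);
    edge_count[from][to]++;
}

/* Most frequent successor of `block` seen at least `min_count` times,
   or -1 if there is none. */
int hottest_successor(int block, unsigned min_count)
{
    int best = -1;
    unsigned best_count = 0;

    assert(block >= 0 && block < MAX_BLOCKS);
    for (int to = 0; to < MAX_BLOCKS; to++) {
        unsigned c = edge_count[block][to];
        if (c >= min_count && c > best_count) {
            best = to;
            best_count = c;
        }
    }
    return best;
}

/* Follows hottest successors from `start` and writes the block ids into
   `chain`. Stops at max_len, at a block with no hot successor, or when a
   block would repeat. Returns the chain length. */
int build_hot_chain(int start, unsigned min_count, int *chain, int max_len)
{
    unsigned char seen[MAX_BLOCKS] = { 0 };
    int len = 0;
    int cur = start;

    assert(start >= 0 && start < MAX_BLOCKS);
    while (cur >= 0 && len < max_len && !seen[cur]) {
        seen[cur] = 1;
        chain[len++] = cur;
        cur = hottest_successor(cur, min_count);
    }
    return len;
}
```

The resulting chain of block ids is what an offline tool would then turn into C source, one call per micro-op of each block.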
'qemuopt' is very easy to do: I discovered this by noting that gcc optimizes 'static inline' local functions very well. So you just have to generate a C source containing approximately:
    void genfunc(CPUX86State *env)
    {
        uint32_t T0, EAX, EBX, ...;

        EAX = env->regs[R_EAX];
        EBX = env->regs[R_EBX];

    #define OPPROTO static inline
    #include "op-i386.c"

        op_movl_T0_EAX();
        op_movl_EBX_T0();

        env->regs[R_EAX] = EAX;
        env->regs[R_EBX] = EBX;
    }

Then gcc does all the hard work for us :-)

Fabrice.
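For readers who want to try the idea without the qemu sources, here is a minimal self-contained sketch. It is not the mechanism above: instead of including "op-i386.c", it defines two stand-in micro-ops that take pointers to the caller's locals, and `CPUX86State`, `R_EAX`, `R_EBX` and `NB_REGS` are simplified stand-ins for the real definitions. The point is only that, once the 'static inline' ops are inlined, gcc is free to treat T0, EAX and EBX as ordinary locals and to optimize across the ops.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for qemu's definitions (assumptions for this sketch). */
enum { R_EAX, R_EBX, NB_REGS };

typedef struct CPUX86State {
    uint32_t regs[NB_REGS];
} CPUX86State;

/* Stand-in micro-ops. Being 'static inline', they can be folded into the
   caller, after which the pointer indirections can be optimized away. */
static inline void op_movl_T0_EAX(uint32_t *T0, const uint32_t *EAX)
{
    *T0 = *EAX;
}

static inline void op_movl_EBX_T0(uint32_t *EBX, const uint32_t *T0)
{
    *EBX = *T0;
}

/* Host code for one chain of guest instructions: load the guest registers
   into locals, run the micro-ops, then write the registers back. */
void genfunc(CPUX86State *env)
{
    uint32_t T0, EAX, EBX;

    assert(env != 0);
    EAX = env->regs[R_EAX];
    EBX = env->regs[R_EBX];

    op_movl_T0_EAX(&T0, &EAX);   /* T0 = EAX */
    op_movl_EBX_T0(&EBX, &T0);   /* EBX = T0 */

    env->regs[R_EAX] = EAX;
    env->regs[R_EBX] = EBX;
}
```

Compiling this with optimization enabled and inspecting the generated assembly is a quick way to see how much of the temporary traffic the compiler removes.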