Hi all,
Today I've been playing around with qemu, trying to understand how the
emulation works. I've tried some debug flags and looked at the log files.
This is how I believe the translation from x86 opcodes to micro
operations is performed today; please correct me if I am wrong:
gen_intermediate_code_internal() in target-i386/translate.c is used to
build intermediate code. The function disas_insn() is used to convert
each opcode into several micro operations. When the block is finished,
the function optimize_flags() is used to optimize away some flag-related
micro operations.
After looking at some log files, I wonder if it would be possible to
reduce the number of micro operations (especially the ones involved in
flag handling) by analyzing the resources used and set by each x86
instruction and then feeding that information into the code that
converts x86 opcodes into micro operations.
Have a look at the following example:
----------------
IN:
0x300a8b99: pop ebx
0x300a8b9a: add ebx,0x11927
0x300a8ba0: mov DWORD PTR [ebp-684],eax
0x300a8ba6: xor edx,edx
0x300a8ba8: lea eax,[ebp-528]
0x300a8bae: mov esi,esi
0x300a8bb0: inc edx
0x300a8bb1: mov DWORD PTR [eax],0x0
0x300a8bb7: add eax,0x4
0x300a8bba: cmp edx,0x4a
0x300a8bbd: jbe 0x300a8bb0
If we first analyze the x86 instructions and keep track of resources,
instead of generating the micro operations directly, we end up with a
table that lists, for each x86 instruction, the resources it requires
and the resources it sets.
The table could also quite easily be extended with flags that mark
whether resources are constant, which opens up further optimization
possibilities later.
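To make the table idea a bit more concrete, here is a minimal sketch in C (the language QEMU is written in) of one possible representation: one bit per resource, plus a third mask for the constness extension. The bit layout, struct res_entry, const_defs and format_res() are hypothetical names chosen for illustration; nothing like them is assumed to exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding: one bit per register, flag, or memory write (IO). */
enum {
    R_EAX = 1u << 0, R_EBX = 1u << 1, R_EDX = 1u << 2, R_ESI = 1u << 3,
    R_ESP = 1u << 4, R_EBP = 1u << 5, R_EIP = 1u << 6,
    F_OF = 1u << 7, F_SF = 1u << 8, F_ZF = 1u << 9,
    F_AF = 1u << 10, F_PF = 1u << 11, F_CF = 1u << 12,
    R_IO = 1u << 13,
};
#define F_ALL (F_OF | F_SF | F_ZF | F_AF | F_PF | F_CF)

/* Resource names in bit order, used when printing a mask. */
static const char *const res_names[] = {
    "EAX", "EBX", "EDX", "ESI", "ESP", "EBP", "EIP",
    "OF", "SF", "ZF", "AF", "PF", "CF", "IO",
};
#define NUM_RES (sizeof res_names / sizeof res_names[0])

/* One row of the resource table. */
struct res_entry {
    const char *insn;
    uint32_t uses;       /* resources required */
    uint32_t defs;       /* resources set */
    uint32_t const_defs; /* subset of defs known to hold a constant afterwards */
};

/* The example block; note that pop also updates ESP. */
static const struct res_entry block[] = {
    { "pop ebx",           R_ESP,         R_EBX | R_ESP, 0 },
    { "add ebx,0x11927",   R_EBX,         R_EBX | F_ALL, 0 },
    { "mov [ebp-684],eax", R_EBP | R_EAX, R_IO,          0 },
    /* EDX is always 0 afterwards; the flags (except AF) are fixed too,
     * but are left unmarked here to keep the sketch simple. */
    { "xor edx,edx",       R_EDX,         R_EDX | F_ALL, R_EDX },
    { "lea eax,[ebp-528]", R_EBP,         R_EAX,         0 },
    { "mov esi,esi",       R_ESI,         R_ESI,         0 },
    /* inc leaves CF untouched */
    { "inc edx",           R_EDX, R_EDX | F_OF | F_SF | F_ZF | F_AF | F_PF, 0 },
    { "mov [eax],0",       R_EAX,         R_IO,          0 },
    { "add eax,4",         R_EAX,         R_EAX | F_ALL, 0 },
    { "cmp edx,0x4a",      R_EDX,         F_ALL,         0 },
    { "jbe 0x300a8bb0",    R_EIP | F_CF | F_ZF, R_EIP,   0 },
};

/* Render a mask as space-separated names, e.g. "EBX OF SF ZF AF PF CF". */
static void format_res(uint32_t mask, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < NUM_RES; i++) {
        if (!(mask & (1u << i)))
            continue;
        size_t used = strlen(buf);
        /* snprintf truncates safely if buf is too small */
        snprintf(buf + used, len - used, "%s%s", used ? " " : "", res_names[i]);
    }
}
```

For instance, format_res(block[1].defs, buf, sizeof buf) yields "EBX OF SF ZF AF PF CF", the "resources set" entry for add ebx,0x11927. Bitmasks keep a later optimization step cheap, since combining or intersecting resource sets is a single AND or OR.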
instruction       | resources required | resources set
pop ebx           | ESP                | EBX ESP
add ebx,0x11927   | EBX                | EBX OF SF ZF AF PF CF
mov [ebp-684],eax | EBP EAX            | IO
xor edx,edx       | EDX                | EDX OF SF ZF AF PF CF
lea eax,[ebp-528] | EBP                | EAX
mov esi,esi       | ESI                | ESI
inc edx           | EDX                | EDX OF SF ZF AF PF
mov [eax],0       | EAX                | IO
add eax,4         | EAX                | EAX OF SF ZF AF PF CF
cmp edx,0x4a      | EDX                | OF SF ZF AF PF CF
jbe 0x300a8bb0    | EIP CF ZF          | EIP
Then we perform an optimization step that removes redundant entries
from the "resources set" column. A resource set by an instruction is
redundant if a later instruction in the block overwrites it before
anything reads it; anything that may still be needed after the block
ends is kept, which is why cmp keeps all six flags even though jbe only
reads CF and ZF. Maybe the code for this step could be shared by many
target processors; think of it as a kind of generic resource optimizer.
After optimization:
instruction       | resources required | resources set
pop ebx           | ESP                | EBX ESP
add ebx,0x11927   | EBX                | EBX
mov [ebp-684],eax | EBP EAX            | IO
xor edx,edx       | EDX                | EDX
lea eax,[ebp-528] | EBP                | EAX
mov esi,esi       | ESI                | ESI
inc edx           | EDX                | EDX
mov [eax],0       | EAX                | IO
add eax,4         | EAX                | EAX
cmp edx,0x4a      | EDX                | OF SF ZF AF PF CF
jbe 0x300a8bb0    | EIP CF ZF          | EIP
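The optimization step is essentially a backward liveness pass over the block. Below is a minimal sketch in C using a hypothetical one-bit-per-resource encoding (prune_defs, LIVE_OUT and the bit names are made up, not QEMU code). It assumes that every tracked register, every flag and EIP may be read after the block ends, and that memory writes (IO) are never removed. It also ignores instructions that can fault, such as the stores, even though an emulator may need the flags computed before a faulting instruction to be available at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, one bit per resource (not QEMU code). */
enum {
    R_EAX = 1u << 0, R_EBX = 1u << 1, R_EDX = 1u << 2, R_ESI = 1u << 3,
    R_ESP = 1u << 4, R_EBP = 1u << 5, R_EIP = 1u << 6,
    F_OF = 1u << 7, F_SF = 1u << 8, F_ZF = 1u << 9,
    F_AF = 1u << 10, F_PF = 1u << 11, F_CF = 1u << 12,
    R_IO = 1u << 13,
};
#define REGS     (R_EAX | R_EBX | R_EDX | R_ESI | R_ESP | R_EBP)
#define F_ALL    (F_OF | F_SF | F_ZF | F_AF | F_PF | F_CF)
/* Assumption: all tracked registers, flags and EIP may be read after the block. */
#define LIVE_OUT (REGS | F_ALL | R_EIP)

struct res_entry {
    uint32_t uses; /* resources required */
    uint32_t defs; /* resources set */
};

/* The example block, one entry per x86 instruction. */
static const struct res_entry block[] = {
    { R_ESP,               R_EBX | R_ESP },                          /* pop ebx */
    { R_EBX,               R_EBX | F_ALL },                          /* add ebx,0x11927 */
    { R_EBP | R_EAX,       R_IO },                                   /* mov [ebp-684],eax */
    { R_EDX,               R_EDX | F_ALL },                          /* xor edx,edx */
    { R_EBP,               R_EAX },                                  /* lea eax,[ebp-528] */
    { R_ESI,               R_ESI },                                  /* mov esi,esi */
    { R_EDX,               R_EDX | F_OF | F_SF | F_ZF | F_AF | F_PF }, /* inc edx */
    { R_EAX,               R_IO },                                   /* mov [eax],0 */
    { R_EAX,               R_EAX | F_ALL },                          /* add eax,4 */
    { R_EDX,               F_ALL },                                  /* cmp edx,0x4a */
    { R_EIP | F_CF | F_ZF, R_EIP },                                  /* jbe 0x300a8bb0 */
};
#define BLOCK_LEN (sizeof block / sizeof block[0])

/* Backward liveness pass: kept[i] receives the part of insns[i].defs that is
 * still live afterwards (read later, or live at block exit). IO is never dropped. */
static void prune_defs(const struct res_entry *insns, size_t n,
                       uint32_t live_out, uint32_t *kept)
{
    assert(insns != NULL && kept != NULL);
    uint32_t live = live_out; /* live set after instruction i */
    for (size_t i = n; i-- > 0;) {
        kept[i] = insns[i].defs & (live | R_IO);
        /* Live before instruction i: its defs are dead unless it also uses them. */
        live = (live & ~insns[i].defs) | insns[i].uses;
    }
}
```

Calling prune_defs(block, BLOCK_LEN, LIVE_OUT, kept) leaves flag bits only in kept[9], the entry for cmp, which matches the flag entries of the optimized table: every earlier flag result is overwritten by cmp before anything reads it.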
Several flag-related resources have been removed above. No registers
have been removed, since none of them turned out to be redundant in this
block, but removing those would also be possible. The information left
in the table is fed into the code that translates x86 opcodes into micro
operations, and it is up to that code to generate as few micro
operations as possible.
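On the translator side, the pruned mask could simply gate which micro operations get generated. A minimal sketch for one instruction form (add reg,imm) follows; the micro-op names, struct uop_buf, has_op() and translate_add_reg_imm() are all hypothetical and do not correspond to QEMU's actual micro operations or to disas_insn().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bits: the six arithmetic flags plus the destination register. */
enum {
    F_OF = 1u << 0, F_SF = 1u << 1, F_ZF = 1u << 2,
    F_AF = 1u << 3, F_PF = 1u << 4, F_CF = 1u << 5,
    R_DEST = 1u << 6,
};
#define F_ALL (F_OF | F_SF | F_ZF | F_AF | F_PF | F_CF)
#define MAX_UOPS 8

/* Stand-in for a micro-op buffer; the op names used below are made up. */
struct uop_buf {
    const char *ops[MAX_UOPS];
    int n;
};

static void emit(struct uop_buf *b, const char *op)
{
    assert(b->n < MAX_UOPS);
    b->ops[b->n++] = op;
}

/* Returns 1 if op was emitted into b, 0 otherwise. */
static int has_op(const struct uop_buf *b, const char *op)
{
    for (int i = 0; i < b->n; i++)
        if (strcmp(b->ops[i], op) == 0)
            return 1;
    return 0;
}

/* Translate "add reg,imm" given its pruned "resources set" mask: the
 * register write-back and the flag update are generated only when the
 * optimizer left them in the mask. */
static void translate_add_reg_imm(struct uop_buf *b, uint32_t kept_defs)
{
    emit(b, "load_reg");
    emit(b, "add_imm");
    if (kept_defs & R_DEST)
        emit(b, "store_reg");
    if (kept_defs & F_ALL)
        emit(b, "update_flags_add");
}
```

For add ebx,0x11927, whose optimized entry is just EBX, this emits three micro operations and no flag update; with the unoptimized entry (EBX plus all six flags) it would emit four.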
I guess what I am trying to say is that it would be cool to add a
generic optimization step before the opcode-to-micro-operation
translation. But would it be useful? Or would the extra analysis just
make translation slower?
Any thoughts? Maybe the flag handling code is fast enough today?
/ magnus
_______________________________________________
Qemu-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/qemu-devel