Hi,
I would like to implement a well-known indirect branch optimization, Polymorphic Inline Caching (PIC), in QEMU. PIC speculates in software on the likely targets of an indirect branch in order to speed up its dispatch.
Currently, QEMU generates an EOB (end of block) after an indirect branch and relies on the runtime to find the next TB. This causes a code cache exit and re-entry plus a TB lookup, which can take a non-trivial amount of time.
PIC mitigates this by emitting compares and jumps for the few most likely targets, which reduces both the number of code cache exits and the number of TB lookups. An example of PIC is shown below.
*without PIC for indirect branch:*
update IA;
goto code-cache-epilogue;
lookup TB;
goto code-cache-prologue;
*with PIC for indirect branch:*
update IA;
compare IA with likely target-#1;
jump to TB-target-#1 if match;
compare IA with likely target-#2;
jump to TB-target-#2 if match;
compare IA with likely target-#3;
jump to TB-target-#3 if match;
goto code-cache-epilogue;
lookup TB;
goto code-cache-prologue;
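To make the compare chain above concrete, here is a rough, standalone C model of it. This is only a sketch and not QEMU code: DemoTB, PicSite, slow_lookup and pic_dispatch are hypothetical names, the TB "lookup" is a toy linear search, and the three-slot limit is an assumption taken from the example. In a real implementation the compares would be emitted inline in the generated code rather than walked in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC_SLOTS    3   /* assumption: three inlined compares, as in the example */
#define TB_POOL_SIZE 64  /* assumption: capacity of the toy TB pool */

/* Stand-in for a translated block; NOT QEMU's TranslationBlock. */
typedef struct DemoTB {
    uint64_t guest_pc;
} DemoTB;

/* Per-branch-site inline cache: up to PIC_SLOTS (target, TB) pairs. */
typedef struct PicSite {
    uint64_t target_pc[PIC_SLOTS];
    DemoTB  *target_tb[PIC_SLOTS];
    unsigned used;            /* number of filled slots */
    unsigned long hits;       /* dispatches that stayed in the "code cache" */
    unsigned long misses;     /* dispatches that fell back to the slow path */
} PicSite;

static DemoTB tb_pool[TB_POOL_SIZE];
static size_t tb_count;

/* Slow path: models "goto epilogue; lookup TB; goto prologue".
 * Finds the TB for pc, creating one if it does not exist yet. */
static DemoTB *slow_lookup(uint64_t pc)
{
    for (size_t i = 0; i < tb_count; i++) {
        if (tb_pool[i].guest_pc == pc) {
            return &tb_pool[i];
        }
    }
    assert(tb_count < TB_POOL_SIZE);
    tb_pool[tb_count].guest_pc = pc;
    return &tb_pool[tb_count++];
}

/* Fast path: compare pc against each cached target in order and
 * "jump" (return the TB) on the first match. On a miss, take the
 * slow path and cache the result if a slot is still free. */
static DemoTB *pic_dispatch(PicSite *site, uint64_t pc)
{
    for (unsigned i = 0; i < site->used; i++) {
        if (site->target_pc[i] == pc) {
            site->hits++;
            return site->target_tb[i];
        }
    }

    site->misses++;
    DemoTB *tb = slow_lookup(pc);
    if (site->used < PIC_SLOTS) {
        site->target_pc[site->used] = pc;
        site->target_tb[site->used] = tb;
        site->used++;
    }
    return tb;
}
```

Note that this model fills slots in first-seen order and never replaces them, so a fourth distinct target always takes the slow path. A real design would also need a replacement policy and a way to invalidate slots when the TBs they point at are flushed.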
I think target-X/translate.c as well as tcg/X/ need to be changed for this, and a new TCG opcode needs to be added.
Any comments on how to get started with this are appreciated.
Thanks,
Xin