[Qemu-commits] [qemu/qemu] c878da: tcg/ppc32: Use trampolines to trim th
From: GitHub
Subject: [Qemu-commits] [qemu/qemu] c878da: tcg/ppc32: Use trampolines to trim the code size f...
Date: Mon, 05 Nov 2012 17:00:11 -0800
Branch: refs/heads/master
Home: https://github.com/qemu/qemu
Commit: c878da3b27ceeed953c9f9a1eb002d59e9dcb4c6
https://github.com/qemu/qemu/commit/c878da3b27ceeed953c9f9a1eb002d59e9dcb4c6
Author: malc <address@hidden>
Date: 2012-11-05 (Mon, 05 Nov 2012)
Changed paths:
M exec-all.h
M tcg/ppc/tcg-target.c
Log Message:
-----------
tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors
An mmu access looks something like this:
<check tlb>
if miss goto slow_path
<fast path>
done:
...
; end of the TB
slow_path:
<pre process>
mr r3, r27 ; move areg0 to r3
; (r3 holds the first argument for all the PPC32 ABIs)
<call mmu_helper>
b $+8
.long done
<post process>
b done
On ppc32 <call mmu_helper> is:
(SysV and Darwin)
mmu_helper is most likely not within direct branching distance from
the call site, necessitating
a. moving the 32 bit address of mmu_helper into a GPR ; 8 bytes
b. moving GPR to CTR/LR ; 4 bytes
c. (finally) branching to CTR/LR ; 4 bytes
r3 setting - 4 bytes
call - 16 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr - 4 bytes
Total overhead - 28 bytes
(PowerOpen (AIX))
a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes
b. loading 32 bit function pointer into GPR2 ; 4 bytes
c. moving GPR2 to CTR/LR ; 4 bytes
d. loading 32 bit small area pointer into R2 ; 4 bytes
e. (finally) branching to CTR/LR ; 4 bytes
r3 setting - 4 bytes
call - 24 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr - 4 bytes
Total overhead - 36 bytes
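The two totals above can be re-derived from the per-step sizes. The following is a minimal Python sketch of that arithmetic only; the names (SYSV_CALL, AIX_CALL, slow_path_overhead) are illustrative and do not appear in the commit.

```python
# Byte counts are copied from the lists above; every PPC32 instruction is 4 bytes.
INSN = 4

# Steps of <call mmu_helper>, in bytes (illustrative names, not from QEMU).
SYSV_CALL = [8, 4, 4]        # load address, move to CTR/LR, branch
AIX_CALL = [8, 4, 4, 4, 4]   # load TOC address, load fn pointer, move to CTR/LR, load R2, branch


def slow_path_overhead(call_steps):
    """Per-slow-path overhead: r3 setting + call + dummy jump + embedded retaddr."""
    r3_setting = INSN
    dummy_jump = INSN
    embedded_retaddr = 4
    return r3_setting + sum(call_steps) + dummy_jump + embedded_retaddr


print(slow_path_overhead(SYSV_CALL))  # 28
print(slow_path_overhead(AIX_CALL))   # 36
```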
The following is done to trim the code size of the slow path sections:
In tcg_target_qemu_prologue trampolines are emitted that look like this:
trampoline:
mfspr r3, LR
addi r3, r3, 4
mtspr LR, r3 ; fixup LR to point over embedded retaddr
mr r3, r27
<jump mmu_helper> ; tail call of sorts
And the slow path becomes:
slow_path:
<pre process>
<call trampoline>
.long done
<post process>
b done
call - 4 bytes (the trampoline is within the code gen buffer
and most likely reachable via
a direct branch)
embedded retaddr - 4 bytes
Total overhead - 8 bytes
In the end the icache pressure is decreased by 20 bytes (SysV and Darwin)
or 28 bytes (PowerOpen) per slow path, at the cost of an extra jump to the
trampoline and adjusting LR (to skip over the embedded retaddr) once inside.
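The LR adjustment can be illustrated with a toy address model. This is only a sketch in Python, assuming a made-up call site address of 0x1000 and hypothetical function names; it models where LR points at each step, not the actual code emitted by tcg/ppc/tcg-target.c.

```python
WORD = 4  # each PPC32 instruction and the embedded .long occupy 4 bytes


def branch_and_link(call_site):
    """A branch-and-link leaves the address of the next word in LR."""
    return call_site + WORD


def trampoline_fixup(lr):
    """The trampoline bumps LR by one word before jumping to the helper."""
    return lr + WORD


# Hypothetical layout of one slow path (the address is made up).
CALL_SITE = 0x1000                    # <call trampoline>
EMBEDDED_RETADDR = CALL_SITE + WORD   # .long done
POST_PROCESS = CALL_SITE + 2 * WORD   # <post process>

lr_at_trampoline = branch_and_link(CALL_SITE)
lr_at_helper = trampoline_fixup(lr_at_trampoline)

print(hex(lr_at_trampoline))  # 0x1004, the embedded word (data, not code)
print(hex(lr_at_helper))      # 0x1008, so the helper returns into <post process>
```

Without the fixup the helper would return onto the embedded word and execute it as an instruction, which is why the old slow path needed the explicit "b $+8" at each call site.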
Signed-off-by: malc <address@hidden>
Commit: 2592c59a66d456fe98fe96cb5787b356c40ee66f
https://github.com/qemu/qemu/commit/2592c59a66d456fe98fe96cb5787b356c40ee66f
Author: Paolo Bonzini <address@hidden>
Date: 2012-11-05 (Mon, 05 Nov 2012)
Changed paths:
M qemu-img.c
M qemu-io.c
Log Message:
-----------
tools: initialize main loop before block layer
Tools were broken because they initialized the block layer while
qemu_aio_context was still NULL.
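The ordering dependency can be sketched as a toy model. The Python below is an assumption-laden illustration: init_main_loop, init_block_layer and run_tool are hypothetical stand-ins rather than QEMU's functions, and a raised exception stands in for whatever failure the NULL context caused in the real tools.

```python
# Toy model of the ordering bug; all names are hypothetical stand-ins.
aio_context = None  # plays the role of qemu_aio_context


def init_main_loop():
    """Creates the context that the block layer depends on."""
    global aio_context
    aio_context = object()


def init_block_layer():
    """Fails if it runs before the main loop has been set up."""
    if aio_context is None:
        raise RuntimeError("block layer initialized before the main loop")
    return "block layer ready"


def run_tool(main_loop_first):
    """Start a tool with either the fixed or the broken initialization order."""
    global aio_context
    aio_context = None
    if main_loop_first:
        init_main_loop()
    try:
        return init_block_layer()
    except RuntimeError as err:
        return f"broken: {err}"


print(run_tool(main_loop_first=False))
print(run_tool(main_loop_first=True))
```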
Reported-by: malc <address@hidden>
Signed-off-by: Paolo Bonzini <address@hidden>
Signed-off-by: malc <address@hidden>
Compare: https://github.com/qemu/qemu/compare/1cfd981ff1e8...2592c59a66d4