From: Blue Swirl
Subject: Re: [Qemu-devel] [PATCH v8 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
Date: Sat, 3 Nov 2012 12:52:05 +0000
On Wed, Oct 31, 2012 at 7:04 AM, Yeongkyoon Lee
<address@hidden> wrote:
> Here is the 8th version of the series optimizing TCG qemu_ld/st code
> generation.
Thanks, applied all.
>
> v8:
> - Rebase
>
> v7:
> - Rebase and fix mistyping
>
> v6:
> - Remove an extra argument of return addr from MMU helpers
> Instead, embed the fast path addr to the slow path for helpers to use it
> - Change some bitwise operations to bitfields of structure
> - Change the name of function which handles finalization of TB code
> generation
>
> v5:
> - Remove RFC tag
>
> v4:
> - Remove CONFIG_SOFTMMU pre-condition from configure
> - Instead, add some CONFIG_SOFTMMU condition to TCG sources
> - Remove some unnecessary comments
>
> v3:
> - Support CONFIG_TCG_PASS_AREG0
> (expected to get more performance enhancement than others)
> - Remove the configure option "--enable-ldst-optimization"
> - Make the optimization the default on i386 and x86_64 hosts
> - Fix some typos and apply checkpatch.pl before committing
> - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
> - Test linux-user-test-0.3
>
> v2:
> - Follow the submit rule of qemu
>
> v1:
> - Initial commit request
>
> I think the code generated from qemu_ld/st IRs is relatively heavy: up to
> 12 instructions for the TLB hit case on an i386 host.
> This patch series enhances the code quality of TCG qemu_ld/st IRs by reducing
> jumps and improving locality.
> The main idea is simple and has already been described in the comments in
> tcg-target.c: separate the slow path (TLB miss case) and generate it at the
> end of the TB.
>
> For example, the code generated from qemu_ld changes as follows.
> Before:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) Jump to next code (6)
> (5) TLB miss case: call MMU helper
> (6) ... (next code)
>
> After:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) ... (next code)
> ...
> (5) TLB miss case: call MMU helper
> (6) Jump to (8)
> (7) [embedded addr of (4)] <- never executed but read by MMU helpers
> (8) Return to next code (4)
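
To make the "After" layout concrete, here is a minimal toy sketch in C. It is not the code from tcg/i386/tcg-target.c: the names ToyTB, SlowPath, toy_qemu_ld and toy_finalize are hypothetical, "instructions" are just text lines, and the embedded return address from step (7) is left out. It only shows the idea of emitting the fast path inline, recording the slow path, and generating it once the rest of the block has been emitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: each "op" is a text line, not a host instruction. */
#define MAX_OPS  64
#define MAX_SLOW 16
#define OP_LEN   48

typedef struct {
    int fast_jump; /* index of the fast-path branch taken on a TLB miss */
    int resume;    /* index of the op that follows the fast-path load */
} SlowPath;

typedef struct {
    char ops[MAX_OPS][OP_LEN];
    int n_ops;
    SlowPath slow[MAX_SLOW];
    int n_slow;
} ToyTB;

static void toy_init(ToyTB *tb)
{
    memset(tb, 0, sizeof *tb);
}

/* Append one op and return its index. */
static int emit(ToyTB *tb, const char *text)
{
    assert(tb->n_ops < MAX_OPS);
    snprintf(tb->ops[tb->n_ops], OP_LEN, "%s", text);
    return tb->n_ops++;
}

/* Emit only the fast path of a guest load; remember the slow path. */
static void toy_qemu_ld(ToyTB *tb)
{
    int jump;

    emit(tb, "tlb_check");
    jump = emit(tb, "jmp_if_miss ?"); /* target is patched later */
    emit(tb, "load_from_host");

    assert(tb->n_slow < MAX_SLOW);
    tb->slow[tb->n_slow++] =
        (SlowPath){ .fast_jump = jump, .resume = tb->n_ops };
}

/* Called once after all ops of the block: append every recorded slow
 * path at the end and patch the matching fast-path branch. */
static void toy_finalize(ToyTB *tb)
{
    for (int i = 0; i < tb->n_slow; i++) {
        SlowPath *sp = &tb->slow[i];
        char jump_back[OP_LEN];
        int start = emit(tb, "call_mmu_helper");

        snprintf(jump_back, sizeof jump_back, "jmp %d", sp->resume);
        emit(tb, jump_back);
        snprintf(tb->ops[sp->fast_jump], OP_LEN, "jmp_if_miss %d", start);
    }
}
```

With one toy_qemu_ld() followed by one more op and toy_finalize(), the hit path is three ops that fall straight through to the next op, and the helper call plus the jump back sit after everything else in the block.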
>
> The following performance results were measured on QEMU 1.0.
> Although there was some measurement error, the improvements were not
> negligible.
>
> * EEMBC CoreMark (before -> after)
> - Guest: i386, Linux (Tizen platform)
> - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
> - Results: 1135.6 -> 1179.9 (+3.9%)
>
> * nbench (before -> after)
> - Guest: i386, Linux (linux-0.2.img included in QEMU source)
> - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
> - Results
> . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
> . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
> . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)
>
> Summarized features:
> - The changes are wrapped by the macro "CONFIG_QEMU_LDST_OPTIMIZATION",
> which is enabled by default on i386/x86_64 hosts
> - Forced removal of the macro will cause a compilation error on i386/x86_64
> hosts
> - No implementations for hosts other than i386/x86_64 yet
>
> In addition, I have tried to remove the generated code that calls the MMU
> helpers for the TLB miss case from the end of the TB, but have not found a
> good solution yet.
> In my opinion, TLB hit performance could degrade if the calling code were
> removed, because the runtime parameters (data, mmu index and return address)
> would have to be set in registers or on the stack even though they are not
> used in the TLB hit case.
> This remains an open issue.
>
> Yeongkyoon Lee (3):
>   configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
>   tcg: Add extended GETPC mechanism for MMU helpers with ldst optimization
>   tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
>
> configure | 6 +
> exec-all.h | 36 +++++
> exec.c | 11 ++
> softmmu_template.h | 16 +-
> tcg/i386/tcg-target.c | 404 ++++++++++++++++++++++++++++++++++---------------
> tcg/tcg.c | 12 ++
> tcg/tcg.h | 30 ++++
> 7 files changed, 381 insertions(+), 134 deletions(-)
>
> --
> 1.7.9.5
>