qemu-devel

From: Blue Swirl
Subject: Re: [Qemu-devel] [PATCH v8 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
Date: Sat, 3 Nov 2012 12:52:05 +0000

On Wed, Oct 31, 2012 at 7:04 AM, Yeongkyoon Lee
<address@hidden> wrote:
> Here is the 8th version of the series optimizing TCG qemu_ld/st code generation.

Thanks, applied all.

>
> v8:
>  - Rebase
>
> v7:
>   - Rebase and fix typos
>
> v6:
>   - Remove the extra return-address argument from the MMU helpers
>     Instead, embed the fast path address into the slow path for the helpers to use
>   - Replace some bitwise operations with structure bitfields
>   - Rename the function that handles finalization of TB code generation
>
> v5:
>   - Remove RFC tag
>
> v4:
>   - Remove CONFIG_SOFTMMU pre-condition from configure
>   - Instead, add some CONFIG_SOFTMMU condition to TCG sources
>   - Remove some unnecessary comments
>
> v3:
>   - Support CONFIG_TCG_PASS_AREG0
>     (expected to give a larger performance gain than the other cases)
>   - Remove the configure option "--enable-ldst-optimization"
>   - Make the optimization the default on i386 and x86_64 hosts
>   - Fix some typos and run checkpatch.pl before committing
>   - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
>   - Test linux-user-test-0.3
>
> v2:
>   - Follow QEMU's patch submission rules
>
> v1:
>   - Initial commit request
>
> I think the code generated from qemu_ld/st IRs is relatively heavy: up to
> 12 instructions for the TLB hit case on an i386 host.
> This patch series improves the code quality of TCG qemu_ld/st IRs by reducing
> jumps and improving locality.
> The main idea is simple and is already described in the comments in
> tcg-target.c: separate the slow path (TLB miss case) and generate it at the
> end of the TB.
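The reordering described above can be illustrated with a small toy model in C. This is a hypothetical sketch, not QEMU's TCG API: the `TB` struct, `emit()`, `gen_qemu_ld()` and `finalize_tb()` are invented names, and "instructions" are plain text records indexed by position. The fast path stays inline, each TLB-miss branch is recorded with its resume point, and the slow paths are only appended (and the branches patched) when the block is finalized. The embedded return-address trick is deliberately left out here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of one translation block (TB); not QEMU's API. */
#define MAX_INSNS 64
#define MAX_SLOW  8
#define INSN_LEN  32

typedef struct {
    char insn[MAX_INSNS][INSN_LEN]; /* emitted "instructions" as text */
    int n;                          /* number of instructions emitted */
    struct {
        int branch_idx;             /* the "jne" taken on a TLB miss */
        int resume_idx;             /* first instruction after the fast path */
    } slow[MAX_SLOW];
    int n_slow;                     /* slow paths still to be generated */
} TB;

static int emit(TB *tb, const char *text)
{
    assert(tb->n < MAX_INSNS);
    snprintf(tb->insn[tb->n], INSN_LEN, "%s", text);
    return tb->n++;
}

/* Fast path only: the miss branch target is unknown until finalize_tb(). */
static void gen_qemu_ld(TB *tb)
{
    assert(tb->n_slow < MAX_SLOW);
    emit(tb, "tlb_check");                    /* (1) TLB check */
    int br = emit(tb, "jne ?");               /* (2) miss -> slow path, patched later */
    emit(tb, "load_host");                    /* (3) TLB hit: load from host memory */
    tb->slow[tb->n_slow].branch_idx = br;
    tb->slow[tb->n_slow].resume_idx = tb->n;  /* (4) next code */
    tb->n_slow++;
}

/* Append every deferred slow path at the end of the TB and patch branches. */
static void finalize_tb(TB *tb)
{
    for (int i = 0; i < tb->n_slow; i++) {
        char buf[INSN_LEN];
        int start = emit(tb, "call mmu_helper");            /* (5) */
        snprintf(buf, sizeof buf, "jmp %d", tb->slow[i].resume_idx);
        emit(tb, buf);                                       /* back to (4) */
        snprintf(tb->insn[tb->slow[i].branch_idx], INSN_LEN, "jne %d", start);
    }
    tb->n_slow = 0;
}
```

For example, two loads separated by an `add` and followed by `exit_tb` end up with `load_host` immediately followed by `add` on the hit path, while both `call mmu_helper` sequences sit after `exit_tb`. The unconditional jump that the old layout placed on the hit path (step (4) in "Before") is gone.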
>
> For example, the code generated for qemu_ld changes as follows.
> Before:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) Jump to next code (6)
> (5) TLB miss case: call MMU helper
> (6) ... (next code)
>
> After:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) ... (next code)
> ...
> (5) TLB miss case: call MMU helper
> (6) Jump to (8)
> (7) [embedded addr of (4)] <- never executed but read by MMU helpers
> (8) Return to next code (4)
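Step (7) lets an MMU helper recover the fast-path address from its own return address without an extra argument. A minimal sketch of one way to do this, assuming the slow path places a 2-byte `jmp short` over a 5-byte `jmp rel32` whose target is the fast-path resume point: the byte layout is an assumption for illustration, not necessarily what the patch emits, and `emit_slow_tail()` / `fast_path_resume()` are hypothetical names. Only the x86 encodings themselves (`EB rel8`, `E9 rel32`, little-endian displacement relative to the next instruction) are standard. The code buffer is modeled as a byte array indexed by offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 opcodes: EB = jmp rel8, E9 = jmp rel32 (relative to the next insn). */
enum { JMP_SHORT = 0xEB, JMP_REL32 = 0xE9 };

/* Assumed slow-path tail at offset ra (the helper's return address):
 *   ra+0: EB 05        jmp short ra+7   (6) skip the embedded jump
 *   ra+2: E9 rel32     jmp resume       (7) never executed, only read
 *   ra+7: ...          post-processing, then return to the fast path (8)
 */
static void emit_slow_tail(uint8_t *code, size_t ra, size_t resume)
{
    int64_t rel = (int64_t)resume - (int64_t)(ra + 7);
    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    uint32_t u = (uint32_t)rel;         /* two's-complement bit pattern */
    code[ra] = JMP_SHORT;
    code[ra + 1] = 5;
    code[ra + 2] = JMP_REL32;
    for (int i = 0; i < 4; i++)         /* little-endian rel32 */
        code[ra + 3 + i] = (uint8_t)(u >> (8 * i));
}

/* What a helper could do with its return address: decode the jump at ra+2. */
static size_t fast_path_resume(const uint8_t *code, size_t ra)
{
    assert(code[ra] == JMP_SHORT && code[ra + 2] == JMP_REL32);
    uint32_t u = 0;
    for (int i = 0; i < 4; i++)
        u |= (uint32_t)code[ra + 3 + i] << (8 * i);
    int64_t rel = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
    return (size_t)((int64_t)(ra + 7) + rel);
}
```

Encoding the address as a valid jump instruction rather than raw data plausibly keeps a linear disassembly of the generated code aligned, at the cost of the 2-byte short jump that skips it.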
>
> The following performance results were measured on QEMU 1.0.
> Although there was some measurement error, the gains were not negligible.
>
> * EEMBC CoreMark (before -> after)
>   - Guest: i386, Linux (Tizen platform)
>   - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
>   - Results: 1135.6 -> 1179.9 (+3.9%)
>
> * nbench (before -> after)
>   - Guest: i386, Linux (linux-0.2.img included in QEMU source)
>   - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
>   - Results
>     . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
>     . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
>     . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)
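As a quick arithmetic check, each percentage above is the relative change (after - before) / before * 100, rounded to one decimal place; for CoreMark, (1179.9 - 1135.6) / 1135.6 * 100 is about 3.9. The helper name `pct_change()` is only for illustration.

```c
#include <assert.h>

/* Relative change in percent: (after - before) / before * 100. */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```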
>
> Summarized features:
>  - The changes are wrapped by macro "CONFIG_QEMU_LDST_OPTIMIZATION" and
>    they are enabled by default on i386/x86_64 hosts
>  - Forcibly removing the macro causes a compilation error on i386/x86_64 hosts
>  - No implementations for hosts other than i386/x86_64 yet
>
> In addition, I have tried to remove the generated code that calls the MMU
> helpers for the TLB miss case from the end of the TB, but have not found a
> good solution yet.
> In my opinion, TLB hit case performance could degrade if that calling code
> were removed, because the runtime parameters (data, mmu index and return
> address) would then have to be set up in registers or on the stack even
> though they are not used in the TLB hit case.
> This remains an open issue.
>
> Yeongkyoon Lee (3):
>   configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
>     optimization
>   tcg: Add extended GETPC mechanism for MMU helpers with ldst
>     optimization
>   tcg: Optimize qemu_ld/st by generating slow paths at the end of a
>     block
>
>  configure             |    6 +
>  exec-all.h            |   36 +++++
>  exec.c                |   11 ++
>  softmmu_template.h    |   16 +-
>  tcg/i386/tcg-target.c |  404 ++++++++++++++++++++++++++++++++++---------------
>  tcg/tcg.c             |   12 ++
>  tcg/tcg.h             |   30 ++++
>  7 files changed, 381 insertions(+), 134 deletions(-)
>
> --
> 1.7.9.5
>


