Re: [Qemu-trivial] [Qemu-devel] [PATCH] tcg: optimise memory layout of T
From: Alex Bennée
Subject: Re: [Qemu-trivial] [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
Date: Fri, 27 Mar 2015 09:55:03 +0000
Emilio G. Cota <address@hidden> writes:
> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.
Have you been able to measure any performance improvement with the new
layout? In theory, performance should improve if the structures align
with cache lines, but real numbers would be nice.
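To make the cache-line argument concrete, here is a rough sketch that only compares the two layouts; it does not measure performance. The type and field names are hypothetical stand-ins for the QEMU ones, the temp_local field is assumed from the hunk context, and the 64-byte cache line is an assumption about the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCGType and tcg_target_long on a 64-bit host. */
typedef enum SketchType { SKETCH_TYPE_I32, SKETCH_TYPE_I64 } SketchType;
typedef intptr_t sketch_target_long;

/* Field order before the patch. */
typedef struct SketchTempOld {
    SketchType base_type;
    SketchType type;
    int val_type;
    int reg;
    sketch_target_long val;
    int mem_reg;
    intptr_t mem_offset;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;      /* assumed: elided by the hunk context */
    unsigned int temp_allocated:1;
    const char *name;
} SketchTempOld;

/* Field order after the patch: all narrow fields share one word. */
typedef struct SketchTempNew {
    unsigned int reg:8;
    unsigned int mem_reg:8;
    unsigned int val_type:2;        /* TEMP_VAL_NR_BITS */
    unsigned int base_type:1;       /* TCG_TYPE_NR_BITS */
    unsigned int type:1;            /* TCG_TYPE_NR_BITS */
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;      /* assumed, as above */
    unsigned int temp_allocated:1;
    sketch_target_long val;
    intptr_t mem_offset;
    const char *name;
} SketchTempNew;

/* Assumed cache line size; 64 bytes is common on x86_64 but not universal. */
#define SKETCH_CACHE_LINE 64

/* Number of whole elements that fit in one cache line. */
static inline size_t sketch_temps_per_line(size_t temp_size)
{
    assert(temp_size > 0);
    return SKETCH_CACHE_LINE / temp_size;
}
```

On an LP64 host with GCC or Clang, sizeof should come out as 56 and 32 bytes, matching the figures in the commit message, so two temps fit per assumed cache line instead of one. Bitfield layout is implementation-defined, and whether array elements actually start on a line boundary depends on the alignment of the array itself, so this is no substitute for real measurements.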
>
> The appended patch adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.
I can see the defines, but I can't see any checks. Shouldn't we be able
to add a compile-time check that fails if TCG_TYPE_COUNT doesn't fit
into TCG_TYPE_NR_BITS?
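A minimal sketch of the kind of check I have in mind, using C11 static_assert and hypothetical Sketch* names rather than the real tcg.h ones. In the tree, QEMU_BUILD_BUG_ON would presumably be the natural tool, but that is an assumption about how the patch would be respun.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical mirror of the enum in the patch. */
typedef enum SketchType {
    SKETCH_TYPE_I32,
    SKETCH_TYPE_I64,
    SKETCH_TYPE_COUNT,   /* number of different types */
} SketchType;

/* Width of the bitfield that stores a SketchType. */
#define SKETCH_TYPE_NR_BITS 1

/* Values 0 .. COUNT-1 must be representable in NR_BITS bits. */
static inline int sketch_fits(unsigned int count, unsigned int nr_bits)
{
    return count <= (1u << nr_bits);
}

/* Breaks the build if a new enum entry outgrows the bitfield. */
static_assert(SKETCH_TYPE_COUNT <= (1 << SKETCH_TYPE_NR_BITS),
              "SKETCH_TYPE_NR_BITS is too small for SketchType");
```

Adding a third type before SKETCH_TYPE_COUNT without bumping SKETCH_TYPE_NR_BITS would then fail at compile time instead of silently truncating values.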
>
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> it is probably better to align them to a byte address.
>
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
>
> Before:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
>   37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
>   39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
>   40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
>   39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
>
> After:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
>   37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
>   39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
>   40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
>   39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
>
> Suggested-by: Stefan Weil <address@hidden>
> Suggested-by: Richard Henderson <address@hidden>
> Signed-off-by: Emilio G. Cota <address@hidden>
> ---
> tcg/tcg.h | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
> typedef enum TCGType {
> TCG_TYPE_I32,
> TCG_TYPE_I64,
> - TCG_TYPE_COUNT, /* number of different types */
> + TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>
> /* An alias for the size of the host register. */
> #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
> #endif
> } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
> +
> /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
> typedef enum TCGMemOp {
> MO_8 = 0,
> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
> #define TEMP_VAL_REG 1
> #define TEMP_VAL_MEM 2
> #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2
A similar compile-time check could be added here.
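Since the TEMP_VAL_* values are plain macros with no count sentinel, the check would have to name the largest value explicitly. A rough sketch with hypothetical SKETCH_* names follows; SKETCH_VAL_MAX is my invention and not something in the patch.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical mirrors of the macros visible in the hunk. */
#define SKETCH_VAL_REG     1
#define SKETCH_VAL_MEM     2
#define SKETCH_VAL_CONST   3
#define SKETCH_VAL_NR_BITS 2

/* Hypothetical helper: must be updated whenever a larger value is added. */
#define SKETCH_VAL_MAX SKETCH_VAL_CONST

static_assert(SKETCH_VAL_MAX < (1 << SKETCH_VAL_NR_BITS),
              "SKETCH_VAL_NR_BITS is too small for the SKETCH_VAL_* values");

struct SketchPacked {
    unsigned int val_type:SKETCH_VAL_NR_BITS;
};

/* Stores v in the bitfield and reads it back; out-of-range values wrap. */
static inline unsigned int sketch_store_val_type(unsigned int v)
{
    struct SketchPacked t;
    t.val_type = v;
    return t.val_type;
}
```

The round-trip helper shows what the assertion guards against: a value of 4 stored in the 2-bit field reads back as 0, with no diagnostic at run time.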
>
> -/* XXX: optimize memory layout */
> typedef struct TCGTemp {
> - TCGType base_type;
> - TCGType type;
> - int val_type;
> - int reg;
> - tcg_target_long val;
> - int mem_reg;
> - intptr_t mem_offset;
> + unsigned int reg:8;
> + unsigned int mem_reg:8;
> + unsigned int val_type:TEMP_VAL_NR_BITS;
> + unsigned int base_type:TCG_TYPE_NR_BITS;
> + unsigned int type:TCG_TYPE_NR_BITS;
> unsigned int fixed_reg:1;
> unsigned int mem_coherent:1;
> unsigned int mem_allocated:1;
> @@ -438,6 +439,9 @@ typedef struct TCGTemp {
> basic blocks. Otherwise, it is not
> preserved across basic blocks. */
> unsigned int temp_allocated:1; /* never used for code gen */
> +
> + tcg_target_long val;
> + intptr_t mem_offset;
> const char *name;
> } TCGTemp;
--
Alex Bennée