
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
Date: Wed, 19 Jul 2017 21:39:35 -1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 2% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
        -machine type=virt -nographic -smp 1 -m 4096 \
        -netdev user,id=unet,hostfwd=tcp::2222-:22 \
        -device virtio-net-device,netdev=unet \
        -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
        -device virtio-blk-device,drive=myblock \
        -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
        -name arm,debug-threads=on -smp 1' (10 runs):

Before:
       19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
             23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
                  1 CPU-migrations            #    0.000 M/sec
             19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
     56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
     25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
     17,266,165,747 stalled-cycles-backend    #   30.71% backend  cycles idle     ( +-  0.59% ) [66.66%]
     79,007,843,327 instructions              #    1.41  insns per cycle
                                              #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
     13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
        274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

       20.300335944 seconds time elapsed                                          ( +-  0.55% )

After:
       19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
             23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
                  1 CPU-migrations            #    0.000 M/sec
             19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
     57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
     26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
     17,970,546,047 stalled-cycles-backend    #   31.32% backend  cycles idle     ( +-  0.64% ) [66.67%]
     79,527,238,334 instructions              #    1.39  insns per cycle
                                              #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
     13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
        278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

       20.850558455 seconds time elapsed                                          ( +-  0.55% )

That is, a 2.70% slowdown.

That's disappointing.  How about using tcg_malloc?

Maximum allocation is sizeof(tcg_temp_info) * TCG_MAX_TEMPS = 12288, which is less than TCG_POOL_CHUNK_SIZE, so we'll retain the allocation in the pool across translations.

Otherwise,

Reviewed-by: Richard Henderson <address@hidden>


r~


