
From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
Date: Wed, 19 Jul 2017 21:39:35 -1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 2% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
        -machine type=virt -nographic -smp 1 -m 4096 \
        -netdev user,id=unet,hostfwd=tcp::2222-:22 \
        -device virtio-net-device,netdev=unet \
        -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
        -device virtio-blk-device,drive=myblock \
        -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
        -name arm,debug-threads=on -smp 1' (10 runs):

Before:
       19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
             23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
                  1 CPU-migrations            #    0.000 M/sec
             19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
     56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
     25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
     17,266,165,747 stalled-cycles-backend    #   30.71% backend  cycles idle     ( +-  0.59% ) [66.66%]
     79,007,843,327 instructions              #    1.41  insns per cycle
                                              #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
     13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
        274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

       20.300335944 seconds time elapsed                                          ( +-  0.55% )

After:
       19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
             23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
                  1 CPU-migrations            #    0.000 M/sec
             19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
     57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
     26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
     17,970,546,047 stalled-cycles-backend    #   31.32% backend  cycles idle     ( +-  0.64% ) [66.67%]
     79,527,238,334 instructions              #    1.39  insns per cycle
                                              #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
     13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
        278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

       20.850558455 seconds time elapsed                                          ( +-  0.55% )

That is, a 2.70% slowdown.

That's disappointing.  How about using tcg_malloc?

Maximum allocation is sizeof(tcg_temp_info) * TCG_MAX_TEMPS = 12288, which is less than TCG_POOL_CHUNK_SIZE, so we'll retain the allocation in the pool across translations.

Otherwise,

Reviewed-by: Richard Henderson <address@hidden>


r~


