From: Kirill Batuzov
Subject: [Qemu-devel] [PATCH v2 0/6] Implement constant folding and copy propagation in TCG
Date: Thu, 9 Jun 2011 14:45:38 +0400
This series implements some basic machine-independent optimizations. They
simplify code and allow liveness analysis to do its work better.
Suppose we have the following ARM code:
    movw r12, #0xb6db
    movt r12, #0xdb6d
In TCG before optimizations we'll have:
    movi_i32 tmp8,$0xb6db
    mov_i32 r12,tmp8
    mov_i32 tmp8,r12
    ext16u_i32 tmp8,tmp8
    movi_i32 tmp9,$0xdb6d0000
    or_i32 tmp8,tmp8,tmp9
    mov_i32 r12,tmp8
And after optimizations we'll have this:
    movi_i32 r12,$0xdb6db6db
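To make the transformation above concrete, here is a minimal sketch of
constant propagation and folding over a toy op list. It is not the code from
tcg/optimize.c: the names (struct op, struct temp_info, fold_constants,
NB_TEMPS) and the four-opcode IR are assumptions made up for illustration.
The pass remembers which temps hold a known constant and rewrites every op
whose inputs are all known into a movi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy IR for illustration; not QEMU's real data structures. */
enum opc { OP_MOVI, OP_MOV, OP_EXT16U, OP_OR };

struct op {
    enum opc opc;
    int dst;        /* destination temp */
    int src1;       /* first source temp (unused for OP_MOVI) */
    int src2;       /* second source temp (OP_OR only) */
    uint32_t imm;   /* immediate (OP_MOVI only) */
};

enum { NB_TEMPS = 8 };

struct temp_info {
    int is_const;   /* nonzero when val holds a known constant */
    uint32_t val;
};

/*
 * Walk the ops once, in order. An op whose inputs are all known constants
 * is rewritten in place to "movi dst, result". Returns how many ops were
 * rewritten. Temp indices are assumed to be below NB_TEMPS.
 */
static int fold_constants(struct op *ops, size_t n)
{
    struct temp_info temps[NB_TEMPS] = {{0, 0}};
    int folded = 0;

    for (size_t i = 0; i < n; i++) {
        struct op *op = &ops[i];
        uint32_t res = 0;
        int known = 0;

        assert(op->dst >= 0 && op->dst < NB_TEMPS);
        switch (op->opc) {
        case OP_MOVI:
            res = op->imm;
            known = 1;
            break;
        case OP_MOV:
            known = temps[op->src1].is_const;
            res = temps[op->src1].val;
            break;
        case OP_EXT16U:
            known = temps[op->src1].is_const;
            res = temps[op->src1].val & 0xffffu;
            break;
        case OP_OR:
            known = temps[op->src1].is_const && temps[op->src2].is_const;
            res = temps[op->src1].val | temps[op->src2].val;
            break;
        }
        if (known && op->opc != OP_MOVI) {
            op->opc = OP_MOVI;
            op->imm = res;
            folded++;
        }
        /* Record what is now known about the destination temp. */
        temps[op->dst].is_const = known;
        temps[op->dst].val = res;
    }
    return folded;
}
```

Fed the seven ops from the example (with r12, tmp8 and tmp9 mapped to temps
0, 1 and 2), this sketch turns the last op into movi r12,$0xdb6db6db. The
earlier writes to tmp8, tmp9 and r12 are then no longer read by anything,
which is the kind of dead code liveness analysis can drop.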
Here are performance evaluation results on SPEC CPU2000 integer tests in
user-mode emulation on an x86_64 host. Each test was run 5 times on the
reference data set. The tables below show the run time in seconds for each of
these runs.
ARM guest without optimizations:
Test name           #1        #2        #3        #4        #5    Median
164.gzip      1402.874  1379.836  1417.294  1417.466  1420.494  1417.294
175.vpr       1246.994  1245.201  1251.247  1250.812  1249.648  1249.648
176.gcc        912.617   912.646   913.649   913.443   913.637   913.443
181.mcf        198.141   198.648   196.275   196.9     198.195   198.141
186.crafty    1546.115  1545.978  1548.002  1547.723  1547.799  1547.723
197.parser    3780.037  3780.017  3773.602  3773.535  3773.579  3773.602
252.eon       2776.173  2776.205  2778.144  2778.119  2778.048  2778.048
253.perlbmk   2592.829  2558.778  2594.292  2594.147  2594.408  2594.147
256.bzip2     1198.577  1306.549  1310.027  1310.033  1311.768  1310.027
300.twolf     2918.948  2919.119  2925.63   2926.117  2925.812  2925.63
ARM guest with optimizations:
Test name           #1        #2        #3        #4        #5    Median    Gain
164.gzip      1399.441  1399.356  1416.72   1416.728  1416.728  1416.72    0.04%
175.vpr       1237.045  1143.302  1236.568  1236.503  1236.497  1236.503   1.05%
176.gcc        919.443   919.588   919.675   919.939   906.544   919.588  -0.67%
181.mcf        198.034   198.894   195.263   195.481   195.584   195.584   1.29%
186.crafty    1522.338  1520.968  1521.359  1521.222  1521.355  1521.355   1.70%
197.parser    3787.424  3787.306  3790.889  3791.066  3791.165  3790.889  -0.46%
252.eon       2749.335  2749.254  2750.692  2750.615  2750.678  2750.615   0.99%
253.perlbmk   2479.28   2568.318  2566.599  2566.574  2566.499  2566.574   1.06%
256.bzip2     1297.906  1276.943  1301.607  1301.957  1301.601  1301.601   0.64%
300.twolf     2887.985  2888.23   2882.813  2882.955  2882.533  2882.955   1.46%
x86_64 guest without optimizations:
Test name           #1        #2        #3        #4        #5    Median
164.gzip       857.69    857.671   857.661   857.615   857.645   857.661
175.vpr        959.342   959.309   959.274   914.857   959.214   959.274
176.gcc        646.671   646.626   609.978   646.604   646.64    646.626
181.mcf        221.225   221.377   219.661   221.949   220.563   221.225
186.crafty    1129.716  1129.689  1129.636  1129.536  1129.602  1129.636
197.parser    1809.341  1809.494  1809.341  1809.369  1809.256  1809.341
253.perlbmk   1788.619  1679.546  1729.817  1787.017  1785.432  1785.432
254.gap       1061.071  1061.088  1061.072  1061.057  1061.063  1061.071
255.vortex    1914.02   1913.973  1914.048  1742.677  1914.072  1914.02
256.bzip2     1011.95   1011.86   1011.996  1012.023  1012.144  1011.996
300.twolf     1331.837  1330.556  1330.518  1330.554  1330.58   1330.556
x86_64 guest with optimizations:
Test name           #1        #2        #3        #4        #5    Median    Gain
164.gzip       863.013   863.008   863.027   863.042   848.468   863.013  -0.62%
175.vpr        970.454   970.685   971.395   970.667   970.68    970.68   -1.19%
176.gcc        644.71    644.698   644.652   636.313   644.711   644.698   0.30%
181.mcf        216.047   219.63    217.556   218.116   219.185   218.116   1.41%
186.crafty    1129.916  1130.078  1129.925  1129.93   1129.893  1129.925  -0.03%
197.parser    1829.2    1829.294  1829.347  1829.381  1829.394  1829.347  -1.11%
253.perlbmk   1769.039  1767.712  1738.613  1769.017  1768.858  1768.858   0.93%
254.gap       1062.494  1062.454  1062.522  1062.407  1062.488  1062.488  -0.13%
255.vortex    1929.135  1928.734  1930.285  1902.448  1928.92   1928.92   -0.78%
256.bzip2     1015.546  1015.64   1015.492  1015.758  1016.62   1015.64   -0.36%
300.twolf     1325.163  1325.249  1325.385  1325.098  1325.116  1325.163   0.41%
254.gap and 255.vortex for the ARM guest and 252.eon for the x86_64 guest do
not work under QEMU for some unrelated reason, so they are missing from the
tables.
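For readers who want to recompute the Median and Gain columns, the following
sketch shows one way to do it. The formula is inferred from the numbers in
the tables rather than stated in this letter: gain appears to be the relative
reduction of the median run time with respect to the unoptimized median. The
helper names median5 and gain_percent are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Median of exactly five run times; insertion-sorts a local copy. */
static double median5(const double runs[5])
{
    double s[5];

    for (size_t i = 0; i < 5; i++) {
        double v = runs[i];
        size_t j = i;

        while (j > 0 && s[j - 1] > v) {
            s[j] = s[j - 1];
            j--;
        }
        s[j] = v;
    }
    return s[2];
}

/*
 * Gain in percent relative to the unoptimized median (assumed formula).
 * Positive means the optimized build ran faster.
 */
static double gain_percent(double base_median, double opt_median)
{
    assert(base_median > 0.0);
    return (base_median - opt_median) / base_median * 100.0;
}
```

For 186.crafty on the ARM guest this gives medians of 1547.723 and 1521.355
seconds and a gain of about 1.70%, matching the table.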
Changes:
v1 -> v2
- State and Vals arrays merged into an array of structures.
- Added reference counting of a temp's copies. This helps to reset the temp's
state faster in most cases.
- Do not propagate copies through operations with the TCG_OPF_CALL_CLOBBER or
TCG_OPF_SIDE_EFFECTS flag.
- Split some expression simplifications into an independent switch.
- Let the compiler handle signed shifts and sign/zero extensions in its
implementation-defined way.
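The last item can be illustrated with a short sketch. This is not the code
from the patches: the enum fold_op and the function fold_i32 are hypothetical
names, and masking the shift count to 0..31 is an assumption added here only
to keep the C well defined. The point is that signed right shift and
narrowing to a signed type are written as plain C casts and shifts, so
whatever the host compiler defines for them is what the folder computes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode names for this example only. */
enum fold_op { F_SHL, F_SHR, F_SAR, F_EXT8S, F_EXT16U };

/* Fold one 32-bit operation on constant inputs x and y. */
static uint32_t fold_i32(enum fold_op op, uint32_t x, uint32_t y)
{
    switch (op) {
    case F_SHL:
        return x << (y & 31);
    case F_SHR:
        return x >> (y & 31);
    case F_SAR:
        /* Right shift of a negative value is implementation-defined in C. */
        return (uint32_t)((int32_t)x >> (y & 31));
    case F_EXT8S:
        /* Out-of-range conversion to int8_t is implementation-defined. */
        return (uint32_t)(int32_t)(int8_t)x;
    case F_EXT16U:
        return (uint32_t)(uint16_t)x;
    }
    assert(0 && "unknown fold_op");
    return 0;
}
```

The trade-off is less code in the folder at the cost of relying on the host
compiler's documented behaviour for those two cases.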
Kirill Batuzov (6):
Add TCG optimizations stub
Add copy and constant propagation.
Do constant folding for basic arithmetic operations.
Do constant folding for boolean operations.
Do constant folding for shift operations.
Do constant folding for unary operations.
Makefile.target | 2 +-
tcg/optimize.c | 633 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
tcg/tcg.c | 6 +
tcg/tcg.h | 3 +
4 files changed, 643 insertions(+), 1 deletions(-)
create mode 100644 tcg/optimize.c
--
1.7.4.1