[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [avr-gcc-list] avr superoptimizer
From: |
Georg-Johann Lay |
Subject: |
Re: [avr-gcc-list] avr superoptimizer |
Date: |
Tue, 21 Apr 2009 00:09:28 +0200 |
User-agent: |
Mozilla Thunderbird 1.0.7 (Windows/20050923) |
Sean D'Epagnier schrieb:
Hi,
I posted a forum topic on a new avr superoptimizer I'm working on.
It was noted that some people on this mailing list might be interested
but don't view the forum regularly.
The forum topic also contains the source code which you can download:
http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=77805
The superoptimizer looks for more optimal sequences of instructions to
replace in a program to make it faster. So far I have had limited
success, but there is a huge potential for more optimizations.. what I
have done is like using a toy telescope to look at the sky.
The main goal is to generate peephole optimizations which gcc can use
directly.. this may take a while, but the end result is faster code
without making the compiler too much slower. It will take very long
time to generate these optimizations though.
Please let me know about any ideas, suggestions, or comments.
Thanks,
Sean
Hi Sean,
as far as I understand, your tool runs on asm code, i.e. you map short
sequences of asm instructions (which must not contain code labels) to
other instruction sequences.
The lookup table is generated by brute force? Observing the results of
asm sequences and trying to map to an other sequence that shows better
performance depending on the cost function (time, space, ...).
Then you use patterns that actually match some pieces of gcc output to
automatically generate peephole2 resp. peephole patterns for avr-gcc?
I must admit that I am not a big fan of insn peepholes generated this
way: peepholes can fix some mess from register allocation, but what is
possible on peep level is very limited because register allocation is
finished. So you cannot get some scratch register, and if a register is
no more used after the peephole, you cannot free it and use for
something else. Moreover, if there is a peephole that matches a sequence
A, B
it won't match
A, X, B
where X is independent of A and B.
Maybe it's possible to invent your work to generate patterns for insn
combine rather that for insn peephole? That pass runs before register
allocation and allows to transform from RTL to RTL.
Skimmin over the code abr-gcc generates for libgcc, e.g., we see much
romm for improving code both in size and in speed.
Finally I have some comments/question on code snippets in your avrfreaks
post:
*I*
from:
eor r6, r6
eor r7, r7
eor r8, r8
eor r9, r9
to:
ldi r6, 0 ;; typo
eor r7, r7
movw r8, r6
Besides the typo, avr-gcc already knows how to do this, see
avr.c::output_movsisf:
AVR_HAVE_MOVW ? (AS1 (clr,%A0) CR_TAB
AS1 (clr,%B0) CR_TAB
AS2 (movw,%C0,%A0))
: (AS1 (clr,%A0) CR_TAB
AS1 (clr,%B0) CR_TAB
AS1 (clr,%C0) CR_TAB
AS1 (clr,%D0));
Maybe you did -fsplit-wide-types? In many situations
-fno-split-wide-types yields better code, but not always. Without
splitting wide types RTL is sometimes a bit unhandy because it has to
deal with larger entities, but with splitting wide types it's harder to
keep track of the bigger picture.
*II*
from:
ldi r16, 101
ori r16, 50
swap r16
to:
ldi r16, 255
andi r16, 119
I am a little bit confused. Is the source an output of avr-gcc with
optimization turned on? If so, we should find out why the compiler
generates this code and remove the cause rather than the symptom and say
"ldi r16, 119".
*III*
from:
mov r24, r25
ldi r25, 0
mov r25, r24
eor r24, r24
to:
eor r24, r24
The original snippet would look something like
(set (reg:HI 101) (zero_extend:HI (reg:QI 100)))
(set (reg:HI 102) (ashift:HI (reg:HI 101) (const_int 8)))
Combine will try
(set (reg:HI 102)
(ashift:HI (zero_extend:HI (reg:QI 100))
(const_int 8)))
Which could be split/mapped to
(set (subreg:QI (reg:HI 102) 1) (reg:QI 100))
(set (subreg:QI (reg:HI 102) 0) (const_int 0))
Note that this is more general and contains the code above if reload
decides to allocate 102 to R24 and 100 to R25 (or 102 to REGNO and 100
to REGNO+1).
Unfortunately, skimming generated asm for such sequences and writing
patterns to catch them is very time consuming. But I am unsure if
automatically generated peepholes2 is what we want
-- there will be bulk of patterns in the backend where no one really
knows where they come from. There will be no individual comments why
they are there. It will be harder to maintain the backend.
-- I think before adding peepholes we should try to fix the very
problems: maybe missing combine patterns, playing around with command
line options, smarter ways to printout assembler, maybe costs, insn
constraints, see if the bad code still persist in gcc 4.5, etc.
As I said, IMHO peep2 should be a last resort to fix mess if more
sophisticated and more general approaches fail. I guess a bunch of the
cases you see and treat is just because gcc doesn't handle what it could
have handled if it was described somewhere.
Georg-Johann
- RE: [avr-gcc-list] avr superoptimizer, (continued)
- RE: [avr-gcc-list] avr superoptimizer, Weddington, Eric, 2009/04/20
- [avr-gcc-list] Re: avr superoptimizer, David Brown, 2009/04/20
- RE: [avr-gcc-list] Re: avr superoptimizer, Weddington, Eric, 2009/04/20
- [avr-gcc-list] Re: avr superoptimizer, David Brown, 2009/04/20
- RE: [avr-gcc-list] Re: avr superoptimizer, Weddington, Eric, 2009/04/20
- Re: [avr-gcc-list] avr superoptimizer, Josef Eisl, 2009/04/20
RE: [avr-gcc-list] avr superoptimizer, Weddington, Eric, 2009/04/20
Re: [avr-gcc-list] avr superoptimizer, John Regehr, 2009/04/20
Re: [avr-gcc-list] avr superoptimizer,
Georg-Johann Lay <=