avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] avr superoptimizer


From: Georg-Johann Lay
Subject: Re: [avr-gcc-list] avr superoptimizer
Date: Tue, 21 Apr 2009 00:09:28 +0200
User-agent: Mozilla Thunderbird 1.0.7 (Windows/20050923)

Sean D'Epagnier schrieb:
Hi,

I posted a forum topic on a new avr superoptimizer I'm working on.
It was noted that some people on this mailing list might be interested
but don't view the forum regularly.

The forum topic also contains the source code which you can download:

http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=77805

The superoptimizer looks for more optimal sequences of instructions to
replace in a program to make it faster.  So far I have had limited
success, but there is a huge potential for more optimizations.. what I
have done is like using a toy telescope to look at the sky.

The main goal is to generate peephole optimizations which gcc can use
directly.. this may take a while, but the end result  is faster code
without making the compiler too much slower.  It will take very long
time to generate these optimizations though.

Please let me know about any ideas, suggestions, or comments.

Thanks,
Sean

Hi Sean,

as far as I understand, your tool runs on asm code, i.e. you map short sequences of asm instructions (which must not contain code labels) to other instruction sequences.

The lookup table is generated by brute force? Observing the results of asm sequences and trying to map to an other sequence that shows better performance depending on the cost function (time, space, ...).

Then you use patterns that actually match some pieces of gcc output to automatically generate peephole2 resp. peephole patterns for avr-gcc?

I must admit that I am not a big fan of insn peepholes generated this way: peepholes can fix some mess from register allocation, but what is possible on peep level is very limited because register allocation is finished. So you cannot get some scratch register, and if a register is no more used after the peephole, you cannot free it and use for something else. Moreover, if there is a peephole that matches a sequence
  A, B
it won't match
  A, X, B
where X is independent of A and B.

Maybe it's possible to invent your work to generate patterns for insn combine rather that for insn peephole? That pass runs before register allocation and allows to transform from RTL to RTL.

Skimmin over the code abr-gcc generates for libgcc, e.g., we see much romm for improving code both in size and in speed.

Finally I have some comments/question on code snippets in your avrfreaks post:

*I*

from:
   eor   r6, r6
   eor   r7, r7
   eor   r8, r8
   eor   r9, r9
to:
   ldi   r6, 0 ;; typo
   eor   r7, r7
   movw   r8, r6

Besides the typo, avr-gcc already knows how to do this, see avr.c::output_movsisf:

AVR_HAVE_MOVW ? (AS1 (clr,%A0) CR_TAB
                AS1 (clr,%B0) CR_TAB
                AS2 (movw,%C0,%A0))
             : (AS1 (clr,%A0) CR_TAB
                AS1 (clr,%B0) CR_TAB
                AS1 (clr,%C0) CR_TAB
                AS1 (clr,%D0));

Maybe you did -fsplit-wide-types? In many situations -fno-split-wide-types yields better code, but not always. Without splitting wide types RTL is sometimes a bit unhandy because it has to deal with larger entities, but with splitting wide types it's harder to keep track of the bigger picture.

*II*

from:
   ldi   r16, 101
   ori   r16, 50
   swap   r16
to:
   ldi   r16, 255
   andi   r16, 119

I am a little bit confused. Is the source an output of avr-gcc with optimization turned on? If so, we should find out why the compiler generates this code and remove the cause rather than the symptom and say "ldi r16, 119".

*III*
from:
   mov   r24, r25
   ldi   r25, 0
   mov   r25, r24
   eor   r24, r24
to:
   eor   r24, r24

The original snippet would look something like
  (set (reg:HI 101) (zero_extend:HI (reg:QI 100)))
  (set (reg:HI 102) (ashift:HI (reg:HI 101) (const_int 8)))
Combine will try
  (set (reg:HI 102)
       (ashift:HI (zero_extend:HI (reg:QI 100))
                  (const_int 8)))
Which could be split/mapped to
  (set (subreg:QI (reg:HI 102) 1) (reg:QI 100))
  (set (subreg:QI (reg:HI 102) 0) (const_int 0))
Note that this is more general and contains the code above if reload decides to allocate 102 to R24 and 100 to R25 (or 102 to REGNO and 100 to REGNO+1).

Unfortunately, skimming generated asm for such sequences and writing patterns to catch them is very time consuming. But I am unsure if automatically generated peepholes2 is what we want -- there will be bulk of patterns in the backend where no one really knows where they come from. There will be no individual comments why they are there. It will be harder to maintain the backend. -- I think before adding peepholes we should try to fix the very problems: maybe missing combine patterns, playing around with command line options, smarter ways to printout assembler, maybe costs, insn constraints, see if the bad code still persist in gcc 4.5, etc.

As I said, IMHO peep2 should be a last resort to fix mess if more sophisticated and more general approaches fail. I guess a bunch of the cases you see and treat is just because gcc doesn't handle what it could have handled if it was described somewhere.

Georg-Johann





reply via email to

[Prev in Thread] Current Thread [Next in Thread]