Re: [avr-gcc-list] avr superoptimizer

avr-gcc-list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] avr superoptimizer

From:	Georg-Johann Lay
Subject:	Re: [avr-gcc-list] avr superoptimizer
Date:	Tue, 21 Apr 2009 00:09:28 +0200
User-agent:	Mozilla Thunderbird 1.0.7 (Windows/20050923)

Sean D'Epagnier schrieb:

Hi,

I posted a forum topic on a new avr superoptimizer I'm working on.
It was noted that some people on this mailing list might be interested
but don't view the forum regularly.

The forum topic also contains the source code which you can download:

http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=77805

The superoptimizer looks for more optimal sequences of instructions to
replace in a program to make it faster.  So far I have had limited
success, but there is a huge potential for more optimizations.. what I
have done is like using a toy telescope to look at the sky.

The main goal is to generate peephole optimizations which gcc can use
directly.. this may take a while, but the end result  is faster code
without making the compiler too much slower.  It will take very long
time to generate these optimizations though.

Please let me know about any ideas, suggestions, or comments.

Thanks,
Sean


Hi Sean,

as far as I understand, your tool runs on asm code, i.e. you map shortsequences of asm instructions (which must not contain code labels) toother instruction sequences.

The lookup table is generated by brute force? Observing the results ofasm sequences and trying to map to an other sequence that shows betterperformance depending on the cost function (time, space, ...).

Then you use patterns that actually match some pieces of gcc output toautomatically generate peephole2 resp. peephole patterns for avr-gcc?

I must admit that I am not a big fan of insn peepholes generated thisway: peepholes can fix some mess from register allocation, but what ispossible on peep level is very limited because register allocation isfinished. So you cannot get some scratch register, and if a register isno more used after the peephole, you cannot free it and use forsomething else. Moreover, if there is a peephole that matches a sequence

  A, B
it won't match
  A, X, B
where X is independent of A and B.

Maybe it's possible to invent your work to generate patterns for insncombine rather that for insn peephole? That pass runs before registerallocation and allows to transform from RTL to RTL.

Skimmin over the code abr-gcc generates for libgcc, e.g., we see muchromm for improving code both in size and in speed.

Finally I have some comments/question on code snippets in your avrfreakspost:


*I*

from:
   eor   r6, r6
   eor   r7, r7
   eor   r8, r8
   eor   r9, r9
to:
   ldi   r6, 0 ;; typo
   eor   r7, r7
   movw   r8, r6

Besides the typo, avr-gcc already knows how to do this, seeavr.c::output_movsisf:


AVR_HAVE_MOVW ? (AS1 (clr,%A0) CR_TAB
                AS1 (clr,%B0) CR_TAB
                AS2 (movw,%C0,%A0))
             : (AS1 (clr,%A0) CR_TAB
                AS1 (clr,%B0) CR_TAB
                AS1 (clr,%C0) CR_TAB
                AS1 (clr,%D0));

Maybe you did -fsplit-wide-types? In many situations-fno-split-wide-types yields better code, but not always. Withoutsplitting wide types RTL is sometimes a bit unhandy because it has todeal with larger entities, but with splitting wide types it's harder tokeep track of the bigger picture.


*II*

from:
   ldi   r16, 101
   ori   r16, 50
   swap   r16
to:
   ldi   r16, 255
   andi   r16, 119

I am a little bit confused. Is the source an output of avr-gcc withoptimization turned on? If so, we should find out why the compilergenerates this code and remove the cause rather than the symptom and say"ldi r16, 119".


*III*
from:
   mov   r24, r25
   ldi   r25, 0
   mov   r25, r24
   eor   r24, r24
to:
   eor   r24, r24

The original snippet would look something like
  (set (reg:HI 101) (zero_extend:HI (reg:QI 100)))
  (set (reg:HI 102) (ashift:HI (reg:HI 101) (const_int 8)))
Combine will try
  (set (reg:HI 102)
       (ashift:HI (zero_extend:HI (reg:QI 100))
                  (const_int 8)))
Which could be split/mapped to
  (set (subreg:QI (reg:HI 102) 1) (reg:QI 100))
  (set (subreg:QI (reg:HI 102) 0) (const_int 0))

Note that this is more general and contains the code above if reloaddecides to allocate 102 to R24 and 100 to R25 (or 102 to REGNO and 100to REGNO+1).

Unfortunately, skimming generated asm for such sequences and writingpatterns to catch them is very time consuming. But I am unsure ifautomatically generated peepholes2 is what we want-- there will be bulk of patterns in the backend where no one reallyknows where they come from. There will be no individual comments whythey are there. It will be harder to maintain the backend.-- I think before adding peepholes we should try to fix the veryproblems: maybe missing combine patterns, playing around with commandline options, smarter ways to printout assembler, maybe costs, insnconstraints, see if the bad code still persist in gcc 4.5, etc.

As I said, IMHO peep2 should be a last resort to fix mess if moresophisticated and more general approaches fail. I guess a bunch of thecases you see and treat is just because gcc doesn't handle what it couldhave handled if it was described somewhere.


Georg-Johann

[Prev in Thread]

Current Thread

[Next in Thread]

RE: [avr-gcc-list] avr superoptimizer, (continued)
- Re: [avr-gcc-list] avr superoptimizer, John Regehr, 2009/04/20
  - RE: [avr-gcc-list] avr superoptimizer, Weddington, Eric, 2009/04/20
- Re: [avr-gcc-list] avr superoptimizer, Georg-Johann Lay <=
  - Re: [avr-gcc-list] avr superoptimizer, Sean D'Epagnier, 2009/04/20
    - Re: [avr-gcc-list] avr superoptimizer, Georg-Johann Lay, 2009/04/21
    - Re: [avr-gcc-list] avr superoptimizer, Sean D'Epagnier, 2009/04/21
    - Re: [avr-gcc-list] avr superoptimizer, Georg-Johann Lay, 2009/04/23

Prev by Date: Re: [avr-gcc-list] avr superoptimizer
Next by Date: RE: [avr-gcc-list] avr superoptimizer
Previous by thread: RE: [avr-gcc-list] avr superoptimizer
Next by thread: Re: [avr-gcc-list] avr superoptimizer
Index(es):
- Date
- Thread