avr-gcc-list
[Top][All Lists]

## Re: [avr-gcc-list] Small super-optimizer results

 From: Paulo Marques Subject: Re: [avr-gcc-list] Small super-optimizer results Date: Tue, 12 Jul 2011 12:51:06 +0100 User-agent: Thunderbird 2.0.0.23 (X11/20090817)

```Bob Paddock wrote:
> On Mon, Jul 11, 2011 at 2:02 PM, Paulo Marques <address@hidden> wrote:
>
>> Some time ago I toyed with trying to build a super-optimizer for the
>> avr. The project didn't went very far, but it produced a few small bits
>> of code that I think are worth sharing.
>>
>> Basically I was trying to produce functions to multiply an 8 bit value
>> by 8 bit constants with a 16 bit result in avr families without a mul
>> instruction.
>
> Your results looked similar to what I've seen out of the
> "star-chain" multiplication routine from Doctor Dobb's Journal #125, March
> 1987.
> [The old multiplication one is not on line, but the newer division one
> is, with broken formatting:
> http://drdobbs.com/high-performance-computing/184408499 ]
>
> Curious if you were doing the some thing similar?

I wouldn't say these are really similar. The approach from Robert Grapel
is a mathematical one whereas my approach is to use whatever CPU
instructions that get the right result, even if we wouldn't expect them to.

For instance, the super optimizer could decide to use a SWAP instruction
or a EOR if that produces the right result, no matter why.

The tricky part is that it is trying to get a 16 bit result on a 8 bit
cpu, whereas the star-chain algorithm is assuming the registers are 32
bit or something like that.

Just compare the result for the 255 multiply:

star-chain:
>       Rw =   R1 = /* Your Multiplicand */
>       Rw <<= 8;
>       Rw -=  R1;

super-opt:
>       NEG r0
>       SBC r1, r0

The star-chain result is pretty obvious: multiply by 256 and subtract
one time.

A programmer trying to translate the star-chain algorithm to avr could
do something like:

; shift by 8
mov r1, r0
eor r0, r0
; subtract one time
sub r0, r1
sbc r1, __zero_reg__

But the super optimizer did it in just 2 instructions (3 if you need to
clear r1 first).

Having said that, the star-chain algorithm seems really useful for
slightly larger programs.

It will be hard to get sequences larger than 6 instructions out of the
optimizer but that shouldn't be a problem for the star-chain.

Thanks for pulling this old reference for us,

--
Paulo Marques
Software Development Department - Grupo PIE, S.A.
Phone: +351 252 290600, Fax: +351 252 290601
Web: www.grupopie.com

"Don't worry, you'll be fine; I saw it work in a cartoon once..."

```