avr-gcc-list

Re: [avr-gcc-list] GCC-AVR Register optimisations


From: andrewhutchinson
Subject: Re: [avr-gcc-list] GCC-AVR Register optimisations
Date: Thu, 10 Jan 2008 10:57:58 -0500

Thanks for feedback!

I will try your example later today and see what I get. The change in register 
allocation order allows gcc to fix some other things.

Part of the problem in your example is the strange move away from R16 to R14:

        movw r14,r16
        sec
        adc r14,__zero_reg__
        adc r15,__zero_reg__
        movw r24,r16

It is not obvious why this is not optimised out (unless optimisation was 
disabled or restricted). This normally only happens when all the higher-numbered 
registers are used up, or when the result must be preserved across a call.

It should have picked a higher-numbered register (R24 or even R28), as I would 
expect these to be unused and they are obviously ahead of R14 in the allocation 
order.

If the move had been made to R16 or higher, then the addition would have been 
simpler - or even as simple as your example.
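
As a rough illustration of why the destination register matters here, the sketch below models only the AVR instruction-set limits involved: ADIW works on the pairs starting at r24, r26, r28 and r30, and SUBI/SBCI only on r16-r31. It says nothing about how gcc itself chooses instructions, and the function names are made up for this example.

```c
#include <stdbool.h>

/* Hypothetical helpers: they model AVR operand limits, not gcc internals. */

/* ADIW accepts only the register pairs starting at r24, r26, r28, r30. */
static bool can_adiw(int lo_reg)
{
    return lo_reg == 24 || lo_reg == 26 || lo_reg == 28 || lo_reg == 30;
}

/* SUBI/SBCI take an immediate operand and only work on r16..r31. */
static bool can_subi(int reg)
{
    return reg >= 16 && reg <= 31;
}

/* Instructions needed to add 1 to the 16-bit pair starting at lo_reg. */
static int increment16_cost(int lo_reg)
{
    if (can_adiw(lo_reg))
        return 1;   /* adiw rN,1 */
    if (can_subi(lo_reg) && can_subi(lo_reg + 1))
        return 2;   /* subi rN,lo8(-(1)) ; sbci rN+1,hi8(-(1)) */
    return 3;       /* sec ; adc rN,__zero_reg__ ; adc rN+1,__zero_reg__ */
}
```

With this model, the pair at r14 costs three instructions, r16 costs two, and r24 or r28 cost one, which matches the sequences in the listing quoted below.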

However, looking at the intermediate RTL often gives some clue as to why.

Can you tell me what optimisation setting was used?

Andy

---- Wouter van Gulik <address@hidden> wrote: 
> > R17 downwards are call-saved and are pushed/popped in a prescribed
> > order by the prolog/epilog functions. Also, R28,29 is the potential frame
> > pointer, so it is best left alone. So the key registers are R18-R27 and R30,31.
> > 
> 
> Note that in some cases it could be very interesting to use r28, the Y 
> register.
> 
> Consider this example:
> 
> char *x;
> volatile int y;
> 
> void foo(char *p)
> {
>      y += *p;
> }
> 
> void main(void)
> {
>       char *p1 = x;
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
>       foo(p1++);
> }
> 
> 
> This will generate very bad code.
> /* prologue: frame size=0 */
>       push r14
>       push r15
>       push r16
>       push r17
> /* prologue end (size=4) */
>       lds r24,x
>       lds r25,(x)+1
>       movw r16,r24
>       subi r16,lo8(-(1))
>       sbci r17,hi8(-(1))
>       call foo
>       movw r14,r16
>       sec
>       adc r14,__zero_reg__
>       adc r15,__zero_reg__
>       movw r24,r16
>       call foo
>       movw r16,r14
>       subi r16,lo8(-(1))
>       sbci r17,hi8(-(1))
>       movw r24,r14
>       call foo
>       movw r14,r16
>       sec
>       adc r14,__zero_reg__
>       adc r15,__zero_reg__
>       movw r24,r16
>       call foo
>       movw r16,r14
>       subi r16,lo8(-(1))
>       sbci r17,hi8(-(1))
>       movw r24,r14
>       call foo
> etc..
> 
> A more optimal scheme would be
>       call foo
>       movw r24, r16
>       adiw r24, 1
>       movw r16, r24
>       call foo
> etc..
> Using the r24 capability to do a 16 bit increment
> 
> But in this "special" case there is no frame pointer. So we could use 
> R28 to store instead of R16. Then we can add on r28 and do something 
> like this:
>       
>       call foo
>       adiw r28, 1
>       movw r24, r28
>       call foo
> 
> So yes, using R28 as a last resort looks sane, unless there is no frame 
> pointer at all and there is a need for 16-bit (or 32-bit) arithmetic on 
> saved registers. This is probably incredibly difficult, but I thought I 
> would mention it anyway.
> 
> HTH,
> 
> Wouter
> 
> ps.
> 
> Writing it as "foo(p); p++;" produces better code?!? I will file a 
> bug report for this.
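
For reference, a minimal sketch of that rewritten form, with the increment as its own statement after each call. The wrapper name call_ten is made up so the snippet can be compiled and checked on a host; whether avr-gcc really emits better code for it depends on the compiler version and options, which the thread does not state.

```c
volatile int y;

void foo(char *p)
{
    y += *p;
}

/* Hypothetical wrapper standing in for main(): same ten calls as the
   original example, but written as "foo(p1); p1++;". */
void call_ten(char *p1)
{
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
    foo(p1); p1++;
}
```

Both forms pass the same ten addresses to foo, so any difference in the generated code comes from how the compiler schedules the increment around the call.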
> 
> > With the order, there are several problems:
> > 
> > 1) Initial register allocation fragments the register set. For example,
> > allocating R25 will prevent R24-25 being used as a 16-bit register and
> > prevent R22-25 and R24-27 being used as 32-bit registers. The gcc register
> > allocator does not seem to overcome this fragmentation.
> > 
> > 2) The situation is made worse by the order of the 16-bit+ registers used for
> > call and return values, which are "allocated" in reverse order, e.g.
> > R24-R25, R22-R25, R18-R25.  This means that function parameters or
> > return values are rarely in the right place, except for 16-bit values.
> > 
> > 3) Allocating a byte to an odd-numbered register precludes it being
> > extended to a 16-bit value without a move.
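
The fragmentation in point 1 can be illustrated with a small model. This is only a sketch: it assumes multi-byte values must start on an even register (consistent with the R22-25 and R24-27 examples above), it only considers the key registers R18-R27 and R30,31, and the names regset, KEY_REGS and count_free are made up for this example.

```c
#include <stdint.h>

/* Bit n set means register Rn is already allocated. */
typedef uint32_t regset;

/* Key registers from the discussion: R18-R27 plus R30,R31. */
#define KEY_REGS (((((regset)1 << 10) - 1) << 18) | ((regset)3 << 30))

/* Count even-aligned runs of `width` registers (2 or 4) that lie wholly
   inside KEY_REGS and contain no allocated register. */
static int count_free(regset used, int width)
{
    int n = 0;
    for (int start = 18; start + width <= 32; start += 2) {
        regset span = (((regset)1 << width) - 1) << start;
        if ((span & KEY_REGS) == span && (span & used) == 0)
            n++;
    }
    return n;
}
```

In this model an empty register set offers 6 candidate 16-bit slots and 4 candidate 32-bit slots. Marking only R25 as used leaves 5 and 2: one byte removes R24-25 as a pair and both R22-25 and R24-27 as 32-bit candidates.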
> > 
> > So, I tried creating an order which would preserve the contiguous
> > register space and avoid the above issues as much as possible.
> > This is what I ended up with:
> > 
> > R18,26,22,30,20,24,19,21,23,25,27,31,28,29, \
> >    17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,\
> > 
> > 
> > The result is a 1.25% saving in code size for a simple mixed
> > application. Pretty good for such a simple change!
> > 
> > For code that uses more floating point, the saving might well be higher,
> > as it demands more contiguous 32-bit registers.
> > 
> > On the same basis, the current order of the call-saved registers R2-R17,
> > dictated by the (mcall) prolog, is clearly imperfect and limits further
> > improvement.  These are used less frequently, though their cost is much
> > higher, so it's difficult to gauge the impact. I might take a look at some
> > intense floating point functions to see if it is worth pursuing
> > reordering these too.
> > 
> > 
> > Andy