avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[avr-gcc-list] register allocation - something not right


From: HutchinsonAndy
Subject: [avr-gcc-list] register allocation - something not right
Date: Sun, 26 Dec 2004 18:39:54 -0500

I have been looking at the gcc avr register allocation and something are not 
quite right.

I would welcome feedback on this as I work towards a solution.

It is very common to see sub-optimal allocations that results in redundant 
moves. Somelike

reg x = reg y
reg z = nnn
reg y = reg x

often between function calls.

The new-ra options fixes many of these issues and I was hoping this 
experimental allocator would eventually find its way in to mainstream gcc - 
that appears unlikely. In addition, new-ra also has a bug that get the frame 
size wrong and so adds overhead too  many zero frame functions.

So I set about trying to find out why the basical allocator was so messed up.

I found part of the issue in the order of register allocation. The default 
order  being 24,25,18,19,20,21,22,23.... (although it can be changed using 
-morder1 or -morder2)

Local register allocation uses the next avialable register from this list. 
Allocation will select a register that is not destroyed by call if the required 
register needs to survive a call. As r18..r27 are call used 

At this time I am not concerned with registers need to survive function calls 
so 

If the register does not need to survive calls, call used registers are 
obviously preferred and indeed indicated by the list.

So what happens is that temporary registers between calls favor r24..r25. This  
also permits the result of int functions and the argument of int function to be 
maintained in r24..r25, resulting in tight code.
For example:

 190:test1.c       **** int divi(int a, int b)
 191:test1.c       **** {
 637                .LM57:
 638                /* prologue: frame size=0 */
 639                /* prologue end (size=0) */
 640 026c 0E94 0000         call __divmodhi4
 641                /* epilogue: frame size=0 */
 642 0270 0895              ret
 643                /* epilogue end (size=1) */
 644                /* function divi size 3 (2) */


For long or float functions, r22..r25 is used - but gcc still starts at r24 
using r24..r27 for long temporary values. So we end up with moves to and from 
r22..r27 and r24..r26 as shown in the following example:

 190:test1.c       **** int divi(int a, int b)
 191:test1.c       **** {
 637                .LM57:
 638                /* prologue: frame size=0 */
 639                /* prologue end (size=0) */
 640 026c 0E94 0000         call __divmodhi4
 641                /* epilogue: frame size=0 */
 642 0270 0895              ret
 643                /* epilogue end (size=1) */
 644                /* function divi size 3 (2) */
 646                .Lscope8:
 648                    .stabd  78,0,0
 652                .global div
 654                div:
 655                    .stabd  46,0,0
 192:test1.c       ****     return a%b;
 193:test1.c       **** }
 194:test1.c       **** float div(float a,float b)
 195:test1.c       **** {
 657                .LM58:
 658                /* prologue: frame size=0 */
 659                /* prologue end (size=0) */
 660 0272 DC01              movw r26,r24
 661 0274 CB01              movw r24,r22
 662 0276 BC01              movw r22,r24
 663 0278 CD01              movw r24,r26
 664 027a 0E94 0000         call __divsf3
 665 027e DC01              movw r26,r24
 666 0280 CB01              movw r24,r22
 196:test1.c       ****     return a/b;
 197:test1.c       **** }
 668                .LM59:
 669 0282 BC01              movw r22,r24
 670 0284 CD01              movw r24,r26
 671                /* epilogue: frame size=0 */
 672 0286 0895              ret

There is obviously an issue with gcc failing to realise that r22..r27 DOES have 
the right value in the above example and that the moves can be eliminated. 

The movement to/from r24 is originally generated by the allocation order. The 
failure to optimise this later appears to be due to the overlap of r22..r25 and 
r24..r27, (ie register optimisation cant cope with that)

If the same code is compiled using -morder1. The local allocation order then 
becomes: 18,19,   20,21,    22,23,    24,25.....
This has no overlap and resulting in far superior code:

194:test1.c       **** float div(float a,float b)
 195:test1.c       **** {
 654                .LM58:
 655                /* prologue: frame size=0 */
 656                /* prologue end (size=0) */
 657 026c 0E94 0000         call __divsf3
 658                /* epilogue: frame size=0 */
 659 0270 0895              ret
 660                /* epilogue end (size=1) */
 661                /* function div size 3 (2) */


For a wider test I tried butterfly which went from 0x363e bytes to 0x360c 
bytes. Some test code that used more long/float operations fell from 
1996(dec) to 1956(dec) bytes.

Now there is more to it than I have explained. Overlap and non-optimal placment 
still exist is subsequent allocations. So the order of the entire list can be 
critical.

Using a custom order ( 22,23,24,25, 18,19....) saw futher reductions to 0x35c4 
and 1940(dec) respectively. 

Perhaps more is possible but as I tried, I came across other issues that were 
clearly countering my attempts to arrive at a logical ordering.

In particular, the order in which registers are used in function calls. For 
example,  r22..r25 used for a long/float arg/result whereas r24..r25 is used 
for integer arg/results.

There would appear to be two issues with this:

i) using r24 immediately splits the bank of call used registers (r18-r27). This 
makes subsequent register allocations around a function call more difficult. 
Starting at r18 would avoid this.

ii) Values are always EXTENDED into higher numbered registers. So extending 
r24..r25 into a long would naturally become r24..r27. However, our functions 
use r22..r25 - so we end up with more overlapping moves!.

The optimal order for function parameters and return values would seem to be: 
r18..r19,r18..r21 etc. i.e. starting at one end and strictly in "endian" order. 
If that can be achieved, the register allocation order should be much easier to 
determine and much more productive.

Feedback welcome.

PS for ref gcc 4.0.0 experimental ss12052005 - hacked.














__________________________________________________________________
Switch to Netscape Internet Service.
As low as $9.95 a month -- Sign up today at http://isp.netscape.com/register

Netscape. Just the Net You Need.

New! Netscape Toolbar for Internet Explorer
Search from anywhere on the Web and block those annoying pop-ups.
Download now at http://channels.netscape.com/ns/search/install.jsp


reply via email to

[Prev in Thread] Current Thread [Next in Thread]