AW: [avr-gcc-list] optimizer / Compiler patch

From: Haase Bjoern (PT-BEU/MKP5) *
Subject: AW: [avr-gcc-list] optimizer / Compiler patch
Date: Thu, 25 Nov 2004 11:06:28 +0100


I have had another look at this specific case, where the zero extension 
uses more registers than the task requires.  For this special case it is 
fairly easy to tell the compiler to use a more efficient pattern, such that

uint32_t target32_bit_variable;

void testUInt8_to_32 (void)
{ target32_bit_variable = returnUInt8_function(); }

is compiled to

        rcall returnUInt8_function
        sts target32_bit_variable,r24                     
        sts (target32_bit_variable)+1,__zero_reg__        
        sts (target32_bit_variable)+2,__zero_reg__        
        sts (target32_bit_variable)+3,__zero_reg__        

instead of
        rcall returnUInt8_function
        clr r25                             
        clr r26                             
        clr r27                             
        sts target32_bit_variable,r24                    
        sts (target32_bit_variable)+1,r25                
        sts (target32_bit_variable)+2,r26                
        sts (target32_bit_variable)+3,r27                

For this purpose it is enough to add an instruction pattern of the following 
type to "avr.md":
(define_insn "*mov_MEMint32_REGuint8"
 [(set (match_operand:SI 0 "memory_operand" "")
       (zero_extend:SI (match_operand:QI 1 "register_operand" "")))]
 ""
 "sts %A0,%A1
        sts %B0,__zero_reg__
        sts %C0,__zero_reg__
        sts %D0,__zero_reg__"
  ;; "length" is counted in 16-bit words; each sts takes two of them.
  [(set_attr "length" "8")])

I have attached a file that also contains the corresponding patterns for 
sign extension and for 16-bit target variables.
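
The attached file is not shown here. As a rough sketch of what such patterns
have to store (not the patterns themselves), the following C model builds the
little-endian byte image of an 8-bit value widened to 2 or 4 bytes. Zero
extension fills the upper bytes with 0x00, which is what __zero_reg__ holds;
sign extension fills them with 0xFF when bit 7 of the source is set. The
function names widen_u8 and widen_s8 are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only: they model memory contents,
   not the avr.md patterns. out[0] is the lowest address (little-endian). */

/* Zero extension: the low byte is the value, the upper bytes are 0x00. */
static void widen_u8(uint8_t src, unsigned width, uint8_t *out)
{
    unsigned i;

    assert(width == 2 || width == 4);
    out[0] = src;
    for (i = 1; i < width; i++)
        out[i] = 0x00;
}

/* Sign extension: the upper bytes replicate bit 7 of the source. */
static void widen_s8(int8_t src, unsigned width, uint8_t *out)
{
    uint8_t fill = (src < 0) ? 0xFF : 0x00;
    unsigned i;

    assert(width == 2 || width == 4);
    out[0] = (uint8_t)src;
    for (i = 1; i < width; i++)
        out[i] = fill;
}
```

For example, widen_s8(-2, 4, buf) leaves 0xFE 0xFF 0xFF 0xFF in buf, while
widen_u8(0x80, 4, buf) leaves 0x80 0x00 0x00 0x00.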

A similar method could possibly work for other operations (additions, shifts, 
and so on) involving global variables that are used so seldom that it is not 
worth holding them in registers.
Implementing this in the same way as in the example above would, however, 
require a great many additional patterns in the machine description, one for 
each special case.

I would be willing to implement it.  I would, however, appreciate a comment 
from a more experienced gcc expert on the proper way to do it (i.e. a huge 
"avr.md", or rather an implementation within "avr.c").



BTW: In my own application, the pattern above did not show up a single time 
;-). So it might be justified to consider 
"target32_bit_variable = returnUInt8_function();" a fairly rare case.

-----Original Message-----
From: Ben Mann [mailto:address@hidden
Sent: Wednesday, 24 November 2004 15:03
To: Haase Bjoern (PT-BEU/MKP5) *; 'Bernard Fouché';
Subject: RE: [avr-gcc-list] optimizer

I can understand there's some challenge in making this sort of change to
the RTL compiler. Nevertheless it seems that for embedded work this sort of
stuff is going to be quite important (speed and size always an issue...)

I realise this is not very helpful, but the best I could dream up so far was
a little macro to replace the compiler's casting:

//optimally cast a char to a long (assumes little-endian byte order, as on AVR)
//wrapped in do { } while (0) so it behaves as a single statement
#define CAST_CHAR2LONG(dest,src) \
    do { \
        *((char*)&(dest)) = (src); \
        *((char*)&(dest)+1) = 0; \
        *((char*)&(dest)+2) = 0; \
        *((char*)&(dest)+3) = 0; \
    } while (0)

long var;
CAST_CHAR2LONG(var,eeprom_read_byte((uint8_t *)ADDR));
//replaces var = eeprom_read_byte((uint8_t *)ADDR)

The code generated is (as you might imagine) substantially tighter and works
for local or global "var". However, the syntax sucks. I wonder if there's a
better way?

Ben Mann
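
One possible answer to the syntax question, as a sketch only: the same
byte-wise stores can be wrapped in a static inline function, so that the call
site reads like an ordinary function call and the source expression is
type-checked. The name store_u8_as_u32 is made up here, the code assumes
little-endian byte order as on the AVR, and whether avr-gcc 3.4.2 produces
the same tight code for it as for the macro has not been checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of avr-libc.
   Stores an 8-bit value into a 32-bit variable byte by byte, so that the
   compiler has no need to build the full 32-bit value in registers first.
   Assumes little-endian layout: the lowest address holds the low byte. */
static inline void store_u8_as_u32(uint32_t *dest, uint8_t src)
{
    unsigned char *p = (unsigned char *)dest;

    assert(dest != NULL);
    p[0] = src;   /* low byte carries the value */
    p[1] = 0;     /* upper three bytes are the zero extension */
    p[2] = 0;
    p[3] = 0;
}
```

A call would then read store_u8_as_u32(&var, eeprom_read_byte((uint8_t *)ADDR));
with var declared as uint32_t.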

-----Original Message-----
From: Haase Bjoern (PT-BEU/MKP5) * [mailto:address@hidden 
Sent: Wednesday, 24 November 2004 8:56 PM
To: address@hidden; Bernard Fouché; address@hidden
Subject: AW: [avr-gcc-list] optimizer



IMHO the possible benefit of a 32-> 4x8 splitting at the RTL level does not
really justify 
the required amount of changes in the compiler.


-----Original Message-----
From: address@hidden [mailto:address@hidden
On Behalf Of Bernard Fouché
Sent: Wednesday, 24 November 2004 7:18 PM
To: address@hidden
Subject: [avr-gcc-list] optimizer


I'm compiling with -Os for atmega64 with avr-gcc 3.4.2. When I have

uint32_t var;


the generated code is, for instance:

 var=(uint32_t)eeprom_read_byte((uint8_t *)EEPROM_PARM);
ldi     r24, 0x36       ; 54
ldi     r25, 0x00       ; 0
call    0xf9c0
eor     r25, r25
eor     r26, r26
eor     r27, r27
sts     0x046B, r24
sts     0x046C, r25
sts     0x046D, r26
sts     0x046E, r27

Could it be instead:
ldi     r24, 0x36       ; 54
ldi     r25, 0x00       ; 0
call    0xf9c0
sts     0x046B, r24
sts     0x046C, r1
sts     0x046D, r1
sts     0x046E, r1

That would save 6 bytes...


avr-gcc-list mailing list
address@hidden http://www.avr1.org/mailman/listinfo/avr-gcc-list
