avr-gcc-list

Re: [avr-gcc-list] Shorter code?


From: hutchinsonandy
Subject: Re: [avr-gcc-list] Shorter code?
Date: Thu, 12 Jun 2008 16:41:42 -0400

I'm aware of this and similar issues and will be working to improve them.

Here is what the compiler sees as individual instructions:

768: 80 91 1d 01  lds  r24, 0x011D  MOVE BYTE (volatile maybe)

76c: 20 91 1c 01  lds  r18, 0x011C  MOVE BYTE


770: 99 27        eor  r25, r25 ZERO EXTEND R24 to make unsigned int.


772: 98 2f        mov  r25, r24 SHIFT LEFT 8 bits
774: 88 27        eor  r24, r24

776: 33 27        eor  r19, r19 ZERO EXTEND R18 to make unsigned int

778: 82 2b        or   r24, r18 OR unsigned int
77a: 93 2b        or   r25, r19
77c: 08 95        ret


Though you could restructure your code to avoid some of this, the key problems are:

The left shift is seen by the compiler as one instruction, but in reality it is two operations that can be accomplished with two BYTE MOVEs (called MOVQI).

ZERO EXTEND is OK (the compiler can split this into MOVQI, and I assume it does).

OR is performed as a 16-bit operation, when it could be split into two BYTE ORs.

Taking the last part, the compiler can't propagate the zero in R19 into the OR, since the OR is 16 bits. If this pattern were split,
then that part would collapse:


or r24,r18
ret


Similarly, if the SHIFT were split, the rest should collapse.
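
To make the point about splitting concrete, here is a rough C sketch of the same computation done once as 16-bit operations and once as byte operations. It only illustrates the equivalence; it is not how the compiler represents things internally, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit form: zero-extend both bytes, shift, then OR as words. */
static uint16_t combine_wide(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | (uint16_t)lo);
}

/* Split form: every step works on a single byte (one register). */
static uint16_t combine_split(uint8_t hi, uint8_t lo)
{
    uint8_t r24 = 0;    /* low byte of (hi << 8) is always zero   */
    uint8_t r25 = hi;   /* a shift by 8 is just a byte move       */
    uint8_t r19 = 0;    /* high byte of the zero-extended lo      */

    r24 |= lo;          /* 0 | lo: collapses to a plain move      */
    r25 |= r19;         /* hi | 0: collapses to nothing           */

    return (uint16_t)((uint16_t)r25 << 8 | r24);
}

/* Exhaustively compare both forms over all byte pairs. */
static void check_combine(void)
{
    for (unsigned hi = 0; hi < 256; hi++)
        for (unsigned lo = 0; lo < 256; lo++)
            assert(combine_wide((uint8_t)hi, (uint8_t)lo) ==
                   combine_split((uint8_t)hi, (uint8_t)lo));
}
```

Once each byte operation is visible on its own, the constant zeros can be propagated and only the two byte moves remain.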

There is limited splitting in gcc 4.3/4.4 (4.4 is better). This is controlled by -fsplit-wide-types (turned off with -fno-split-wide-types). It is often turned off, since splitting just some of the instructions leaves a mixed bag of BYTE/WORD/LONG instructions that prevents some optimisations.

The example you give is entirely within the scope of the work I am doing, which will hopefully reach 4.4. There are some situations where it is much more difficult. These include any arithmetic operations that involve CARRY: ADD, SUB, COMPARE, and SHIFT by anything other than a multiple of 8 bits. Such instructions cannot be split.
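
As a rough illustration of why CARRY gets in the way, the C sketch below adds two 16-bit values one byte at a time. The function names are invented for the example. The high-byte add depends on the carry out of the low-byte add (on AVR this is the ADD/ADC pair), so the two halves are not independent the way the two halves of an OR are.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 16-bit values one byte at a time, with carry. */
static uint16_t add16_bytewise(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)a, a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)b, b_hi = (uint8_t)(b >> 8);

    uint8_t lo    = (uint8_t)(a_lo + b_lo);
    uint8_t carry = (uint8_t)(lo < a_lo);   /* 1 if the low byte wrapped */
    uint8_t hi    = (uint8_t)(a_hi + b_hi + carry);

    return (uint16_t)((uint16_t)hi << 8 | lo);
}

/* Same, but wrongly treating the two byte adds as independent. */
static uint16_t add16_no_carry(uint16_t a, uint16_t b)
{
    uint8_t lo = (uint8_t)((uint8_t)a + (uint8_t)b);
    uint8_t hi = (uint8_t)((uint8_t)(a >> 8) + (uint8_t)(b >> 8));

    return (uint16_t)((uint16_t)hi << 8 | lo);
}

/* Compare the carry-aware version with a native 16-bit add
   over a coarse grid of operands. */
static void check_add16(void)
{
    for (uint32_t a = 0; a <= 0xFFFF; a += 251)
        for (uint32_t b = 0; b <= 0xFFFF; b += 257)
            assert(add16_bytewise((uint16_t)a, (uint16_t)b) ==
                   (uint16_t)(a + b));
}
```

For example, 0x00FF + 0x0001 gives 0x0100 with the carry and 0x0000 without it.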







-----Original Message-----
From: Ruud Vlaming <address@hidden>
To: address@hidden
Sent: Thu, 12 Jun 2008 9:27 am
Subject: [avr-gcc-list] Shorter code?



Hi

Normally gcc generates well-optimized code, but
sometimes I wonder how gcc can make simple things
so complicated.

Here is an example,

uint16_t genGetTickCount(void)
{ return (((uint16_t) uxTickCount.HighByte) << 8) | (uint16_t) (uxTickCount.LowByte); }

generates

00000768 <genGetTickCount>:
768: 80 91 1d 01  lds  r24, 0x011D
76c: 20 91 1c 01  lds  r18, 0x011C
770: 99 27        eor  r25, r25
772: 98 2f        mov  r25, r24
774: 88 27        eor  r24, r24
776: 33 27        eor  r19, r19
778: 82 2b        or   r24, r18
77a: 93 2b        or   r25, r19
77c: 08 95        ret

whereas it could have been 12 bytes (!) shorter:
00000768 <genGetTickCount>:
768: 90 91 1d 01  lds  r25, 0x011D
76c: 80 91 1c 01  lds  r24, 0x011C
770: 08 95        ret

Is there a way to write the function defined above in C so that the
compiler generates this assembly? Some special combine function maybe?
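
One commonly tried way to write such a combine function is a union that overlays two bytes on a 16-bit word, so that no shift or OR appears in the source at all. The sketch below is only that, a sketch: the struct standing in for uxTickCount and the union name are hypothetical (the real declaration is not shown here), it assumes a little-endian target such as AVR, and whether a given gcc version then emits just the two lds instructions is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the tick counter; the real declaration
   of uxTickCount is not shown in this thread. */
struct tick_bytes { uint8_t LowByte; uint8_t HighByte; };
static struct tick_bytes uxTickCount;

/* Overlay two bytes on one 16-bit word. Assumes a little-endian
   target (true for AVR), so byte[0] is the low byte. */
union word_bytes {
    uint16_t word;
    uint8_t  byte[2];
};

uint16_t genGetTickCount(void)
{
    union word_bytes u;

    u.byte[0] = uxTickCount.LowByte;    /* low half, no OR needed    */
    u.byte[1] = uxTickCount.HighByte;   /* high half, no shift needed */
    return u.word;
}
```

The result should be checked in the disassembly for the compiler version and optimisation level actually in use.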


Further, I don't know how much intelligence you may expect from the
compiler, but, for example, first clearing r25 and directly afterwards
filling it from r24 seems a real waste of effort. By direct inspection,
thus _without_ any knowledge of what is going on, this code could be
reduced in the following simple steps (ignore line numbers):

00000768 <genGetTickCount>:
768: 80 91 1d 01  lds  r24, 0x011D
76c: 20 91 1c 01  lds  r18, 0x011C
770: 99 27        eor  r25, r25  //remove this, since it is directly overwritten afterwards
772: 98 2f        mov  r25, r24
774: 88 27        eor  r24, r24
776: 33 27        eor  r19, r19
778: 82 2b        or   r24, r18
77a: 93 2b        or   r25, r19  //remove this, since "or" with zero does not change the value of r25
77c: 08 95        ret

00000768 <genGetTickCount>:
768: 80 91 1d 01  lds  r24, 0x011D
76c: 20 91 1c 01  lds  r18, 0x011C
772: 98 2f        mov  r25, r24
774: 88 27        eor  r24, r24
776: 33 27        eor  r19, r19  //remove this, the register is unused.
778: 82 2b        or   r24, r18  // change into "mov" since r24 is zero
77c: 08 95        ret

00000768 <genGetTickCount>:
768: 80 91 1d 01  lds  r24, 0x011D  //load this directly into r25, since the value in r24 is destroyed after the move
76c: 20 91 1c 01  lds  r18, 0x011C
772: 98 2f        mov  r25, r24  //remove this, since r25 will be filled directly
774: 88 27        eor  r24, r24  //remove this, since it is directly overwritten afterwards
778: 82 2f        mov  r24, r18
77c: 08 95        ret

00000768 <genGetTickCount>:
768: 90 91 1d 01  lds  r25, 0x011D
76c: 20 91 1c 01  lds  r18, 0x011C  //load this directly into r24, since r18 is unused after the move
778: 82 2f        mov  r24, r18  //remove this, since r24 will be filled directly
77c: 08 95        ret

00000768 <genGetTickCount>:
768: 90 91 1d 01  lds  r25, 0x011D
76c: 80 91 1c 01  lds  r24, 0x011C
77c: 08 95        ret
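
The reduction can be sanity-checked on a PC by modelling the four registers involved and running both sequences for every pair of input bytes. This is a small hypothetical simulation written for this message, not output from any tool; hi and lo stand for the bytes at 0x011D and 0x011C, and the result is read from r25:r24 as at the ret.

```c
#include <assert.h>
#include <stdint.h>

/* Tiny model of the four registers the listing touches. */
struct regs { uint8_t r18, r19, r24, r25; };

/* The original sequence, one C statement per instruction. */
static uint16_t run_original(uint8_t hi, uint8_t lo)
{
    struct regs r = { 0xAA, 0xAA, 0xAA, 0xAA };  /* arbitrary junk */

    r.r24 = hi;         /* lds r24, 0x011D */
    r.r18 = lo;         /* lds r18, 0x011C */
    r.r25 ^= r.r25;     /* eor r25, r25    */
    r.r25 = r.r24;      /* mov r25, r24    */
    r.r24 ^= r.r24;     /* eor r24, r24    */
    r.r19 ^= r.r19;     /* eor r19, r19    */
    r.r24 |= r.r18;     /* or  r24, r18    */
    r.r25 |= r.r19;     /* or  r25, r19    */

    return (uint16_t)((uint16_t)r.r25 << 8 | r.r24);
}

/* The fully reduced sequence: two loads and nothing else. */
static uint16_t run_reduced(uint8_t hi, uint8_t lo)
{
    struct regs r = { 0xAA, 0xAA, 0xAA, 0xAA };

    r.r25 = hi;         /* lds r25, 0x011D */
    r.r24 = lo;         /* lds r24, 0x011C */

    return (uint16_t)((uint16_t)r.r25 << 8 | r.r24);
}

/* Both sequences must return the same value for all inputs. */
static void check_reduction(void)
{
    for (unsigned hi = 0; hi < 256; hi++)
        for (unsigned lo = 0; lo < 256; lo++)
            assert(run_original((uint8_t)hi, (uint8_t)lo) ==
                   run_reduced((uint8_t)hi, (uint8_t)lo));
}
```

The model only compares the returned value; it ignores the final contents of r18 and r19, which differ between the two sequences.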

Could such post-compiler optimization steps be integrated into the compiler?

I would like to hear your comments.

Ruud.


