[avr-gcc-list] gcc signal overhead, redundant code, bug (?), far from optimal


From: Szikra Istvan
Subject: [avr-gcc-list] gcc signal overhead, redundant code, bug (?), far from optimal
Date: Thu, 18 Aug 2005 16:05:25 +0200

I have a problem with the "long-winded" assembly code that avr-gcc generates.
What I'm trying to write is time critical, so these unnecessary instructions
do matter. (It is also large, so I don't intend to write everything in asm.)

I have several signal handlers; one of them is this:
[c source]
SIGNAL(SIG_OUTPUT_COMPARE3A)
{
    DPA1(0);   ///<debug
    FLAG_Timer_1s = 1;
    DPA1(1);   ///<debug
}
[/c source]

The compiled code (.s) looks like this:
[asm]
.global __vector_26
        .type   __vector_26, @function
__vector_26:
.LFB46:
.LM9:
/* prologue: frame size=0 */
        push __zero_reg__
        push __tmp_reg__
        in __tmp_reg__,__SREG__
        push __tmp_reg__
        clr __zero_reg__
        push r24
/* prologue end (size=6) */
.LM10:
        cbi 59-0x20,1
.LM11:
        ldi r24,lo8(1)
        sts FLAG_Timer_1s,r24
.LM12:
        sbi 59-0x20,1
/* epilogue: frame size=0 */
        pop r24
        pop __tmp_reg__
        out __SREG__,__tmp_reg__
        pop __tmp_reg__
        pop __zero_reg__
        reti
/* epilogue end (size=6) */
/* function __vector_26 size 17 (5) */
[/asm]

This is an awful lot of code.
 First of all, why clear __zero_reg__? It is NOT used in the interrupt.
Hell, why push, clr, pop at all? Isn't zero supposed to be in it anyway? I'm
not sure this push, clr, pop is needed in any interrupt, but I'm sure it is
not needed in this one.

Am I missing something?

 Second, why save SREG in this case? Which instruction changes it?
If I'm reading the 'AVR Instruction Set' PDF right, cbi, ldi, sts and sbi
don't modify it.
---- Oh, sure, the unwanted clr in the prologue modifies it :( sorry

I think this would do just fine:
__vector_26:
        push r24

        cbi 59-0x20,1
        ldi r24,lo8(1)
        sts FLAG_Timer_1s,r24
        sbi 59-0x20,1

        pop r24
        reti
/* prologue size=1, epilogue size=2, function __vector_26 size 8 (5) */
Am I wrong?

I read a similar mail in the archive about this. My question:

Can I somehow modify the prologue without writing everything in asm?

My other problem is with other signal handlers. The gcc-generated handler
pushes every register it uses at the very beginning.
I have a handler that occasionally does a lot of calculation, so it needs a
lot of registers, but usually it only performs a simple check and uses only
1 or 2 registers. It is called very often in the checking state (and less
often when it does the calculation), so the check needs to be as fast as
possible to leave CPU time for the main program and the other handlers.

So, how can I write the signal handler so that the simple check does not
have to wait for the pushes (and pops) of registers that only the
calculation part uses?

Here is part of the signal handler:
[c source]
SIGNAL(SIG_OUTPUT_COMPARE2)
{
  DPA5(0);  ///<debug
    char Rx;
//    Rx = COAX_Rx(); //inline function/macro:
    register char output=COAXDATAOUT_IN;
    Rx=COAXDATAIN_IN;
    if (output & COAXDATAOUT_TxD_PIN) Rx >>=1;
    if (output & COAXDATAOUT_DTR_PIN) Rx >>=1;
    Rx &= COAXDATAIN_TxD_PIN;

    DPA4(Rx);     ///<debug
    if ( (SWU_Rx_state == SWU_state_start) && (Rx) ) return;

 // ... calculation

}
[/c source]

It compiles to this:
[lss]
SIGNAL(SIG_OUTPUT_COMPARE2)
{
 128:   1f 92           push    r1
 12a:   0f 92           push    r0
 12c:   0f b6           in      r0, 0x3f        ; 63
 12e:   0f 92           push    r0
 130:   11 24           eor     r1, r1
 132:   2f 93           push    r18
 134:   3f 93           push    r19
 136:   4f 93           push    r20
 138:   8f 93           push    r24
 13a:   9f 93           push    r25
 13c:   ef 93           push    r30
 13e:   ff 93           push    r31

  DPA5(0);  ///<debug
 140:   dd 98           cbi     0x1b, 5 ; 27

    register char output=COAXDATAOUT_IN;
 142:   83 b3           in      r24, 0x13       ; 19

    char Rx=COAXDATAIN_IN;
 144:   23 b3           in      r18, 0x13       ; 19

    if (output & COAXDATAOUT_TxD_PIN) Rx >>=1;
 146:   99 27           eor     r25, r25          <------- WHY GOD, WHY ?????
 148:   84 fd           sbrc    r24, 4
 14a:   26 95           lsr     r18

    if (output & COAXDATAOUT_DTR_PIN) Rx >>=1;
 14c:   85 fd           sbrc    r24, 5
 14e:   26 95           lsr     r18

    Rx &= COAXDATAIN_TxD_PIN;
 150:   21 70           andi    r18, 0x01       ; 1

    DPA4(Rx);     ///<debug
 152:   11 f0           breq    .+4             ; 0x158
 154:   dc 9a           sbi     0x1b, 4 ; 27
 156:   01 c0           rjmp    .+2             ; 0x15a
 158:   dc 98           cbi     0x1b, 4 ; 27

    if ( (SWU_Rx_state == SWU_state_start) && (Rx) ) return;
 15a:   80 91 62 09     lds     r24, 0x0962
 15e:   81 30           cpi     r24, 0x01       ; 1
 160:   19 f4           brne    .+6             ; 0x168 (calc)
 162:   22 23           and     r18, r18
 164:   09 f0           breq    .+2             ; 0x168 (calc)
 166:   a3 c0           rjmp    .+326           ; 0x2ae (return)

    // ... calculation:
 168: ...
...

 2ae:   ff 91           pop     r31
 2b0:   ef 91           pop     r30
 2b2:   9f 91           pop     r25
 2b4:   8f 91           pop     r24
 2b6:   4f 91           pop     r20
 2b8:   3f 91           pop     r19
 2ba:   2f 91           pop     r18
 2bc:   0f 90           pop     r0
 2be:   0f be           out     0x3f, r0        ; 63
 2c0:   0f 90           pop     r0
 2c2:   1f 90           pop     r1
 2c4:   18 95           reti
[/lss]

So there are a lot of pushes and pops, although most of the time only r18 and
r24 (and r25) are used. This leaves less CPU time for the other tasks.

Another strange thing is the "eor r25,r25". It clears r25, which nothing in
the code shown reads, so here it just costs a cycle.
Can someone explain it to me?

Any suggestions, workarounds?


Thx,
Istvan Szikra

----------------------------------------------------------------------------
[lst for SIGNAL(SIG_OUTPUT_COMPARE3A)]
 136                    .Lscope0:
 138                    .global __vector_26
 140                    __vector_26:
 134:rs422coax.c   **** SIGNAL(SIG_OUTPUT_COMPARE3A)
 135:rs422coax.c   **** {
 142                    .LM9:
 143                    /* prologue: frame size=0 */
 144 0034 1F92                  push __zero_reg__
 145 0036 0F92                  push __tmp_reg__
 146 0038 0FB6                  in __tmp_reg__,__SREG__
 147 003a 0F92                  push __tmp_reg__
 148 003c 1124                  clr __zero_reg__
 149 003e 8F93                  push r24
 150                    /* prologue end (size=6) */
 136:rs422coax.c   ****     DPA1(0);   ///<debug
 152                    .LM10:
 153 0040 D998                  cbi 59-0x20,1
 137:rs422coax.c   ****     FLAG_Timer_1s = 1;
 155                    .LM11:
 156 0042 81E0                  ldi r24,lo8(1)
 157 0044 8093 0000             sts FLAG_Timer_1s,r24
 138:rs422coax.c   ****     DPA1(1);   ///<debug
 159                    .LM12:
 160 0048 D99A                  sbi 59-0x20,1
 161                    /* epilogue: frame size=0 */
 162 004a 8F91                  pop r24
 163 004c 0F90                  pop __tmp_reg__
 164 004e 0FBE                  out __SREG__,__tmp_reg__
 165 0050 0F90                  pop __tmp_reg__
 166 0052 1F90                  pop __zero_reg__
 167 0054 1895                  reti
 168                    /* epilogue end (size=6) */
 169                    /* function __vector_26 size 17 (5) */
[/lst]

[lst for SIGNAL(SIG_OUTPUT_COMPARE2)]
 173                    __vector_9:
 165:rs422coax.c   **** SIGNAL(SIG_OUTPUT_COMPARE2)
 166:rs422coax.c   **** {
 175                    .LM11:
 176                    /* prologue: frame size=0 */
 177 005e 1F92                  push __zero_reg__
 178 0060 0F92                  push __tmp_reg__
 179 0062 0FB6                  in __tmp_reg__,__SREG__
 180 0064 0F92                  push __tmp_reg__
 181 0066 1124                  clr __zero_reg__
 182 0068 2F93                  push r18
 183 006a 3F93                  push r19
 184 006c 4F93                  push r20
 185 006e 8F93                  push r24
 186 0070 9F93                  push r25
 187 0072 EF93                  push r30
 188 0074 FF93                  push r31
 189                    /* prologue end (size=12) */
 167:rs422coax.c   ****   DPA5(0);  ///<debug
 191                    .LM12:
 192 0076 DD98                  cbi 59-0x20,5
 168:rs422coax.c   ****     char Rx;
 169:rs422coax.c   **** //    Rx = COAX_Rx(); // inline macro:
 170:rs422coax.c   ****     register char output=COAXDATAOUT_IN;
 194                    .LM13:
 195 0078 83B3                  in r24,51-0x20
 171:rs422coax.c   ****     Rx=COAXDATAIN_IN;
 197                    .LM14:
 198 007a 23B3                  in r18,51-0x20
 172:rs422coax.c   ****     if (output & COAXDATAOUT_TxD_PIN) Rx >>=1;
 200                    .LM15:
 201 007c 9927                  clr r25
 202 007e 84FD                  sbrc r24,4
 204                    .LM16:
 205 0080 2695                  lsr r18
 206                    .L6:
 173:rs422coax.c   ****     if (output & COAXDATAOUT_DTR_PIN) Rx >>=1;
 208                    .LM17:
 209 0082 85FD                  sbrc r24,5
 210 0084 2695                  lsr r18
 211                    .L7:
 174:rs422coax.c   ****     Rx &= COAXDATAIN_TxD_PIN;
 213                    .LM18:
 214 0086 2170                  andi r18,lo8(1)
 175:rs422coax.c   **** 
 176:rs422coax.c   ****     DPA4(Rx);     ///<debug
 216                    .LM19:
 217 0088 11F0                  breq .L8
 218 008a DC9A                  sbi 59-0x20,4
 219 008c 01C0                  rjmp .L9
 220                    .L8:
 221 008e DC98                  cbi 59-0x20,4
 222                    .L9:
 177:rs422coax.c   ****     if ( (SWU_Rx_state == SWU_state_start) && (Rx) ) return;
 224                    .LM20:
 225 0090 8091 0000             lds r24,SWU_Rx_state
 226 0094 8130                  cpi r24,lo8(1)
 227 0096 19F4                  brne .L10
 228 0098 2223                  tst r18
 229 009a 09F0                  breq .+2
 230 009c A3C0                  rjmp .L5
 231                    .L10:
 178:rs422coax.c   **** 
 179:rs422coax.c   ****     // ... calculation: 
[/lst]



[compiler info]
WinAVR-20050214
(AVR GCC 3.4.3)

Compiling: rs422coax.c
avr-gcc -c -mmcu=atmega128 -I. -gstabs -DF_CPU=16000000UL  -Os
-funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall
-Wstrict-prototypes -Wa,-adhlns=rs422coax.lst  -std=gnu99 -MD -MP -MF
.dep/rs422coax.o.d rs422coax.c -o rs422coax.o


avr-gcc -S -mmcu=atmega128 -I. -gdwarf-2 -DF_CPU=16000000UL  -Os
-funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall
-Wstrict-prototypes -Wa,-adhlns=rs422coax.lst  -std=gnu99 -MD -MP -MF
.dep/rs422coax.s.d rs422coax.c -o rs422coax.s
[/compiler info]






