[avr-gcc-list] gcc signal overhead, redundant code, bug (?), far from op
From: Szikra Istvan
Subject: [avr-gcc-list] gcc signal overhead, redundant code, bug (?), far from optimal
Date: Thu, 18 Aug 2005 16:05:25 +0200
I have a problem with the "long-winded" assembly code that avr-gcc generates.
What I'm trying to write is time critical, so these unnecessary instructions
do matter. (It is also large, so I don't intend to write everything in asm.)
I have several signal handlers; one of them is this:
[c source]
SIGNAL(SIG_OUTPUT_COMPARE3A)
{
DPA1(0); ///<debug
FLAG_Timer_1s = 1;
DPA1(1); ///<debug
}
[/c source]
The compiled code (.s) looks like this:
[asm]
.global __vector_26
.type __vector_26, @function
__vector_26:
.LFB46:
.LM9:
/* prologue: frame size=0 */
push __zero_reg__
push __tmp_reg__
in __tmp_reg__,__SREG__
push __tmp_reg__
clr __zero_reg__
push r24
/* prologue end (size=6) */
.LM10:
cbi 59-0x20,1
.LM11:
ldi r24,lo8(1)
sts FLAG_Timer_1s,r24
.LM12:
sbi 59-0x20,1
/* epilogue: frame size=0 */
pop r24
pop __tmp_reg__
out __SREG__,__tmp_reg__
pop __tmp_reg__
pop __zero_reg__
reti
/* epilogue end (size=6) */
/* function __vector_26 size 17 (5) */
[/asm]
This is an awful lot of code.
First of all, why clear __zero_reg__? It is NOT used in the interrupt.
And why push, clr, pop at all? Isn't it supposed to hold zero anyway? I'm not
sure this push, clr, pop is needed in any interrupt, but I'm sure it is not
needed in this one.
Am I missing something?
Second, why save SREG in this case? Which instruction modifies it?
If I'm reading the 'AVR Instruction Set' pdf right, cbi, ldi, sts and sbi
don't modify it.
---- Oh, sure, the unwanted clr in the prologue modifies it :( sorry
I think this would do just fine:
__vector_26:
push r24
cbi 59-0x20,1
ldi r24,lo8(1)
sts FLAG_Timer_1s,r24
sbi 59-0x20,1
pop r24
reti
/* prologue size=1, epilogue size=2, function __vector_26 size 8 (5) */
Am I wrong?
I read a similar mail in the archive about this. My question:
Can I somehow modify the prologue without writing everything in asm?
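To put numbers on the difference between the two listings above, here is a small host-side sketch in C that adds up flash words and CPU cycles for each handler. The per-instruction costs are the classic megaAVR values (16-bit program counter, so reti takes 4 cycles); the table and the helper names (insn_cost, handler_words, handler_cycles) are made up for this illustration, and the interrupt response time and the jump in the vector table are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cost of one instruction: flash words and CPU cycles (classic megaAVR). */
struct insn_cost {
    const char *name;
    unsigned words;
    unsigned cycles;
};

static const struct insn_cost COSTS[] = {
    {"push", 1, 2}, {"pop", 1, 2}, {"in", 1, 1}, {"out", 1, 1},
    {"clr", 1, 1},  {"ldi", 1, 1}, {"sts", 2, 2},
    {"cbi", 1, 2},  {"sbi", 1, 2}, {"reti", 1, 4},
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Find the cost entry for a mnemonic; aborts on an unknown one. */
static const struct insn_cost *lookup(const char *name)
{
    for (size_t i = 0; i < COUNT(COSTS); i++)
        if (strcmp(COSTS[i].name, name) == 0)
            return &COSTS[i];
    assert(!"instruction missing from COSTS");
    return NULL;
}

/* Total flash words of a handler given as a list of mnemonics. */
unsigned handler_words(const char *const *insns, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += lookup(insns[i])->words;
    return total;
}

/* Total CPU cycles of a handler given as a list of mnemonics. */
unsigned handler_cycles(const char *const *insns, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += lookup(insns[i])->cycles;
    return total;
}

/* The handler as avr-gcc generated it. */
const char *const GENERATED[] = {
    "push", "push", "in", "push", "clr", "push",  /* prologue */
    "cbi", "ldi", "sts", "sbi",                   /* body     */
    "pop", "pop", "out", "pop", "pop", "reti",    /* epilogue */
};
const size_t GENERATED_N = COUNT(GENERATED);

/* The hand-trimmed handler proposed in this mail. */
const char *const PROPOSED[] = {
    "push", "cbi", "ldi", "sts", "sbi", "pop", "reti",
};
const size_t PROPOSED_N = COUNT(PROPOSED);
```

With these tables the generated handler comes to 17 words and 30 cycles, and the proposed one to 8 words and 15 cycles, so the extra prologue and epilogue roughly double the time spent in this interrupt.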
My other problem is with other signals. The gcc-generated signal handler
pushes all the registers it uses at the beginning.
I have a signal that occasionally does a lot of calculation, so it needs a lot
of registers, but usually it does just a simple check and uses only 1 or 2
registers. The signal is called very often in the checking state, so it
needs to be fast. (When it does the calculation it is called less often.)
The check has to be as fast as possible so that the main program and the
other signals get cpu time.
So, how can I write the signal handler so that the simple check
doesn't have to wait for the pushes (and pops) of registers that only the
calculation part uses?
Here is a part of the signal:
[c source]
SIGNAL(SIG_OUTPUT_COMPARE2)
{
DPA5(0); ///<debug
char Rx;
// Rx = COAX_Rx(); //inline function/macro:
register char output=COAXDATAOUT_IN;
Rx=COAXDATAIN_IN;
if (output & COAXDATAOUT_TxD_PIN) Rx >>=1;
if (output & COAXDATAOUT_DTR_PIN) Rx >>=1;
Rx &= COAXDATAIN_TxD_PIN;
DPA4(Rx); ///<debug
if ( (SWU_Rx_state == SWU_state_start) && (Rx) ) return;
// ... calculation
}
[/c source]
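To make the fast path concrete, here is the check from the handler above restated as two plain C functions that can be compiled and tried on a PC. This is only a sketch: the mask and state values are inferred from the disassembly in this mail (sbrc r24,4 / sbrc r24,5 / andi r18,0x01 / cpi r24,0x01), the real definitions are in headers that are not shown, and the function names coax_rx_sample and swu_fast_path are made up for the illustration. It shows how little state the check needs; by itself it does not change which registers the compiler saves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, inferred from the disassembly; the real headers are not shown. */
#define COAXDATAOUT_TxD_PIN 0x10u  /* sbrc r24,4      */
#define COAXDATAOUT_DTR_PIN 0x20u  /* sbrc r24,5      */
#define COAXDATAIN_TxD_PIN  0x01u  /* andi r18,0x01   */
#define SWU_state_start     1u     /* cpi  r24,0x01   */

/* Hypothetical helper: same arithmetic as the inlined COAX_Rx() lines.
   'output' and 'input' stand in for the two port reads. */
uint8_t coax_rx_sample(uint8_t output, uint8_t input)
{
    uint8_t rx = input;
    if (output & COAXDATAOUT_TxD_PIN) rx >>= 1;
    if (output & COAXDATAOUT_DTR_PIN) rx >>= 1;
    return rx & COAXDATAIN_TxD_PIN;
}

/* Hypothetical helper: true when the handler returns before the calculation. */
bool swu_fast_path(uint8_t state, uint8_t rx)
{
    return state == SWU_state_start && rx != 0;
}
```

The whole check reads two ports and one state byte and keeps a single working value, which matches the r18/r24 usage in the compiled code.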
It compiles to this:
[lss]
SIGNAL(SIG_OUTPUT_COMPARE2)
{
128: 1f 92 push r1
12a: 0f 92 push r0
12c: 0f b6 in r0, 0x3f ; 63
12e: 0f 92 push r0
130: 11 24 eor r1, r1
132: 2f 93 push r18
134: 3f 93 push r19
136: 4f 93 push r20
138: 8f 93 push r24
13a: 9f 93 push r25
13c: ef 93 push r30
13e: ff 93 push r31
DPA5(0); ///<debug
140: dd 98 cbi 0x1b, 5 ; 27
register char output=COAXDATAOUT_IN;
142: 83 b3 in r24, 0x13 ; 19
char Rx=COAXDATAIN_IN;
144: 23 b3 in r18, 0x13 ; 19
if (output & COAXDATAOUT_TxD_PIN) Rx >>=1;
146: 99 27 eor r25, r25 <------- WHY GOD, WHY ?????
148: 84 fd sbrc r24, 4
14a: 26 95 lsr r18
if (output & COAXDATAOUT_DTR_PIN) Rx >>=1;
14c: 85 fd sbrc r24, 5
14e: 26 95 lsr r18
Rx &= COAXDATAIN_TxD_PIN;
150: 21 70 andi r18, 0x01 ; 1
DPA4(Rx); ///<debug
152: 11 f0 breq .+4 ; 0x158
154: dc 9a sbi 0x1b, 4 ; 27
156: 01 c0 rjmp .+2 ; 0x15a
158: dc 98 cbi 0x1b, 4 ; 27
if ( (SWU_Rx_state == SWU_state_start) && (Rx) ) return;
15a: 80 91 62 09 lds r24, 0x0962
15e: 81 30 cpi r24, 0x01 ; 1
160: 19 f4 brne .+6 ; 0x168 (calc)
162: 22 23 and r18, r18
164: 09 f0 breq .+2 ; 0x168 (calc)
166: a3 c0 rjmp .+326 ; 0x2ae (return)
// ... calculation:
168: ...
...
2ae: ff 91 pop r31
2b0: ef 91 pop r30
2b2: 9f 91 pop r25
2b4: 8f 91 pop r24
2b6: 4f 91 pop r20
2b8: 3f 91 pop r19
2ba: 2f 91 pop r18
2bc: 0f 90 pop r0
2be: 0f be out 0x3f, r0 ; 63
2c0: 0f 90 pop r0
2c2: 1f 90 pop r1
2c4: 18 95 reti
[/lss]
So there are a lot of pushes and pops, although only r18 and r24 (and r25) are
used most of the time. This leaves less cpu time for the other tasks.
Another strange thing is the "eor r25,r25". (As far as I can see its result
is never used here, so it is as good as a nop.)
Can someone explain it to me?
Any suggestions or workarounds?
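One possible explanation for the "eor r25,r25", offered as an assumption and not something the listing proves: C promotes "output" to int before the "&", and int is 16 bits on AVR, so the compiler may be zero-extending r24 into r25:r24 and then not dropping the unused high byte. The promotion itself is standard C and can be checked on any machine with this small sketch (the function names are made up for the illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the expression tested in the if().
   The operands are promoted to int before the &, so this is sizeof(int), not 1. */
size_t masked_test_size(uint8_t output)
{
    return sizeof(output & 0x10);
}

/* The same expression evaluated: the result is an int, not a char. */
int masked_test_value(uint8_t output)
{
    return output & 0x10;
}
```

On AVR sizeof(int) is 2, so the promoted value nominally occupies a register pair even though only one bit of the low byte is ever tested.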
Thx,
Istvan Szikra
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------------------------------------------------------------------------
[lst for SIGNAL(SIG_OUTPUT_COMPARE3A)]
136 .Lscope0:
138 .global __vector_26
140 __vector_26:
134:rs422coax.c **** SIGNAL(SIG_OUTPUT_COMPARE3A)
135:rs422coax.c **** {
142 .LM9:
143 /* prologue: frame size=0 */
144 0034 1F92 push __zero_reg__
145 0036 0F92 push __tmp_reg__
146 0038 0FB6 in __tmp_reg__,__SREG__
147 003a 0F92 push __tmp_reg__
148 003c 1124 clr __zero_reg__
149 003e 8F93 push r24
150 /* prologue end (size=6) */
136:rs422coax.c **** DPA1(0); ///<debug
152 .LM10:
153 0040 D998 cbi 59-0x20,1
137:rs422coax.c **** FLAG_Timer_1s = 1;
155 .LM11:
156 0042 81E0 ldi r24,lo8(1)
157 0044 8093 0000 sts FLAG_Timer_1s,r24
138:rs422coax.c **** DPA1(1); ///<debug
159 .LM12:
160 0048 D99A sbi 59-0x20,1
161 /* epilogue: frame size=0 */
162 004a 8F91 pop r24
163 004c 0F90 pop __tmp_reg__
164 004e 0FBE out __SREG__,__tmp_reg__
165 0050 0F90 pop __tmp_reg__
166 0052 1F90 pop __zero_reg__
167 0054 1895 reti
168 /* epilogue end (size=6) */
169 /* function __vector_26 size 17 (5) */
[/lst]
[lst for SIGNAL(SIG_OUTPUT_COMPARE2)]
173 __vector_9:
165:rs422coax.c **** SIGNAL(SIG_OUTPUT_COMPARE2)
166:rs422coax.c **** {
175 .LM11:
176 /* prologue: frame size=0 */
177 005e 1F92 push __zero_reg__
178 0060 0F92 push __tmp_reg__
179 0062 0FB6 in __tmp_reg__,__SREG__
180 0064 0F92 push __tmp_reg__
181 0066 1124 clr __zero_reg__
182 0068 2F93 push r18
183 006a 3F93 push r19
184 006c 4F93 push r20
185 006e 8F93 push r24
186 0070 9F93 push r25
187 0072 EF93 push r30
188 0074 FF93 push r31
189 /* prologue end (size=12) */
167:rs422coax.c **** DPA5(0); ///<debug
191 .LM12:
192 0076 DD98 cbi 59-0x20,5
168:rs422coax.c **** char Rx;
169:rs422coax.c **** // Rx = COAX_Rx(); // inline macro:
170:rs422coax.c **** register char output=COAXDATAOUT_IN;
194 .LM13:
195 0078 83B3 in r24,51-0x20
171:rs422coax.c **** Rx=COAXDATAIN_IN;
197 .LM14:
198 007a 23B3 in r18,51-0x20
172:rs422coax.c **** if (output & COAXDATAOUT_TxD_PIN) Rx >>=1;
200 .LM15:
201 007c 9927 clr r25
202 007e 84FD sbrc r24,4
204 .LM16:
205 0080 2695 lsr r18
206 .L6:
173:rs422coax.c **** if (output & COAXDATAOUT_DTR_PIN) Rx >>=1;
208 .LM17:
209 0082 85FD sbrc r24,5
210 0084 2695 lsr r18
211 .L7:
174:rs422coax.c **** Rx &= COAXDATAIN_TxD_PIN;
213 .LM18:
214 0086 2170 andi r18,lo8(1)
175:rs422coax.c ****
176:rs422coax.c **** DPA4(Rx); ///<debug
216 .LM19:
217 0088 11F0 breq .L8
218 008a DC9A sbi 59-0x20,4
219 008c 01C0 rjmp .L9
220 .L8:
221 008e DC98 cbi 59-0x20,4
222 .L9:
177:rs422coax.c **** if ( (SWU_Rx_state == SWU_state_start) && (Rx) ) return;
224 .LM20:
225 0090 8091 0000 lds r24,SWU_Rx_state
226 0094 8130 cpi r24,lo8(1)
227 0096 19F4 brne .L10
228 0098 2223 tst r18
229 009a 09F0 breq .+2
230 009c A3C0 rjmp .L5
231 .L10:
178:rs422coax.c ****
179:rs422coax.c **** // ... calculation:
[/lst]
[compiler info]
WinAVR-20050214
(AVR GCC 3.4.3)
Compiling: rs422coax.c
avr-gcc -c -mmcu=atmega128 -I. -gstabs -DF_CPU=16000000UL -Os
-funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall
-Wstrict-prototypes -Wa,-adhlns=rs422coax.lst -std=gnu99 -MD -MP -MF
.dep/rs422coax.o.d rs422coax.c -o rs422coax.o
avr-gcc -S -mmcu=atmega128 -I. -gdwarf-2 -DF_CPU=16000000UL -Os
-funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -Wall
-Wstrict-prototypes -Wa,-adhlns=rs422coax.lst -std=gnu99 -MD -MP -MF
.dep/rs422coax.s.d rs422coax.c -o rs422coax.s
[/compiler info]