Re: [avr-gcc-list] Speeding [u]ltoa by 35%
Georg-Johann Lay
Sun, 15 Jul 2012 13:47:35 +0200
Weddington, Eric wrote:
Concerning sprintf: what's the avr-libc policy here?
In general, our users will complain if code size increases, even by a
trivial amount. It is only rarely (or never) that I hear a complaint
that the code is not fast enough. And even then, they typically talk
about speed of executing an ISR, not mainline code.
Because of that, IMHO, size of the code is much more critical than
speed.
The same optimization can be done with itoa and utoa.
For itoa, the size increases from 66 to 68 bytes.
Notice that the new version does not need __udivmodhi4,
so the new version might be considerably smaller in the end.
If no MOVW is available, the new version has the same size
as the old itoa body.
#define ZERO r1
.macro DEFUN name
    .section .text.avrlibc.\name, "ax", @progbits
    .global \name
    .func \name
\name:
.endm
.macro ENDF name
    .size \name, . - \name
    .endfunc
.endm
#define VAL 24
#define STR 22
#define RADIX 20
#define DIGIT 18
#define BITS 19
#define SIGN 21
DEFUN __itoa_asm
    movw    r30, STR
    clr     SIGN
    ;; Output a sign iff VAL < 0 and RADIX == 10
    ;; Notice that the C part already filtered out invalid radices.
    cpi     RADIX, 10
    brne    1f
    tst     VAL+1
    brpl    1f
    ;; If VAL < 0 and RADIX == 10 then VAL = |VAL|
    neg     VAL+1
    neg     VAL
    sbc     VAL+1, ZERO
    ;; and store SIGN for later use.
    ldi     SIGN, '-'
1:  ;; Pop one digit from VAL.
    ;; This is vanilla unsigned 16:8 division with the additional
    ;; knowledge that the high bit of RADIX is zero:
    ;; VAL   <- VAL / RADIX
    ;; DIGIT <- VAL % RADIX
    clr     DIGIT
    ldi     BITS, 16
2:  lsl     VAL
    rol     VAL+1
    rol     DIGIT
    cp      DIGIT, RADIX
    brlo    3f
    inc     VAL
    sub     DIGIT, RADIX
3:  dec     BITS
    brne    2b
    ;; Map DIGIT to its RADIX-adic representation
    subi    DIGIT, -'0'
    cpi     DIGIT, '9'+1
    brlo    4f
    subi    DIGIT, '0'-'a'+10
4:  ;; And store it to the reversed string
    st      Z+, DIGIT
    ;; Iterate until all digits are sucked out of VAL.
    sbiw    VAL, 0
    brne    1b
    ;; Done with the digits: Output SIGN stored above.
    cpse    SIGN, ZERO
    st      Z+, SIGN
    ;; Finish string with '\0'
    st      Z, ZERO
    ;; And reverse it.
    movw    24, STR
    jmp     strrev
ENDF __itoa_asm
#undef VAL
#undef STR
#undef RADIX
#undef DIGIT
#undef BITS
#undef SIGN
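For readers who prefer C, the loop above can be modelled roughly as follows. This is only a host-side sketch for checking the algorithm, not avr-libc code: the name itoa_model is made up for illustration, and it assumes the radix has already been checked to be in 2..36, as the C part does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the assembler routine; not part of avr-libc. */
static char *itoa_model(int16_t x, char *str, uint8_t radix)
{
    uint16_t val = (uint16_t)x;
    char sign = 0;
    char *p = str;

    assert(radix >= 2 && radix <= 36);  /* the C wrapper filters the rest */

    /* Output a sign only if x < 0 and radix == 10. */
    if (radix == 10 && x < 0) {
        val = (uint16_t)(0u - val);     /* val = |x|, also fine for -32768 */
        sign = '-';
    }

    do {
        /* Unsigned 16:8 shift-and-subtract division. After 16 steps,
           val holds the quotient and digit holds the remainder. */
        uint8_t digit = 0;
        for (uint8_t bits = 16; bits != 0; bits--) {
            digit = (uint8_t)((digit << 1) | (val >> 15));
            val = (uint16_t)(val << 1);
            if (digit >= radix) {
                digit = (uint8_t)(digit - radix);
                val |= 1;               /* corresponds to "inc VAL" */
            }
        }
        /* Map the digit to '0'..'9' or 'a'..'z'. */
        *p++ = (char)(digit < 10 ? '0' + digit : 'a' + digit - 10);
    } while (val != 0);

    if (sign)
        *p++ = sign;
    *p = '\0';

    /* Reverse in place; the assembler part tail-calls strrev for this. */
    for (char *a = str, *b = p - 1; a < b; a++, b--) {
        char t = *a;
        *a = *b;
        *b = t;
    }
    return str;
}
```

With this model, itoa_model(-1234, buf, 10) gives "-1234" and itoa_model(-1, buf, 16) gives "ffff", because the sign is only emitted for radix 10.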
The C part is similar to the long version:
static __inline__ __attribute__((__always_inline__))
char* itoa (int x, char *str, int radix)
{
    if (radix < 2 || radix > 36)
    {
        *str = '\0';
        return str;
    }
    else
    {
        extern char* __itoa_asm (int, char*, unsigned char);
        return __itoa_asm (x, str, (unsigned char) radix);
    }
}
And similar for utoa.
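A rough idea of what the utoa counterpart could look like, written so that it can be tried on a host. Both names are assumptions for illustration: utoa_sketch stands for the inline wrapper, and utoa_worker_model is a plain C stand-in for an assembler worker (which might be called something like __utoa_asm). The stand-in uses ordinary / and % instead of the shift-and-subtract loop.

```c
#include <assert.h>
#include <string.h>

/* Plain C stand-in for a hypothetical assembler worker; no sign handling
   is needed in the unsigned case. */
static char *utoa_worker_model(unsigned int x, char *str, unsigned char radix)
{
    char *p = str;

    assert(radix >= 2 && radix <= 36);  /* guaranteed by the wrapper below */

    /* Pop digits, least significant first. */
    do {
        unsigned char digit = (unsigned char)(x % radix);
        x /= radix;
        *p++ = (char)(digit < 10 ? '0' + digit : 'a' + digit - 10);
    } while (x != 0);
    *p = '\0';

    /* Reverse the string in place. */
    for (char *a = str, *b = p - 1; a < b; a++, b--) {
        char t = *a;
        *a = *b;
        *b = t;
    }
    return str;
}

/* Same radix filter as in the itoa wrapper above. */
static inline char *utoa_sketch(unsigned int x, char *str, int radix)
{
    if (radix < 2 || radix > 36) {
        *str = '\0';
        return str;
    }
    return utoa_worker_model(x, str, (unsigned char)radix);
}
```

As with itoa, an invalid radix yields an empty string, so the worker never sees a radix outside 2..36.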
The C part could even be more explicit and express that
R26/R27 are not clobbered by the assembler part.
Having said that, though, I'm not against having additional,
alternative functions that optimize speed, as long as the default is
for size.
How should these additional routines be supplied?
By making -Os a multilib option?
Johann