Re: [avr-gcc-list] Speeding [u]ltoa by 35%

From: Georg-Johann Lay
Subject: Re: [avr-gcc-list] Speeding [u]ltoa by 35%
Date: Sun, 15 Jul 2012 13:47:35 +0200
User-agent: Thunderbird (Windows/20100228)

Weddington, Eric wrote:

Concerning sprintf: what's the avr-libc policy here?

In general, our users will complain if code size increases, even by a
trivial amount. It is only rarely (or never) that I hear a complaint
that the code is not fast enough. And even then, they typically talk
about speed of executing an ISR, not mainline code.

Because of that, IMHO, size of the code is much more critical than

The same optimization can be done with itoa and utoa.

For itoa, the size increases from 66 to 68 bytes.
Notice that the new version does not need __udivmodhi4,
so the new version might be considerably smaller in the end.

If no MOVW is available, the new version has the same size
as the old itoa body.

#define ZERO r1

.macro DEFUN name
.section .text.avrlibc.\name, "ax", @progbits
.global \name
.func \name
\name:
.endm

.macro ENDF name
.size \name, . - \name
.endfunc
.endm

#define VAL     24
#define STR     22
#define RADIX   20
#define DIGIT   18
#define BITS    19
#define SIGN    21

DEFUN __itoa_asm
    movw    r30,    STR

    clr     SIGN
    ;; Output a sign iff  VAL < 0  and  RADIX == 10
    ;; Notice that the C part already filtered out invalid radices.
    cpi     RADIX,  10
    brne 1f
    tst     VAL+1
    brpl 1f
    ;; If VAL < 0 and RADIX == 10 then VAL = |VAL|
    neg     VAL+1
    neg     VAL
    sbc     VAL+1,  ZERO
    ;; and store SIGN for later use.
    ldi     SIGN,   '-'

1:  ;; Pop one digit from VAL.
    ;; This is vanilla unsigned 16:8 division with the additional
    ;; knowledge that the high bit of RADIX is zero:
    ;; VAL   <- VAL / RADIX
    ;; DIGIT <- VAL % RADIX
    clr     DIGIT
    ldi     BITS,   16

2:  lsl     VAL
    rol     VAL+1
    rol     DIGIT
    cp      DIGIT,  RADIX
    brlo 3f
    inc     VAL
    sub     DIGIT,  RADIX
3:  dec     BITS
    brne 2b

    ;; Map DIGIT to its RADIX-adic representation
    subi    DIGIT,  -'0'
    cpi     DIGIT,  '9'+1
    brlo 4f
    subi    DIGIT,  '0'-'a'+10

4:  ;; And store it to the reversed string
    st      Z+,     DIGIT

    ;; Iterate until all digits are sucked out of VAL.
    sbiw    VAL,    0
    brne 1b

    ;; Done with the digits: Output SIGN stored above.
    ;; (BITS is zero at this point, so SIGN is the register to store.)
    cpse    SIGN,   ZERO
    st      Z+,     SIGN

    ;; Finish string with '\0'
    st      Z,      ZERO

    ;; And reverse it.
    movw    24,     STR
    jmp     strrev

ENDF __itoa_asm

#undef VAL
#undef STR
#undef RADIX
#undef DIGIT
#undef BITS
#undef SIGN
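
For those who want to check the digit loop on a host, here is a minimal
C sketch of what the assembler above does. The name itoa_model is
hypothetical and not part of avr-libc; the in-place reversal at the end
only stands in for strrev. It assumes the radix was already validated
(2..36), just like the assembler part does.

```c
#include <stdint.h>

/* Hypothetical host-side model of __itoa_asm; not part of avr-libc.
   Assumes 2 <= radix <= 36, as guaranteed by the C wrapper. */
static char *itoa_model (int16_t x, char *str, uint8_t radix)
{
    uint16_t val = (uint16_t) x;
    char sign = 0;
    char *p = str;

    /* Output a sign iff x < 0 and radix == 10. */
    if (radix == 10 && x < 0)
    {
        val = (uint16_t) (0u - val);
        sign = '-';
    }

    do
    {
        /* Unsigned 16:8 shift-and-subtract division:
           val <- val / radix, digit <- val % radix.
           digit stays below radix <= 36, so the shift cannot overflow. */
        uint8_t digit = 0;
        for (uint8_t bits = 16; bits != 0; bits--)
        {
            uint8_t carry = (uint8_t) (val >> 15);
            val = (uint16_t) (val << 1);
            digit = (uint8_t) ((digit << 1) | carry);
            if (digit >= radix)
            {
                val++;  /* Set the quotient bit that was just shifted in. */
                digit = (uint8_t) (digit - radix);
            }
        }

        /* Map digit to '0'..'9', 'a'..'z' and store it reversed. */
        *p++ = (char) (digit < 10 ? '0' + digit : 'a' + digit - 10);
    } while (val != 0);

    if (sign)
        *p++ = sign;
    *p = '\0';

    /* Reverse in place; the assembler tail-calls strrev instead. */
    for (char *a = str, *b = p - 1; a < b; a++, b--)
    {
        char t = *a;
        *a = *b;
        *b = t;
    }
    return str;
}
```

The buffer must hold at least 18 bytes to cover the worst case: 16
binary digits, an optional sign and the terminating '\0'.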

The C part is similar to the long version:

static __inline__ __attribute__((__always_inline__))
char* itoa (int x, char *str, int radix)
{
    /* Reject invalid radices here so the assembler part need not check. */
    if (radix < 2 || radix > 36) {
        *str = '\0';
        return str;
    } else {
        extern char* __itoa_asm (int, char*, unsigned char);
        return __itoa_asm (x, str, (unsigned char) radix);
    }
}

And similar for utoa.

The C part could even be more explicit and express that
R26/R27 are not clobbered by the assembler part.

Having said that, though, I'm not against having additional,
alternative functions that optimize speed, as long as the default is
for size.

How should these additional routines be supplied?
By making -Os a multilib option?

