[avr-gcc-list] Re: sprintf


From: David Brown
Subject: [avr-gcc-list] Re: sprintf
Date: Fri, 06 Mar 2009 04:04:40 +0100
User-agent: Thunderbird 2.0.0.19 (Windows/20081209)

Joerg Wunsch wrote:
> David Brown <address@hidden> wrote:
>
>> However, the compiler has a much better chance of doing bounds checking, alias checking, and other optimisations on the array expression.
>
> Why?  Either the compiler knows at compile-time that "foo" is actually
> an array it has detailed information about, then it can perform
> bounds-checking etc. on it, regardless which of both expressions
> you're writing (assuming it performs any kind of bounds checking at
> all -- I'm not sure GCC does).  However, if "foo" is a pointer of
> unknown source, no checks can be applied regardless of whether you're
> writing &foo[i] or foo + i.  (Some optimizations can be applied if foo
> is a pointer passed as a function argument, and qualified "restrict".)
>
>> Additionally, &foo[i] and (foo + i) read as different things (one is
>> the address of an element, the other is the address of an array plus
>> an offset).
>
> However, given the way an offset is computed in C, the result is
> again: the address of an element (possibly an element of a dynamic
> array).


Rather than waffling more about what I thought or expected to happen, I tried out a couple of cases. I've pasted my test code at the bottom of this post. I compiled this with avr-gcc 4.2.2 and 4.3.2 (an earlier WinAVR, and the latest stable WinAVR), using -Os -Wall flags.

With avr-gcc 4.3.2, test3() below gave a "warning: array subscript is above array bounds" warning. No warning is given when taking the address of an array element or when using pointer arithmetic. This makes sense as far as I understand the C standards: accessing an array beyond its bounds is illegal, while adding an integer to a pointer is accepted by the compiler without complaint. (Strictly speaking, the standard only defines the result of such an addition while it stays within the array or one past its end, so test2b() and test6() are no more valid than test3(); they are simply not diagnosed.) avr-gcc 4.2.2 gave no such warning, so there has been a definite improvement here.
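To make the distinction above concrete, here is a small host-side sketch (not part of my AVR tests). The function names and the array contents are made up for illustration. It shows that &as[i] and as + i give the same address for every index up to one past the end, that the one-past-the-end pointer may be formed and used in pointer arithmetic, and what an explicit run-time bounds check could look like. It deliberately never forms or dereferences anything at index 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AS_LEN 4

/* Illustrative contents; the arrays in the test code are left zero-initialised. */
static uint8_t as[AS_LEN] = {10, 11, 12, 13};

/* Returns 1 if &as[i] and as + i agree for every index 0..AS_LEN.
   Index AS_LEN is the one-past-the-end position: its address may be
   taken, but it must not be dereferenced. */
int same_address_for_all_indices(void) {
    for (size_t i = 0; i <= AS_LEN; i++) {
        if (&as[i] != as + i) {
            return 0;
        }
    }
    return 1;
}

/* Number of elements between the start and the one-past-the-end pointer. */
ptrdiff_t elements_up_to_end(void) {
    uint8_t *end = as + AS_LEN; /* legal to form, not legal to dereference */
    return end - as;
}

/* Explicit run-time bounds check: the index is validated before the access. */
uint8_t read_checked(size_t i) {
    assert(i < AS_LEN);
    return as[i];
}
```

With these definitions, same_address_for_all_indices() returns 1, elements_up_to_end() returns 4, and read_checked(1) returns 11, whichever of the two spellings is used.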

Interesting things happen with the "foo" functions below, which are all functionally identical. With avr-gcc 4.3.2, foo1 (using array access) generates an unrolled loop:

        lds r24, as+1
        sts bs, r24
        sts bs+1, r24
        sts bs+2, r24
        sts bs+3, r24
        ret

That is fast code, but at 11 words is not size-optimal.

foo2, foo3 and foo4 (using various pointer expressions) all generate:

        lds r24, as+1
        ldi r30, lo8(bs)
        ldi r31, hi8(bs)
    .L22:
        st Z+, r24
        ldi r25, hi8(bs + 4)
        cpi r30, lo8(bs + 4)
        cpc r31, r25
        brne .L22
        ret

At 10 words this is smaller, but a lot slower (the "ldi r25" could be hoisted to before the loop).


With avr-gcc 4.2.2, foo1, foo2 and foo3 all generated the unrolled loop, while foo4 generated the full loop above.


I'm not sure which is technically "correct", given that -Os should optimise for size - the full loop has only 10 words but is much slower. But this certainly shows that the array accesses and pointer arithmetic are treated somewhat differently.


If the size of the loops in fooX() is changed to 3, both compilers generate unrolled loops in each case (then it is only 9 words long, and thus definitely better than the full loop). With the size changed to 5, both compilers generate unrolled code for foo1 (now 13 words long) but looped code for foo2, foo3 and foo4.


Neither compiler produced the smarter possibility:

        lds r24, as+1
        ldi r30, lo8(bs)
        ldi r31, hi8(bs)
        st Z+, r24
        st Z+, r24
        st Z+, r24
        st Z+, r24
        ret

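This is smaller (9 words) than either of the generated versions, and far faster than the loop. On classic AVR cores, where st and sts both take two cycles, it is two cycles slower than the sts-based unrolled code because of the two extra ldi instructions.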
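The word counts quoted in this post can be reproduced with a little arithmetic. The sketch below is only a host-side calculator with made-up function names. It assumes the instruction sizes implied by the listings (lds and sts are two words each; ldi, st, cpi, cpc, brne and ret are one word each), and it assumes the loop keeps the same shape for other element counts.

```c
#include <assert.h>
#include <stdio.h>

/* Flash size in 16-bit words for a fill of n bytes.
   Assumed sizes: lds/sts = 2 words; ldi/st/cpi/cpc/brne/ret = 1 word. */

/* lds + n * sts + ret */
unsigned words_unrolled_sts(unsigned n) {
    return 2u + 2u * n + 1u;
}

/* lds + 2 * ldi + loop body (st, ldi, cpi, cpc, brne) + ret.
   The size does not depend on n as long as the loop keeps this shape. */
unsigned words_loop(unsigned n) {
    (void)n;
    return 2u + 2u + 5u + 1u;
}

/* lds + 2 * ldi + n * st Z+ + ret */
unsigned words_unrolled_z(unsigned n) {
    return 2u + 2u + n + 1u;
}

/* Prints one row per element count from 3 to 5. */
void print_word_table(void) {
    printf("n  sts-unrolled  loop  Z-unrolled\n");
    for (unsigned n = 3; n <= 5; n++) {
        printf("%u  %12u  %4u  %10u\n", n, words_unrolled_sts(n),
               words_loop(n), words_unrolled_z(n));
    }
}
```

For n = 4 this gives 11, 10 and 9 words, matching the three listings. For n = 3 the sts version drops to 9 words, and for n = 5 it grows to 13, which matches the sizes I saw when changing the loop count.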


This confirms my belief that, at least in some cases and at least for avr-gcc 4.3.2, the compiler *can* do better checking with array syntax than with pointer arithmetic syntax. It can also sometimes generate better code. With weaker compilers it has been traditional to "hand-optimise" C code that would naturally be written with array access, as in foo1(), into pointer-heavy code with explicitly cached data, as in foo4(). It is good to see that avr-gcc does not need such code-massaging to generate good code, and that it can in fact be counter-productive.
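For anyone who wants to convince themselves that the natural and the hand-optimised styles really do the same thing, here is a host-side sketch. It is not the code I compiled for the AVR: the names src, dst, fill_indexed, fill_pointer and styles_agree are made up, and the source array is given non-zero contents so that the result is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 4

static uint8_t src[N] = {1, 2, 3, 4}; /* illustrative contents */
static uint8_t dst[N];

/* Natural style: plain array indexing, as in foo1(). */
void fill_indexed(void) {
    for (uint8_t i = 0; i < N; i++) {
        dst[i] = src[1];
    }
}

/* "Hand-optimised" style: cached value and a walking pointer, as in foo4(). */
void fill_pointer(void) {
    uint8_t *p = dst;
    uint8_t a = src[1];
    for (uint8_t i = 0; i < N; i++) {
        *p++ = a;
    }
}

/* Runs both versions from a cleared buffer; returns 1 if the results match. */
int styles_agree(void) {
    uint8_t first[N];

    memset(dst, 0, sizeof dst);
    fill_indexed();
    memcpy(first, dst, sizeof dst);

    memset(dst, 0, sizeof dst);
    fill_pointer();

    return memcmp(first, dst, sizeof dst) == 0;
}
```

styles_agree() returns 1 here, and dst ends up holding four copies of src[1]. Since the two styles are equivalent, the choice between them can be made on readability, leaving the code generation to the compiler.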


Best regards,

David




#include <stdint.h>

static uint8_t as[4];
static uint8_t bs[4];

uint8_t test1(void) {
        return as[1];
}

uint8_t test2(void) {
        return *(&(as[4]));
}

uint8_t test2b(void) {
        return *(&(as[5]));
}

uint8_t test3(void) {
        return as[5];
}

uint8_t test4(void) {
        return *(as + 1);
}

uint8_t test5(void) {
        return *(as + 4);
}

uint8_t test6(void) {
        return *(as + 5);
}

void foo1(void) {
        for (uint8_t i = 0; i < 4; i++) {
                bs[i] = as[1];
        }
}

void foo2(void) {
        for (uint8_t i = 0; i < 4; i++) {
                *(bs + i) = *(as + 1);
        }
}

void foo3(void) {
        uint8_t *pa = as;
        uint8_t *pb = bs;
        for (uint8_t i = 0; i < 4; i++) {
                *(pb + i) = *(pa + 1);
        }
}

void foo4(void) {
        uint8_t *pb = bs;
        uint8_t a = as[1];
        for (uint8_t i = 0; i < 4; i++) {
                *pb++ = a;
        }
}
