Re: [avr-gcc-list] Optimization Hiccup? (Please CC me, I'm not subscribed)
Tue, 21 Oct 2014 08:19:08 +0100
I would encourage you to join the list, so you don't miss out on the
I talked about this with Joern Rennecke. First thing is to note that it
is hard to diagnose problems like this without the full source code.
Register allocation is notoriously hairy. As we mostly care for speed in
GCC, a loop with a non-inlined function call is not high on the list of
priorities in general.
We have pieced some of it together from the stdint type names and the
assembly listing (we surmise FONT_WIDTH is 6). After trying to
understand what happens after common sub-expression elimination, loop
invariant motion etc. and eventually register allocation, Joern ended up
with two different solutions. Both will be rather time consuming, and
yet are only the first step in formulating a plan to optimize this code.
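
For anyone following along without the pastie links, the two variants
being compared presumably look something like the sketch below. This is
a guess based on the description in the quoted message, not the actual
source: the function names draw_str_copy and draw_str_inplace, the y
parameter, the 8-bit types and the recording stub for tft_draw_chr are
all assumptions, and FONT_WIDTH is 6 only because that is what we
surmised from the listing.

```c
#include <stdint.h>
#include <stddef.h>

#define FONT_WIDTH 6   /* surmised from the assembly listing, not confirmed */
#define MAX_CALLS  16  /* hypothetical: size of the recording buffer below */

/* Hypothetical stand-in for the real tft_draw_chr, which is an external,
 * non-inlined function on the AVR. Here it only records the x it was given. */
static uint8_t seen_x[MAX_CALLS];
static size_t  n_calls;

static void tft_draw_chr(uint8_t x, uint8_t y, char c)
{
    (void)y;
    (void)c;
    if (n_calls < MAX_CALLS)
        seen_x[n_calls++] = x;
}

/* Variant 1 ("top block"): copy the argument to a local cx and advance cx. */
void draw_str_copy(uint8_t x, uint8_t y, const char *s)
{
    uint8_t cx = x;
    while (*s) {
        tft_draw_chr(cx, y, *s++);
        cx += FONT_WIDTH;
    }
}

/* Variant 2 ("bottom block"): advance the argument x directly. */
void draw_str_inplace(uint8_t x, uint8_t y, const char *s)
{
    while (*s) {
        tft_draw_chr(x, y, *s++);
        x += FONT_WIDTH;
    }
}
```

The two functions are semantically identical, which is why one would
expect the same code for both. The difference described below is purely
in how the register allocator treats x versus cx.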
So that we can progress this, could you supply us with the full source code?
You can run gcc with --save-temps, which will save all the intermediate
files. The ones ending with .i are the pre-processed source, which is
what we need.
Rather than posting to this list, you should file it as a bug at the FSF
bug tracker (https://gcc.gnu.org/bugzilla/), including the source as the
attachment. By all means post here to ask about progress with the bug.
On 18/10/14 18:08, Thomas Watson wrote:
> GCC is generating substantially less optimized code than it does if I
> help it along a bit. Code is at
> http://pastie.org/private/awus9tkgdwbzpdwjgrbw and the assembler
> output is at http://pastie.org/private/s4liesmrd9f6fi2wahe0vg . Top
> block is with cx and bottom block is modifying the argument instead
> of copying it to a local. Full compiler invocation is: avr-gcc -c -I.
> -mmcu=atmega328p -std=gnu99 -Os -Wall -DF_CPU=16000000
> -ffunction-sections -fdata-sections -Wl,--gc-sections -o tft.o tft.c
> . I thought that copying x to a local might be wasting a bit of
> space. However, if we modify x directly rather than copying it to a
> local before modification, the compiler decides to store x on the
> stack instead of in a register which takes us on a journey involving
> unnecessary stack access, silly re-copying, and far too much code.
> I figured that if I didn't copy it to a local, I would save code
> space (like I do in many other situations) but something is going
> wrong here. If I have cx, a callee-saved register is reserved for it
> (line 33) and x is copied to cx (56). When we call tft_draw_chr, it
> expects the 'x' parameter in r24, so we copy it there (68) before we
> call. Since r24 might be clobbered by tft_draw_chr, we can't keep x
> there across the call, which is why r17 is needed. Anyway,
> once we return, we add FONT_WIDTH to r17 (73) in preparation for the
> next iteration of the loop. In addition, since we can use Y for the
> string pointer, we do not need to worry about it being eaten by
> tft_draw_chr and it is only pushed and popped in the prologue and
> epilogue. All well and dandy, right? In theory, since x is never
> touched before or after cx is assigned, it is essentially an alias. I
> would therefore expect exactly the same code (or perhaps more
> optimized if the architecture and calling convention supported it) to
> be generated.
> However, such is not the case. x is passed into the function in r24,
> but we want to modify it and have it persist through the loop.
> Because r24 could be mangled by calling a function, we obviously must
> move it to elsewhere. Unfortunately, the compiler decides on r25
> (170), a register subject to the same limitation. As before, we must
> move our temporary register to r24 (186) in order to call
> tft_draw_chr, according to calling convention. However, since r25
> could be mangled by the call, we have to save it (187) before the
> function call. The compiler chooses the stack, as opposed to a
> callee-saved register, which has rather broad implications. First, we
> must reserve stack space (159) and copy the stack pointer to Y (162),
> chosen presumably because Y is also callee-saved. But since we used Y
> as the string pointer before, we must store the string pointer
> elsewhere. R8/9 are chosen. As callee-saved registers, we must
> perform an additional two pushes and pops to save them at the
> beginning and end of the function. We also have to move the string
> pointer there (174).
> Okay. So we've returned from tft_draw_chr (192). We must pull x off
> the stack into r25 and add FONT_WIDTH to it (193) in preparation for
> the next iteration. We could have just as easily not used r25 and
> continued to use r24, using the stack to save it as before (but there
> is a better way). We know r24 won't be touched until we call
> tft_draw_chr. Now that that's over, we have to fetch the next
> character in the string, but because Y isn't our string pointer, this
> doesn't go smoothly. We can't load data if the address isn't in X, Y,
> or Z. Since r8/9 is none of those, we have to copy it to Z (198) to
> retrieve the next character and do a post-increment on Z to index the
> next character (199). Since Z isn't callee-saved, it might be mangled
> by a function call, so we must store it back to r8/9 (200). Finally,
> we can test for the next iteration.
> I'm not sure why the second code doesn't end up the same as the
> first. Choosing to use another caller-saved register as our temporary
> register is an extremely poor choice. If for some reason that was
> mandatory, we could (at least in this code) still use r24 to avoid
> having to copy between it and r25. However, instead of using a
> register like r9 to store our temporary register, we use one that
> isn't callee-saved, which means we still end up using r9 (and r8
> too!) in our quest to needlessly use the stack.
> This is probably way too verbose but there must be some useful
> information in there somewhere. Please take a look. Also, CC me on
> any replies because I'm not subscribed to the list yet.
> Thank you all, Thomas