avr-gcc-list

Re: [avr-gcc-list] Question about code size


From: David Brown
Subject: Re: [avr-gcc-list] Question about code size
Date: Mon, 19 Mar 2007 08:38:07 +0100
User-agent: Thunderbird 1.5.0.10 (Windows/20070221)

Joerg Wunsch wrote:
"Dave Hylands" <address@hidden> wrote:

Yeah - In this particular case, I was just compiling a snippet of C
code. Normally I would disassemble the fully located ELF file.

Just compile it into assembly then.

(loop counting direction)

To me, I don't really care which way it counts (internally).

Not only to you: it doesn't change what the code does, so it's a
valid optimization.

I was shocked to see a working bootloader grow by 60% when all I did
was go to a newer compiler. A big portion of the increase was due to
the way it was inlining stuff.

There are some known bugs with the inlining (wrong cost calculations),
but when reporting one of these, it also became apparent to me that
GCC has simply no chance to realize that:

void foo(void)
{
  asm ("some" "\n"
       "fairly" "\n"
       "complex" "\n"
       "stuff" "\n"
       "goes" "\n"
       "here");
}

would *not* get smaller code by inlining the calls to foo(): to GCC,
the entire inline asm statement just looks like a single instruction.
Since inlining removes the overhead of CALL and RET, GCC will always
calculate that it gains code size by inlining foo(), regardless of
how many invocations of foo() you've got.


There are lots of other potential benefits from inlining functions, beyond just cutting a CALL and RET (which are not insubstantial time overheads). Inlined functions can often benefit from knowledge of their parameters (constant propagation, figuring out branches at compile time, etc.), and combining register usage with the caller function can give significantly smaller code. I often specifically declare small functions as "static inline", even if they are called more than once, because it can lead to smaller code.
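
As a rough sketch of the constant-propagation point, consider the following. The helper scale() and its two callers are hypothetical names made up for illustration, and whether the branches really disappear depends on the compiler version and optimisation flags.

```c
#include <stdint.h>

/* Hypothetical helper: what it does depends on a mode argument. */
static inline uint8_t scale(uint8_t value, uint8_t mode)
{
    if (mode == 0)
        return value;                  /* pass through unchanged */
    else if (mode == 1)
        return (uint8_t)(value << 1);  /* double, truncated to 8 bits */
    else
        return (uint8_t)(value >> 1);  /* halve */
}

/* Each caller passes a constant mode.  Once scale() is inlined, the
 * compiler knows which branch is taken and can drop the other two,
 * so each caller may shrink to a single shift. */
uint8_t scale_up(uint8_t v)
{
    return scale(v, 1);
}

uint8_t scale_down(uint8_t v)
{
    return scale(v, 2);
}
```

Comparing the assembly output with and without "inline" on scale() is the easiest way to see whether it pays off in a particular case.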

So (otherwise short) functions with large inline assembly statements
are really good candidates for being declared with
__attribute__((noinline)).


Yes, gcc is pretty good at optimising despite hand-written assembly, but it's not perfect.
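
A minimal sketch of the noinline suggestion, assuming a hypothetical routine delay_block() whose NOPs stand in for a longer hand-written sequence:

```c
#include <stdint.h>

/* Hypothetical routine: the "nop" lines are placeholders for a long
 * block of hand-written assembly.  To the compiler the whole asm
 * statement counts as one instruction, so noinline tells it not to
 * copy the block into every caller. */
static __attribute__((noinline)) void delay_block(void)
{
    __asm__ volatile ("nop" "\n\t"
                      "nop" "\n\t"
                      "nop" "\n\t"
                      "nop");
}

/* Calls the routine n times and reports how many calls were made. */
uint8_t delay_many(uint8_t n)
{
    uint8_t calls = 0;

    while (n--) {
        delay_block();
        calls++;
    }
    return calls;
}
```

With the attribute in place there should be one copy of the asm block and a CALL at each use, however many callers there are.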

The reason why you see that change with a new compiler is that GCC 4
obviously tries to make a detailed cost analysis about inlining
already at optimization level -Os, while GCC 3.x only considered
automatic inlining when optimizing for speed (-O3).

The second reason for a code increase caused by inlining is if you've
got functions that are only used internally, then legitimately inlined
by the compiler, but are not declared "static".  The compiler is then
forced to keep one physical copy of that function in case someone
declares that function as "extern" from another module, and wants to
reference it from there.
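
The difference can be sketched with two otherwise identical helpers. All names here are hypothetical, and whether the compiler inlines either helper depends on its version and flags.

```c
#include <stdint.h>

/* Not static: even if every call in this file is inlined, the
 * compiler must still emit a standalone copy, because another
 * module could declare the function "extern" and call it. */
uint8_t add_external(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* static: visible only in this file, so once all calls are inlined
 * no standalone copy has to be kept. */
static uint8_t add_internal(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

uint8_t sum3_a(uint8_t x, uint8_t y, uint8_t z)
{
    return add_external(add_external(x, y), z);
}

uint8_t sum3_b(uint8_t x, uint8_t y, uint8_t z)
{
    return add_internal(add_internal(x, y), z);
}
```

Looking at the symbol table of the object file should show add_external present in either case, while add_internal can disappear entirely once it has been inlined.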


Another good reason for using "static" is that it is important for good structured programming. It is one of C's failings that functions and data are not "static" by default - it is too flexible in this regard. A good way to enforce structure is that all functions and data in a C file are either global, with a corresponding "extern" in a matching header file, or they are "static". You get better object code, clearer organisation of the source code, and safer re-use of code since private implementation functions and data do not collide with the global namespace. Use the "-Wmissing-declarations" and "-Wmissing-prototypes" flags to enforce such a policy.
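
As an illustration of that policy, here is a hypothetical "counter" module with the header and source shown in one listing for brevity. The module and all of its names are invented for the example.

```c
#include <stdint.h>

/* ---- counter.h (hypothetical): the public interface ---- */
extern uint8_t counter_value;
void counter_reset(void);
void counter_step(void);

/* ---- counter.c (hypothetical): the implementation ---- */

/* Global data, matching the "extern" in the header. */
uint8_t counter_value;

/* Private data and helper: "static", so they cannot collide with
 * names in other modules. */
static const uint8_t step_size = 1;

static void clamp(void)
{
    if (counter_value > 100)
        counter_value = 100;
}

/* Public functions, each with a prototype in the header. */
void counter_reset(void)
{
    counter_value = 0;
}

void counter_step(void)
{
    counter_value = (uint8_t)(counter_value + step_size);
    clamp();
}
```

If "static" were dropped from clamp(), compiling with "-Wmissing-prototypes" should produce a warning, since the function would then be global without a prior prototype.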

The failure to optimize out the int promotions is small potatoes in
comparison to the inlining decision.

You've still got a completely wrong conclusion here: there was *no*
resulting int promotion in your assembly snippet.  The only missed
optimization was to roll the loop the other way around (counting down,
and checking for 0), which it did do in version 3.x.  This caused a
second register to be pushed and popped, as well as a more complicated
end-of-loop comparison.
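
The original snippet is not part of this message, so as a purely hypothetical illustration of the two loop directions, here are two functions that compute the same result:

```c
#include <stdint.h>

/* Counting up: the loop keeps both the index and the limit alive
 * and compares them on every pass. */
uint8_t sum_up(const uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0;
    uint8_t i;

    for (i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}

/* Counting down to zero: only one counter is needed, and the end
 * test is a comparison against zero. */
uint8_t sum_down(const uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0;

    while (len--)
        sum = (uint8_t)(sum + *buf++);
    return sum;
}
```

A compiler that rolls the loop the other way around can turn the first form into something like the second on its own, since the visible behaviour is the same.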