[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"

From: David Brown
Subject: Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"
Date: Fri, 9 Dec 2016 10:11:55 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 08/12/16 21:46, Georg-Johann Lay wrote:
> Marcin Godlewski schrieb:
>> Dear all,
>> Thanks for the reply to David. However I'm not trying to find a
>> solution for the described issue. What I'm trying to say in this
>> e-mail is that this part of Atmel documentation:
>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
>> is innacurate and should be corrected. The conclusion says:
>>     memory barriers ensure proper ordering of volatile accesses
>>     memory barriers don't ensure statements with no volatile accesses
>> to be reordered across the barrier
>> while it should say:
>>     memory barriers ensure proper ordering of global variables accesses
>>     memory barriers don't ensure local variables accesses to be
>> reordered across the barrier
> At least the "local" vs. "global" is not completely correct.  After
> all it's about memory accesses, and it doesn't matter if the memory
> is local (e.g. local static) or if you are dereferencing a pointer
> (which might point to a local auto or to an object on heap).
> The code example you quoted above is actually due to a subtle
> implementation detail of division, modulo and some other arithmetic
> of GCC's avr backend (the division is _not_ a call actually).
> IIRC the solution for me back then was -fno-tree-ter as any messing
> with inline asm doesn't hit the point.

Yes, that is the solution you proposed when we discussed it a good while
back (on the avrlibc list, I think).  I disagree with you somewhat here
(as I did then, though I can't remember if we discussed the details).

Changing an optimisation option like this means that code that looks
right, will run as expected - and that is a good thing.  But it also
means that the code will /only/ be correct if particular optimisation
flags are set in particular ways.  That is a very fragile situation, and
should always be avoided.  To be safe, this optimisation would have to
be completely disabled in the avr-gcc port.  I don't know how useful
this particular optimisation is in terms of generating more efficient
code, though from the gcc manual it appears very useful and is enabled
at -O1.  Clearly that determines the "cost" of this solution to the
re-ordering problem.

The use of the assembly dependency (or a nicely named macro with the
same effect) fixes the problem in this situation.  It does so regardless
of optimisation options - the compiler is required to have calculated
the result of the division before disabling interrupts, and cannot
re-order the operations.  It does so without adding any extra assembly
code or hindering any optimisations - it merely forces an order on
operations that are to be done anyway.

It has the clear disadvantage of needing extra code in the user's
source.  Like memory barriers, it is a way of giving the compiler extra
information that cannot be expressed in normal C, and which the compiler
cannot (at the moment) figure out for itself.

You say that the assembly dependency does not "hit the point".  I think
you are correct there - it is treating the symptom, not the disease.  It
is not telling the compiler that an operation should not be re-ordered,
or that division is a costly operation.  It simply tells the compiler
that we need the results of that computation /here/.  But it is a very
effective and efficient cure for this sort of problem.  Unless and until
there is some /safe/ fix in the compiler to avoid this (and I don't
count "put this compiler option in your command line" as safe), I really
do think it is the best we have.

Note, however, that the "forceDependency" macro only solves half the
problem.  Consider :

unsigned int test2b(void)
        unsigned int val;

        val = ivar;
        val = 65535 / val;
        return val;

In this case, the compiler could move the division backwards above the
sei(), giving a similar problem.  (It did not make the move in my brief
tests - but it /could/ do.)  I don't know if the -fno-tree-ter flag
stops that too, but the forceDependency() macro is not enough.  The
forgetCompilerKnowledge macro is the answer:

unsigned int test2b(void)
        unsigned int val;

        val = ivar;
        asm volatile ("" : "+g" (val));
        val = 65535 / val;
        return val;

This tells the compiler that it needs to stabilise the value of "val",
and it can't assume anything about "val" after this point in the code,
because it /might/ be read and /might/ change in the assembly code.
Again, nothing is actually generated in the assembly and we are only
forcing an ordering on the code.

Nothing would please me better here here than to have the compiler
understand that users would not want such re-ordering around cli() and
sei(), so that the problem simply goes away.  But it should not require
particular choices of compiler flags, nor should it require disabling
useful optimisations and thus generating poorer code elsewhere.

It is also worth noting that though this situation occurs because
division does not work like a normal function call, despite it using a
library call for implementation, there is nothing fundamental to stop
the compiler moving a call to foo() back or forth across a cli() or
sei() as long as the compiler is sure that no memory is accessed, no
volatiles are accessed, and there are no other externally visible
effects in foo().  If the definition of foo() is available when
compiling the code, then the compiler could well know this to be the
case.  If we replace "val = 65535U / val;" with "val = foo(val);", where
we have :

        unsigned int foo(unsigned int v) {
                return (v * v) - v;

in the same compilation unit, some or all of the calculation from foo()
will be inlined and mixed with the cli().  Again, -fno-tree-ter fixes
this - at the cost of killing such mixing and optimisation in cases
where it is useful.  And again, the inline assembly fixes it at the cost
of knowing that you have to add this line of source code.

As gcc gets ever smarter with its inlining, function cloning, link-time
optimisations, etc., then this will become more and more of an issue.

Maybe the answer is that gcc needs an "execution barrier" that is
stronger than a memory barrier, because it will also block calculations
- it would act as a barrier to all local variables.  I cannot think of
any way to make such a barrier with inline assembly or the C11 fences -
I think it would take a new __builtin for gcc.  Such a feature would
have use on all embedded targets, not just AVR.



> Johann
>> I don't know whether this group is the right place to post it however
>> I do not know any better place. Hope someone here can trigger the
>> change of the documentation and I also hope to be corrected if I am
>> wrong.
>> Thanks and regards,
>> Marcin

reply via email to

[Prev in Thread] Current Thread [Next in Thread]