[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"

From: David Brown
Subject: Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"
Date: Thu, 9 Feb 2017 19:13:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0

The functions in atomic.h are correct, but their description is not quite accurate. The description says that inside an atomic block, the code cannot be interrupted. But as we have seen, /code/ can be moved inside and outside of an atomic block just as it can be moved around the "cli()" instruction (the atomic.h macros generate the same instructions). The blocks are atomic with respect to memory access - any access to memory within an atomic block will be completed without interruption.

Note that C, even with gcc extensions and inline assembly, has no way to express "do this code here" - you can have barriers to movement of memory accesses, but not instruction execution. Note also that it is only control of the memory access that is needed for code correctness - moving instruction execution affects timing, but not the results.



On 09/02/17 18:35, Bob Paddock wrote:
Are the functions like macros in atomic.h correct?
They attempt to deal properly with critical sections/code motion etc,
in what this thread is discussing.


On Thu, Feb 9, 2017 at 9:14 AM, David Brown <address@hidden> wrote:
You could file this as a bug on the website:

As far as I understand it, the documentation (both on the website and
the Atmel documentation) is generated directly from the library code and
comments - so this would be a change to the library source.

Feel free to quote any parts of my mails on the subject when filing the bug.



On 09/02/17 13:11, Marcin Godlewski wrote:
Dear All,

The site
still contains buggy description of memory barriers in avr-gcc. As
this site is popular among avr users I think it's really worth
fixing. What is more the same inaccurate article is available on
Atmel doc site:
. Is there anybody subscribed to this mailing list who can contact
the authors/maintainers of the site in order to discuss correction of
the content?

Marcin Godlewski

W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski
<address@hidden> napisał:
W dniu 2016-12-09 10:11:55 użytkownik David Brown
<address@hidden> napisał:
On 08/12/16 21:46, Georg-Johann Lay wrote:
Marcin Godlewski schrieb:
Dear all,

Thanks for the reply to David. However I'm not trying to find
a solution for the described issue. What I'm trying to say in
this e-mail is that this part of Atmel documentation:

is innacurate and should be corrected. The conclusion says:

memory barriers ensure proper ordering of volatile accesses

memory barriers don't ensure statements with no volatile
accesses to be reordered across the barrier while it should

memory barriers ensure proper ordering of global variables

memory barriers don't ensure local variables accesses to be
reordered across the barrier

At least the "local" vs. "global" is not completely correct.
After all it's about memory accesses, and it doesn't matter if
the memory is local (e.g. local static) or if you are
dereferencing a pointer (which might point to a local auto or
to an object on heap).

The code example you quoted above is actually due to a subtle
implementation detail of division, modulo and some other
arithmetic of GCC's avr backend (the division is _not_ a call

IIRC the solution for me back then was -fno-tree-ter as any
messing with inline asm doesn't hit the point.

Yes, that is the solution you proposed when we discussed it a
good while back (on the avrlibc list, I think).  I disagree with
you somewhat here (as I did then, though I can't remember if we
discussed the details).

Changing an optimisation option like this means that code that
looks right, will run as expected - and that is a good thing.
But it also means that the code will /only/ be correct if
particular optimisation flags are set in particular ways.  That
is a very fragile situation, and should always be avoided.  To be
safe, this optimisation would have to be completely disabled in
the avr-gcc port.  I don't know how useful this particular
optimisation is in terms of generating more efficient code,
though from the gcc manual it appears very useful and is enabled
at -O1.  Clearly that determines the "cost" of this solution to
the re-ordering problem.

The use of the assembly dependency (or a nicely named macro with
the same effect) fixes the problem in this situation.  It does so
regardless of optimisation options - the compiler is required to
have calculated the result of the division before disabling
interrupts, and cannot re-order the operations.  It does so
without adding any extra assembly code or hindering any
optimisations - it merely forces an order on operations that are
to be done anyway.

It has the clear disadvantage of needing extra code in the
user's source.  Like memory barriers, it is a way of giving the
compiler extra information that cannot be expressed in normal C,
and which the compiler cannot (at the moment) figure out for

You say that the assembly dependency does not "hit the point".  I
think you are correct there - it is treating the symptom, not the
disease.  It is not telling the compiler that an operation should
not be re-ordered, or that division is a costly operation.  It
simply tells the compiler that we need the results of that
computation /here/.  But it is a very effective and efficient
cure for this sort of problem.  Unless and until there is some
/safe/ fix in the compiler to avoid this (and I don't count "put
this compiler option in your command line" as safe), I really do
think it is the best we have.

Note, however, that the "forceDependency" macro only solves half
the problem.  Consider :

unsigned int test2b(void) { unsigned int val;

cli(); val = ivar; sei(); val = 65535 / val; return val; }

In this case, the compiler could move the division backwards
above the sei(), giving a similar problem.  (It did not make the
move in my brief tests - but it /could/ do.)  I don't know if the
-fno-tree-ter flag stops that too, but the forceDependency()
macro is not enough.  The forgetCompilerKnowledge macro is the

unsigned int test2b(void) { unsigned int val;

cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val =
65535 / val; return val; }

This tells the compiler that it needs to stabilise the value of
"val", and it can't assume anything about "val" after this point
in the code, because it /might/ be read and /might/ change in the
assembly code. Again, nothing is actually generated in the
assembly and we are only forcing an ordering on the code.

Nothing would please me better here here than to have the
compiler understand that users would not want such re-ordering
around cli() and sei(), so that the problem simply goes away.
But it should not require particular choices of compiler flags,
nor should it require disabling useful optimisations and thus
generating poorer code elsewhere.

It is also worth noting that though this situation occurs
because division does not work like a normal function call,
despite it using a library call for implementation, there is
nothing fundamental to stop the compiler moving a call to foo()
back or forth across a cli() or sei() as long as the compiler is
sure that no memory is accessed, no volatiles are accessed, and
there are no other externally visible effects in foo().  If the
definition of foo() is available when compiling the code, then
the compiler could well know this to be the case.  If we replace
"val = 65535U / val;" with "val = foo(val);", where we have :

unsigned int foo(unsigned int v) { return (v * v) - v; }

in the same compilation unit, some or all of the calculation from
foo() will be inlined and mixed with the cli().  Again,
-fno-tree-ter fixes this - at the cost of killing such mixing and
optimisation in cases where it is useful.  And again, the inline
assembly fixes it at the cost of knowing that you have to add
this line of source code.

As gcc gets ever smarter with its inlining, function cloning,
link-time optimisations, etc., then this will become more and
more of an issue.

Maybe the answer is that gcc needs an "execution barrier" that
is stronger than a memory barrier, because it will also block
calculations - it would act as a barrier to all local variables.
I cannot think of any way to make such a barrier with inline
assembly or the C11 fences - I think it would take a new
__builtin for gcc.  Such a feature would have use on all embedded
targets, not just AVR.



I totally agree with you - a feature like "execution barrier" would
be very useful. C11 made good job standardizing multi-threading
features but unfortunately the features not always fits firmware
development. Controlling what exactly goes into critical section is
a fundamental problem, so I would even go further - why don't you
propose the "execution barrier" as a new feature for the future C
language standard?


I don't know whether this group is the right place to post it
however I do not know any better place. Hope someone here can
trigger the change of the documentation and I also hope to be
corrected if I am wrong.

Thanks and regards, Marcin

AVR-GCC-list mailing list

reply via email to

[Prev in Thread] Current Thread [Next in Thread]