avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"


From: Marcin Godlewski
Subject: Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"
Date: Thu, 09 Feb 2017 13:11:12 +0100

Dear All,

The site 
http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html
 still contains buggy description of memory barriers in avr-gcc. As this site 
is popular among avr users I think it's really worth fixing. What is more the 
same inaccurate article is available on Atmel doc site: 
http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
 . Is there anybody subscribed to this mailing list who can contact the 
authors/maintainers of the site in order to discuss correction of the content?

Marcin Godlewski

W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski <address@hidden> napisał:
> W dniu 2016-12-09 10:11:55 użytkownik David Brown <address@hidden> napisał:
> > On 08/12/16 21:46, Georg-Johann Lay wrote:
> > > Marcin Godlewski schrieb:
> > >> Dear all,
> > >>
> > >> Thanks for the reply to David. However I'm not trying to find a
> > >> solution for the described issue. What I'm trying to say in this
> > >> e-mail is that this part of Atmel documentation:
> > >> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
> > >> is innacurate and should be corrected. The conclusion says:
> > >>
> > >>     memory barriers ensure proper ordering of volatile accesses
> > >>
> > >>     memory barriers don't ensure statements with no volatile accesses
> > >> to be reordered across the barrier
> > >> while it should say:
> > >>
> > >>     memory barriers ensure proper ordering of global variables accesses
> > >>
> > >>     memory barriers don't ensure local variables accesses to be
> > >> reordered across the barrier
> > > 
> > > At least the "local" vs. "global" is not completely correct.  After
> > > all it's about memory accesses, and it doesn't matter if the memory
> > > is local (e.g. local static) or if you are dereferencing a pointer
> > > (which might point to a local auto or to an object on heap).
> > > 
> > > The code example you quoted above is actually due to a subtle
> > > implementation detail of division, modulo and some other arithmetic
> > > of GCC's avr backend (the division is _not_ a call actually).
> > > 
> > > IIRC the solution for me back then was -fno-tree-ter as any messing
> > > with inline asm doesn't hit the point.
> > 
> > Yes, that is the solution you proposed when we discussed it a good while
> > back (on the avrlibc list, I think).  I disagree with you somewhat here
> > (as I did then, though I can't remember if we discussed the details).
> > 
> > Changing an optimisation option like this means that code that looks
> > right, will run as expected - and that is a good thing.  But it also
> > means that the code will /only/ be correct if particular optimisation
> > flags are set in particular ways.  That is a very fragile situation, and
> > should always be avoided.  To be safe, this optimisation would have to
> > be completely disabled in the avr-gcc port.  I don't know how useful
> > this particular optimisation is in terms of generating more efficient
> > code, though from the gcc manual it appears very useful and is enabled
> > at -O1.  Clearly that determines the "cost" of this solution to the
> > re-ordering problem.
> > 
> > 
> > The use of the assembly dependency (or a nicely named macro with the
> > same effect) fixes the problem in this situation.  It does so regardless
> > of optimisation options - the compiler is required to have calculated
> > the result of the division before disabling interrupts, and cannot
> > re-order the operations.  It does so without adding any extra assembly
> > code or hindering any optimisations - it merely forces an order on
> > operations that are to be done anyway.
> > 
> > It has the clear disadvantage of needing extra code in the user's
> > source.  Like memory barriers, it is a way of giving the compiler extra
> > information that cannot be expressed in normal C, and which the compiler
> > cannot (at the moment) figure out for itself.
> > 
> > You say that the assembly dependency does not "hit the point".  I think
> > you are correct there - it is treating the symptom, not the disease.  It
> > is not telling the compiler that an operation should not be re-ordered,
> > or that division is a costly operation.  It simply tells the compiler
> > that we need the results of that computation /here/.  But it is a very
> > effective and efficient cure for this sort of problem.  Unless and until
> > there is some /safe/ fix in the compiler to avoid this (and I don't
> > count "put this compiler option in your command line" as safe), I really
> > do think it is the best we have.
> > 
> > 
> > Note, however, that the "forceDependency" macro only solves half the
> > problem.  Consider :
> > 
> > unsigned int test2b(void)
> > {
> >     unsigned int val;
> > 
> >     cli();
> >     val = ivar;
> >     sei();
> >     val = 65535 / val;
> >     return val;
> > }
> > 
> > In this case, the compiler could move the division backwards above the
> > sei(), giving a similar problem.  (It did not make the move in my brief
> > tests - but it /could/ do.)  I don't know if the -fno-tree-ter flag
> > stops that too, but the forceDependency() macro is not enough.  The
> > forgetCompilerKnowledge macro is the answer:
> > 
> > unsigned int test2b(void)
> > {
> >     unsigned int val;
> > 
> >     cli();
> >     val = ivar;
> >     sei();
> >     asm volatile ("" : "+g" (val));
> >     val = 65535 / val;
> >     return val;
> > }
> > 
> > This tells the compiler that it needs to stabilise the value of "val",
> > and it can't assume anything about "val" after this point in the code,
> > because it /might/ be read and /might/ change in the assembly code.
> > Again, nothing is actually generated in the assembly and we are only
> > forcing an ordering on the code.
> > 
> > 
> > Nothing would please me better here here than to have the compiler
> > understand that users would not want such re-ordering around cli() and
> > sei(), so that the problem simply goes away.  But it should not require
> > particular choices of compiler flags, nor should it require disabling
> > useful optimisations and thus generating poorer code elsewhere.
> > 
> > It is also worth noting that though this situation occurs because
> > division does not work like a normal function call, despite it using a
> > library call for implementation, there is nothing fundamental to stop
> > the compiler moving a call to foo() back or forth across a cli() or
> > sei() as long as the compiler is sure that no memory is accessed, no
> > volatiles are accessed, and there are no other externally visible
> > effects in foo().  If the definition of foo() is available when
> > compiling the code, then the compiler could well know this to be the
> > case.  If we replace "val = 65535U / val;" with "val = foo(val);", where
> > we have :
> > 
> >     unsigned int foo(unsigned int v) {
> >             return (v * v) - v;
> >     }
> > 
> > in the same compilation unit, some or all of the calculation from foo()
> > will be inlined and mixed with the cli().  Again, -fno-tree-ter fixes
> > this - at the cost of killing such mixing and optimisation in cases
> > where it is useful.  And again, the inline assembly fixes it at the cost
> > of knowing that you have to add this line of source code.
> > 
> > As gcc gets ever smarter with its inlining, function cloning, link-time
> > optimisations, etc., then this will become more and more of an issue.
> > 
> > 
> > 
> > Maybe the answer is that gcc needs an "execution barrier" that is
> > stronger than a memory barrier, because it will also block calculations
> > - it would act as a barrier to all local variables.  I cannot think of
> > any way to make such a barrier with inline assembly or the C11 fences -
> > I think it would take a new __builtin for gcc.  Such a feature would
> > have use on all embedded targets, not just AVR.
> > 
> > mvh.,
> > 
> > David
> > 
> > 
> 
> I totally agree with you - a feature like "execution barrier" would be very 
> useful. C11 made good job standardizing multi-threading features but 
> unfortunately the features not always fits firmware development. Controlling 
> what exactly goes into critical section is a fundamental problem, so I would 
> even go further - why don't you propose the "execution barrier" as a new 
> feature for the future C language standard? 
> 
> > 
> > > 
> > > Johann
> > > 
> > >> I don't know whether this group is the right place to post it however
> > >> I do not know any better place. Hope someone here can trigger the
> > >> change of the documentation and I also hope to be corrected if I am
> > >> wrong.
> > >>
> > >> Thanks and regards,
> > >> Marcin
> > 
> > 
> 
> 
> 
> 






reply via email to

[Prev in Thread] Current Thread [Next in Thread]