[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"

From: Bob Paddock
Subject: Re: [avr-gcc-list] Avr-libc-user-manual: "Problems with reordering code"
Date: Thu, 9 Feb 2017 12:35:16 -0500

Are the functions like macros in atomic.h correct?
They attempt to deal properly with critical sections/code motion etc,
in what this thread is discussing.


On Thu, Feb 9, 2017 at 9:14 AM, David Brown <address@hidden> wrote:
> You could file this as a bug on the website:
> <https://savannah.nongnu.org/bugs/?group=avr-libc>
> As far as I understand it, the documentation (both on the website and
> the Atmel documentation) is generated directly from the library code and
> comments - so this would be a change to the library source.
> Feel free to quote any parts of my mails on the subject when filing the bug.
> mvh.,
> David
> On 09/02/17 13:11, Marcin Godlewski wrote:
>> Dear All,
>> The site
>> http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html
>> still contains buggy description of memory barriers in avr-gcc. As
>> this site is popular among avr users I think it's really worth
>> fixing. What is more the same inaccurate article is available on
>> Atmel doc site:
>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
>> . Is there anybody subscribed to this mailing list who can contact
>> the authors/maintainers of the site in order to discuss correction of
>> the content?
>> Marcin Godlewski
>> W dniu 2016-12-10 23:25:17 użytkownik Marcin Godlewski
>> <address@hidden> napisał:
>>> W dniu 2016-12-09 10:11:55 użytkownik David Brown
>>> <address@hidden> napisał:
>>>> On 08/12/16 21:46, Georg-Johann Lay wrote:
>>>>> Marcin Godlewski schrieb:
>>>>>> Dear all,
>>>>>> Thanks for the reply to David. However I'm not trying to find
>>>>>> a solution for the described issue. What I'm trying to say in
>>>>>> this e-mail is that this part of Atmel documentation:
>>>>>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
> is innacurate and should be corrected. The conclusion says:
>>>>>> memory barriers ensure proper ordering of volatile accesses
>>>>>> memory barriers don't ensure statements with no volatile
>>>>>> accesses to be reordered across the barrier while it should
>>>>>> say:
>>>>>> memory barriers ensure proper ordering of global variables
>>>>>> accesses
>>>>>> memory barriers don't ensure local variables accesses to be
>>>>>> reordered across the barrier
>>>>> At least the "local" vs. "global" is not completely correct.
>>>>> After all it's about memory accesses, and it doesn't matter if
>>>>> the memory is local (e.g. local static) or if you are
>>>>> dereferencing a pointer (which might point to a local auto or
>>>>> to an object on heap).
>>>>> The code example you quoted above is actually due to a subtle
>>>>> implementation detail of division, modulo and some other
>>>>> arithmetic of GCC's avr backend (the division is _not_ a call
>>>>> actually).
>>>>> IIRC the solution for me back then was -fno-tree-ter as any
>>>>> messing with inline asm doesn't hit the point.
>>>> Yes, that is the solution you proposed when we discussed it a
>>>> good while back (on the avrlibc list, I think).  I disagree with
>>>> you somewhat here (as I did then, though I can't remember if we
>>>> discussed the details).
>>>> Changing an optimisation option like this means that code that
>>>> looks right, will run as expected - and that is a good thing.
>>>> But it also means that the code will /only/ be correct if
>>>> particular optimisation flags are set in particular ways.  That
>>>> is a very fragile situation, and should always be avoided.  To be
>>>> safe, this optimisation would have to be completely disabled in
>>>> the avr-gcc port.  I don't know how useful this particular
>>>> optimisation is in terms of generating more efficient code,
>>>> though from the gcc manual it appears very useful and is enabled
>>>> at -O1.  Clearly that determines the "cost" of this solution to
>>>> the re-ordering problem.
>>>> The use of the assembly dependency (or a nicely named macro with
>>>> the same effect) fixes the problem in this situation.  It does so
>>>> regardless of optimisation options - the compiler is required to
>>>> have calculated the result of the division before disabling
>>>> interrupts, and cannot re-order the operations.  It does so
>>>> without adding any extra assembly code or hindering any
>>>> optimisations - it merely forces an order on operations that are
>>>> to be done anyway.
>>>> It has the clear disadvantage of needing extra code in the
>>>> user's source.  Like memory barriers, it is a way of giving the
>>>> compiler extra information that cannot be expressed in normal C,
>>>> and which the compiler cannot (at the moment) figure out for
>>>> itself.
>>>> You say that the assembly dependency does not "hit the point".  I
>>>> think you are correct there - it is treating the symptom, not the
>>>> disease.  It is not telling the compiler that an operation should
>>>> not be re-ordered, or that division is a costly operation.  It
>>>> simply tells the compiler that we need the results of that
>>>> computation /here/.  But it is a very effective and efficient
>>>> cure for this sort of problem.  Unless and until there is some
>>>> /safe/ fix in the compiler to avoid this (and I don't count "put
>>>> this compiler option in your command line" as safe), I really do
>>>> think it is the best we have.
>>>> Note, however, that the "forceDependency" macro only solves half
>>>> the problem.  Consider :
>>>> unsigned int test2b(void) { unsigned int val;
>>>> cli(); val = ivar; sei(); val = 65535 / val; return val; }
>>>> In this case, the compiler could move the division backwards
>>>> above the sei(), giving a similar problem.  (It did not make the
>>>> move in my brief tests - but it /could/ do.)  I don't know if the
>>>> -fno-tree-ter flag stops that too, but the forceDependency()
>>>> macro is not enough.  The forgetCompilerKnowledge macro is the
>>>> answer:
>>>> unsigned int test2b(void) { unsigned int val;
>>>> cli(); val = ivar; sei(); asm volatile ("" : "+g" (val)); val =
>>>> 65535 / val; return val; }
>>>> This tells the compiler that it needs to stabilise the value of
>>>> "val", and it can't assume anything about "val" after this point
>>>> in the code, because it /might/ be read and /might/ change in the
>>>> assembly code. Again, nothing is actually generated in the
>>>> assembly and we are only forcing an ordering on the code.
>>>> Nothing would please me better here here than to have the
>>>> compiler understand that users would not want such re-ordering
>>>> around cli() and sei(), so that the problem simply goes away.
>>>> But it should not require particular choices of compiler flags,
>>>> nor should it require disabling useful optimisations and thus
>>>> generating poorer code elsewhere.
>>>> It is also worth noting that though this situation occurs
>>>> because division does not work like a normal function call,
>>>> despite it using a library call for implementation, there is
>>>> nothing fundamental to stop the compiler moving a call to foo()
>>>> back or forth across a cli() or sei() as long as the compiler is
>>>> sure that no memory is accessed, no volatiles are accessed, and
>>>> there are no other externally visible effects in foo().  If the
>>>> definition of foo() is available when compiling the code, then
>>>> the compiler could well know this to be the case.  If we replace
>>>> "val = 65535U / val;" with "val = foo(val);", where we have :
>>>> unsigned int foo(unsigned int v) { return (v * v) - v; }
>>>> in the same compilation unit, some or all of the calculation from
>>>> foo() will be inlined and mixed with the cli().  Again,
>>>> -fno-tree-ter fixes this - at the cost of killing such mixing and
>>>> optimisation in cases where it is useful.  And again, the inline
>>>> assembly fixes it at the cost of knowing that you have to add
>>>> this line of source code.
>>>> As gcc gets ever smarter with its inlining, function cloning,
>>>> link-time optimisations, etc., then this will become more and
>>>> more of an issue.
>>>> Maybe the answer is that gcc needs an "execution barrier" that
>>>> is stronger than a memory barrier, because it will also block
>>>> calculations - it would act as a barrier to all local variables.
>>>> I cannot think of any way to make such a barrier with inline
>>>> assembly or the C11 fences - I think it would take a new
>>>> __builtin for gcc.  Such a feature would have use on all embedded
>>>> targets, not just AVR.
>>>> mvh.,
>>>> David
>>> I totally agree with you - a feature like "execution barrier" would
>>> be very useful. C11 made good job standardizing multi-threading
>>> features but unfortunately the features not always fits firmware
>>> development. Controlling what exactly goes into critical section is
>>> a fundamental problem, so I would even go further - why don't you
>>> propose the "execution barrier" as a new feature for the future C
>>> language standard?
>>>>> Johann
>>>>>> I don't know whether this group is the right place to post it
>>>>>> however I do not know any better place. Hope someone here can
>>>>>> trigger the change of the documentation and I also hope to be
>>>>>> corrected if I am wrong.
>>>>>> Thanks and regards, Marcin
> _______________________________________________
> AVR-GCC-list mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

reply via email to

[Prev in Thread] Current Thread [Next in Thread]