avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t


From: Dale Whitfield
Subject: Re: [avr-gcc-list] Re: Optimisation of bit set/clear in uint32_t
Date: Wed, 22 Apr 2009 12:18:15 +0200
User-agent: Mutt/1.5.18 (2008-05-17)

Hi David                                         >@2009.04.22_10:55:27_+0200

> Dale Whitfield wrote:
>> Hi David                                         >@2009.04.21_19:42:34_+0200
>>
>>> Dale Whitfield wrote:
>>>> Hi,
>>>>
>>>> All this talk of super-optimisers, etc. brings this issue I've had,
>>>> to the fore again.
>>>>
>>>> Perhaps there are suggestions on how to encourage the compiler to do the
>>>> optimisations rather than having to hand-code to get the best result.
>>>>
>>>> Take this code:
>>>>
>>>> volatile uint32_t  status;
>>>> static inline void set_status(uint32_t flag, uint8_t set) 
>>>> __attribute__((always_inline));
>>>>
>>>> void set_status(uint32_t flag, uint8_t set) {
>>>>    if (set) status |= flag;
>>>>    else status &= ~flag;
>>>> }
>>>>
>>>> Which when used in an interrupt generates:
>>>>
>>>> ISR (INT7_vect)
>>>> {
> <snip>
>>>>
>>>> That's way too much code... Its fairly obvious where the optimisations
>>>> should be, although I can see that some have been done already.
>>>>
>>> I can't see much possibility for improving the above code (except by  
>>> removing the push and zeroing of r1).  You asked for "status" to be a 
>>>  volatile 32-bit int, so that's what you got.  The code below is   
>>> semantically different, and thus compiles differently.
>>>
>>
>> I don't believe it is semantically different. Which is one reason I
>> raised this. I am using the version below in existing code and it
>> behaves correctly.
>>
>> Loading 4 registers and then storing back 3 that are unchanged makes no
>> sense at all.
>>
>
> I'm afraid that's how "volatile" works.  In a way, by using "volatile"  
> you are telling the compiler "I *know* this makes no sense to you, but  
> do it my way anyway".
>

Its how volatile works by *loading* the registers before making changes.
However, if its possible to only store back changed bytes, then surely
that's all that needs to be done.

>> Where volatile comes in here is that the optimisation shouldn't use any
>> previous loads or modify register values more than once without
>> reloading/storing etc. Here, the value is loaded once, one byte is
>> changed and then all 4 are stored. That's wasteful.
>>
>
> That's what you get when you make a 32-bit item volatile - all accesses  
> to it use the whole 32-bit data, every time.
>

:-) I guess we need to agree to disagree on this. I was hoping this
wouldn't become a philosophical issue around volatile data. Which is
fraught at the best of times.

> Suppose you have a 16-bit hardware timer that must always be set low  
> byte first, then high byte.  If you make a change that just affects the  
> low byte, you still want the compiler to generate the full 16-bit write  
> - you achieve that by declaring the timer as a 16-bit volatile variable.  
>  How is the compiler supposed to guess that you mean something different 
> in this case?
>

I don't think this is a good example, because processor register
accesses should be treated differently to plain RAM accesses. And, yes,
of course just updating one byte in this scenario would be wrong.

You highlight an interesting distinction though. 

Maybe this is where the optimisation goes astray. For IO access, or
access to processor registers rather, since they may be accessed via lds
sts, the optimisation doesn't apply. However, for RAM access, it does.

>> In this example, the interrupt routine is half as long when, as I see
>> it, correctly optimised. On an interrupt particularly, this could be
>> critical time/cycle saving.
>>
>
> I agree with your conclusion - the code is longer and slower than  
> needed.  But you are totally wrong on your premise - this is not because  
> the compiler is failing in its optimisation.  It is because *your* code  
> forces it to generate long and slow code.
>
> There is no reason for using "volatile" in this situation - making  
> "status" non-volatile will give you exactly the code you want here.  But  
> it may then cause trouble for other code, depending on how you use  
> "status" and "set_status".
>

There is every reason. The flags get updated in interrupt routines and
in non-interrupt code.

> Try the following version of set_status:
>
> void set_status(uint32_t flag, uint8_t set) {
>       if (set) *(uint32_t* &status) |= flag;
>       else *(uint32_t* &status) &= ~flag;
>       asm("" : : : "memory");
> }
>
> First, we remove the "volatile" from status so that the optimiser can  
> give you a single byte access when the function is inlined.  Secondly,  
> instead of using "volatile" to protect the access to "status", we use a  
> memory clobber that forces writes to memory variables to be completed  
> before going on.  This will ensure that the change is written out to  
> status as soon as possible (otherwise changes could be cached for later  
> storage).
>

Which would be a problem, since the volatile nature of the variable is
important in this usage.

> (I haven't tried compiling that code yet.)
>
>
>
> _______________________________________________
> AVR-GCC-list mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]