Re: [avr-gcc-list] fixed-point: code size, speed and precision


From: Erik Christiansen
Subject: Re: [avr-gcc-list] fixed-point: code size, speed and precision
Date: Thu, 16 Aug 2012 17:22:07 +1000
User-agent: Mutt/1.5.20 (2009-06-14)

On 15.08.12 11:28, Georg-Johann Lay wrote:
> Some days ago I tried to adapt Sean's fixed point patch [1] and
> the abandoned attempt [2] to avr-gcc 4.8.  A current version thereof
> can be seen in [3], but there are still several issues.

Many thanks, Johann, for the great work that you are doing. (And for
giving the users some involvement.)

My thoughts stimulated by your questions are just a peripheral opinion
- there are likely to be others making more immediate use of fixed point
arithmetic in C.

...

> Approach 1) leads to the exact result so that the algorithms can be
> sure to round the /exact/ result to get the rounded result (as
> required by TR18037 for instance).
> 
> Approach 2 is faster but does not offer control over rounding
> errors and cannot be used if a saturated result is needed.

While list traffic seems to mostly prefer small code to super fast code
(and I concur), precise code takes precedence over all other
considerations, I feel, especially if the code size saving is less than
a factor of 20.

...

> What is the best approach here?
> 
> Code that is slower, might consume more flash and stack, but
> complies with TR18037? Or fast code whose rounding behavior
> is not within the 2 LSBs required by TR18037?

I'm having trouble finding value in an incorrect result generated
rapidly.

> Code that uses both signed and unsigned versions will
> consume ~200 bytes for the multiplications alone.
> 
> Sign extension can be performed in three different ways:
> 
> 1) Explicit before the computation
> 
> 2) Implicit during the computation
> 
> 3) Explicit after the computation
> 
> [3] currently uses 2), but it could instead reuse the unsigned version
> by means of 3), which then takes only 22 bytes, like so:
> 
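> DEFUN __mulsa3
>     XCALL   __mulusa3       ; unsigned product first
>     tst     B3              ; is B negative?
>     brpl    1f              ; no: skip the first correction
>     sub     C2, A0          ; yes: subtract low word of A from high word of C
>     sbc     C3, A1
> 1:  sbrs    A3, 7           ; is A negative? (skips the ret if sign bit is set)
>     ret                     ; no: done
>     sub     C2, B0          ; yes: subtract low word of B from high word of C
>     sbc     C3, B1
>     ret
> ENDF __mulsa3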
> 
> Thus, if both the signed and the unsigned versions are needed,
> the code size will go down by more than 80 bytes.
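
For anyone wanting to convince themselves that approach 3) is exact,
here is a minimal C model of the correction step. It is only a sketch
under stated assumptions: mul_u16_16() is a hypothetical stand-in for
__mulusa3 and is assumed to return bits 16..47 of the 64-bit unsigned
product (that is what the "sub C2, A0 / sbc C3, A1" correction implies);
the function names and the sample values are mine, not from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for __mulusa3 (assumption): bits 16..47 of the
   64-bit unsigned product of two 32-bit operands. */
static uint32_t mul_u16_16(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 16);
}

/* Approach 3): unsigned multiply first, explicit sign correction after.
   Reading a negative x as unsigned adds 2^32 to it, so the unsigned
   product is too large by 2^32 * (other operand); after the shift by 16
   that is the other operand's low word placed in the result's high word. */
static int32_t mul_s_via_u(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t c = mul_u16_16(ua, ub);

    if (b < 0)
        c -= ua << 16;      /* mirrors: sub C2, A0 / sbc C3, A1 */
    if (a < 0)
        c -= ub << 16;      /* mirrors: sub C2, B0 / sbc C3, B1 */

    /* Conversion to int32_t is implementation-defined for values above
       INT32_MAX; gcc wraps modulo 2^32. */
    return (int32_t)c;
}

/* Reference: true signed 64-bit product, same bit slice. */
static int32_t mul_s_ref(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    return (int32_t)(uint32_t)((uint64_t)p >> 16);
}

/* Compare both versions over a few sample operands (arbitrary choice,
   including the extremes); returns the number of disagreements. */
static int count_mismatches(void)
{
    static const int32_t v[] = {
        0, 1, -1, 65536, -65536, 32768, -32768,
        INT32_MAX, INT32_MIN, 123456789, -987654321
    };
    const int n = (int)(sizeof v / sizeof v[0]);
    int bad = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (mul_s_via_u(v[i], v[j]) != mul_s_ref(v[i], v[j]))
                bad++;
    return bad;
}
```

The model says nothing about rounding or saturation; it only shows that
the two conditional subtractions recover the signed result from the
unsigned one, modulo 2^32.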

Less than 120 bytes for both is great.

> If only the signed version is used, code size goes up by 20 bytes.

But still only to 120 bytes, AIUI.

> What's the best here?

A lean unsigned, and equal size for signed and signed_plus_unsigned,
looks like a good compromise from here, given that it's half of the
worst case.

If it matters whether signed multiply takes 120 bytes rather than 100,
then it was time to move up to the next flash size several months ago.
At Siemens we were never allowed to go into production using more than
80% (IIRC) of ROM, and at NEC I liked to follow the same practice.
That way, if there was a software upgrade, there was room for it.
(Anything more is a new product. Even the managers understood that.)

Thank you Johann, for asking the users.

Erik

-- 
If you stew apples like cranberries, they taste more like prunes than
rhubarb does.
                                                      - Groucho Marx



