Dear Trek,
happy to help! But:
50 Hz is a laughably low rate; you won't need any optimization for
that, unless you do on the order of 100,000 sin/cos evaluations at 50 Hz.
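
A rough back-of-the-envelope check of that claim, as a sketch only. It
reads "100,000 sin/cos at 50 Hz" as 100,000 calls per update, and it
borrows the per-call cost from the unoptimized libc benchmark quoted
further down in this thread (2.70 s for 100,000,000 calls, i.e. about
27 ns per call). That figure was measured on a PC, not on a Samsung S5,
so the result is an order-of-magnitude estimate. The function name
cpu_fraction is made up for illustration.

```cpp
#include <cassert>

// Fraction of one CPU core spent on sin/cos calls.
// calls_per_update: sin/cos evaluations per algorithm update
// update_rate_hz:   updates per second (50 in this thread)
// seconds_per_call: measured cost of one call (an assumption; measure
//                   it on the target device)
double cpu_fraction(double calls_per_update,
                    double update_rate_hz,
                    double seconds_per_call)
{
    assert(calls_per_update >= 0.0 && update_rate_hz >= 0.0 &&
           seconds_per_call >= 0.0);
    return calls_per_update * update_rate_hz * seconds_per_call;
}
```

With 100,000 calls per update, 50 updates per second and 27 ns per
call, this gives 0.135, i.e. about 13.5 % of one core. With ten calls
per update it is about 0.00135 %.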
Best regards,
Marcus
On 12.04.2016 11:48, Trek Liu wrote:
Thanks Marcus. That is a big help. I will try to follow the
mailing list email guidelines.
We are designing a MEMS sensor quaternion algorithm on a
high-end Android device (Samsung S5), which requires sin/cos
at a high rate (maybe 50 Hz eventually), so I want to make
sure the sin/cos are fully optimized. I will follow your
advice.
Thanks.
Trek
------------------ Original ------------------
Date: Thu, Apr 7, 2016 06:00 PM
Subject: Re: [Discuss-gnuradio] sine_table.h
Forgot to include the link to my benchmarking tool:
https://github.com/marcusmueller/table_vs_volk
Had to look intensely for your mail:
Trek, please don't "hijack" other threads by replying to them
with a completely unrelated topic. If starting a new topic,
simply send an email to the mailing list, without using the
"reply" functionality, or else, most people won't even see it,
because it's buried in a discussion thread irrelevant to them.
Best regards,
Marcus
On 07.04.2016 11:40, Marcus Müller wrote:
Hi Trek,
as Martin noted, yes, if you search the GNU Radio source tree
for that file name, you'll find it. And also, yes, GNU Radio
is Free Software, and one of the main credos of that is that
you should be able to use everything from it for your own
purposes (as long as you adhere to the freeness that the part
you're using demands; for GNU Radio, that's GPLv3). However,
to be honest, a linear approximation-based 8kB sine table
might or might not be the right tool for your problem –
usually, one would just think about what one needs and
generate the sine table oneself, matching exactly the
requirements at hand.
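
As a minimal sketch of what "generating the sine table oneself" could
look like: the code below builds a small single-period table and reads
it back with linear interpolation. This is not GNU Radio's
sine_table.h; the table size (256 entries), the float output type and
the names make_sine_table and table_sin are assumptions chosen for
illustration. Pick the size and number format from your own accuracy
and memory requirements.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical table size; derive it from your accuracy requirement.
constexpr std::size_t kTableSize = 256;
constexpr double kTwoPi = 6.283185307179586;

using SineTable = std::array<float, kTableSize + 1>;

// Build one full period of sine. The extra last entry duplicates the
// start of the next period, so interpolating in the last interval
// needs no wrap-around check.
SineTable make_sine_table()
{
    SineTable table{};
    for (std::size_t i = 0; i <= kTableSize; ++i) {
        table[i] = static_cast<float>(std::sin(kTwoPi * i / kTableSize));
    }
    return table;
}

// Approximate sin(phase), phase in radians, by linear interpolation
// between the two neighbouring table entries.
float table_sin(const SineTable& table, float phase)
{
    assert(std::isfinite(phase));
    // Reduce the phase to a fraction of one period in [0, 1].
    double turns = phase / kTwoPi;
    turns -= std::floor(turns);
    const double pos = turns * kTableSize;
    std::size_t idx = static_cast<std::size_t>(pos);
    if (idx >= kTableSize) {
        idx = kTableSize - 1; // guard against rounding up to 1.0
    }
    const double frac = pos - static_cast<double>(idx);
    return static_cast<float>(table[idx] +
                              frac * (table[idx + 1] - table[idx]));
}
```

With 256 entries the interpolation error stays below roughly 1e-4;
each doubling of the table size cuts it by about a factor of four.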
Us being DSP nerds, I guess some of us are curious: what is
your fixed point application? Are you planning to use this on
some microcontroller, or some programmable logic device, or do
you need a sin where you transform fixed point values
(e.g. from an ADC) to floating point values? What is the
algorithm you're building with that?
However, are you /sure/ a sine table is the optimum for your
specific problem?
I'm not an overly big fan of uniform sine tables (they make a
lot of sense on e.g. microcontrollers that don't have advanced
math functions, and if you don't need the accuracy), but if
you look at VOLK, you'll find things that are comparably fast,
or in my case, even faster. Here is the output of a
benchmarking stub I've got lying around (I didn't specify any
compiler optimizations, i.e. gcc will not optimize):
Doing 100000000 operations.
fixed point
0.781710s wall, 0.780000s user + 0.000000s system = 0.780000s CPU (99.8%)
standard libc float32 sin
2.700463s wall, 2.700000s user + 0.000000s system = 2.700000s CPU (100.0%)
VOLK float32 sin
Using Volk machine: avx2_64_mmx_orc
0.331708s wall, 0.330000s user + 0.000000s system = 0.330000s CPU (99.5%)
dummy memory bandwidth test: copy out- to input
0.404707s wall, 0.400000s user + 0.000000s system = 0.400000s CPU (98.8%)
dummy memory bandwidth test: copy in- to output
0.406990s wall, 0.410000s user + 0.000000s system = 0.410000s CPU (100.7%)
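
For readers who want to reproduce this kind of measurement, here is a
minimal timing sketch. It is not the table_vs_volk tool linked in this
thread; it only times the libc case, reports wall-clock time only, and
the function name time_sin_pass is made up for illustration. Writing
every result to an output buffer keeps an optimizing compiler from
dropping the loop.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Time one pass of std::sin over `input`, writing results to `output`.
// Returns the elapsed wall-clock time in seconds.
double time_sin_pass(const std::vector<float>& input,
                     std::vector<float>& output)
{
    assert(!input.empty()); // an empty pass measures nothing
    output.resize(input.size());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = std::sin(input[i]);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Dividing the returned time by input.size() gives the average cost per
call, which is the number to compare between implementations.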
Volk of course only makes sense if you can arrange your
algorithms so that you get a lot of sin input values
contiguously in memory.
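
A sketch of what that arrangement can mean in practice, under a few
assumptions: the Sample struct and the function names are hypothetical,
and batch_sin is a plain standard-library stand-in for a batch kernel.
VOLK's volk_32f_sin_32f kernel has the same shape (output pointer,
input pointer, number of points), so it could take the place of
batch_sin once the angles sit in one contiguous buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-sample record: the angle is interleaved with other
// fields, so consecutive angles are NOT adjacent in memory.
struct Sample {
    float angle;
    float other[3];
};

// Copy all angles into one contiguous buffer, the layout a batch
// kernel expects.
std::vector<float> gather_angles(const std::vector<Sample>& samples)
{
    std::vector<float> angles;
    angles.reserve(samples.size());
    for (const Sample& s : samples) {
        angles.push_back(s.angle);
    }
    return angles;
}

// Stand-in for a batch kernel: one call processes the whole buffer.
void batch_sin(float* out, const float* in, std::size_t n)
{
    assert(n == 0 || (out != nullptr && in != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::sin(in[i]);
    }
}
```

The gather step costs a copy of its own, so this only pays off when
there are many values per batch.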
Four observations:
- This sine-table implementation is only about three times
faster than the standard libc sin (at least without
compiler optimizations), not even counting the fact that
you'd first have to come up with the proper input scaling.
Unless your program is really dominated by sin()
performance, this might not even be worth considering. A
general hint: run "perf record -a yourprogram", then "perf
report", to find out where your PC spent its time.
- The VOLK routine is twice as fast as the fixed point
implementation and, being a six-summand Taylor series
approximation, probably more accurate.
- Enabling compiler optimizations (CFLAGS=-Ofast make)
will probably double the speed of sin (my experience), and
severely cut the time that the fixed point
implementation takes, probably slightly below the time of
Volk (which will not change measurably). That's because
the compiler will inline everything in the fixed point
routine. Whether that slight advantage then will be worth
the accuracy loss is up to you.
- VOLK's sin is faster than a float-wise copy (here, without
compiler optimizations); what seems paradoxical shows that
making extensive use of memory alignment and SIMD brings
you much closer to the memory bandwidth barrier. Knowing
my machine, I now have a guess for the performance of the
fixed point sin table approach under heavy compiler
optimization: it will take around ¼ of the time one of the
dummy copies takes; that's how fast you get with 4-float32
SIMD here, assuming this is really only bandwidth-limited.
Trying this confirms my suspicion!
As you can see, the question of which approach is fastest really
depends on what your compiler does, what SIMD instructions you
can make use of (VOLK's sin only has optimizations for SSE4.1,
I think) and how your data lies in memory.
Best regards,
Marcus
On 07.04.2016 05:26, Trek Liu wrote:
What is the purpose of this file? There is zero
documentation in this file; is it ever used?
I am looking for a sin/cos table for speed
optimization. Is there one inside GNU Radio?
Thanks.
_______________________________________________
Discuss-gnuradio mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio