Re: [Discuss-gnuradio] sine_table.h

------------------ Original ------------------

From: "Marcus Müller via USRP-users";<address@hidden>;

Date: Thu, Apr 7, 2016 06:00 PM

To: "discuss-gnuradio"<address@hidden>;

Subject: Re: [Discuss-gnuradio] sine_table.h

Forgot to include the link to my benchmarking tool:
https://github.com/marcusmueller/table_vs_volk
I had to look hard to find your mail:
Trek, please don't "hijack" other threads by replying to them with a completely unrelated topic. To start a new topic, simply send an email to the mailing list without using the "reply" function; otherwise most people won't even see it, because it's buried in a discussion thread that is irrelevant to them.

Best regards,
Marcus

On 07.04.2016 11:40, Marcus Müller wrote:

Hi Trek,

as Martin noted, yes: if you search the GNU Radio source tree for that file name, you'll find it. And yes, GNU Radio is Free Software, and one of the main tenets of Free Software is that you should be able to use any part of it for your own purposes (as long as you respect the freedoms that the license of the part you're using demands; for GNU Radio, that's GPLv3). To be honest, though, a linear-approximation-based 8 kB sine table may or may not be the right tool for your problem; usually, one would first work out what one actually needs and then generate a sine table oneself, matching exactly the requirements at hand.

Being DSP nerds, some of us are curious: what is your fixed-point application? Are you planning to use this on a microcontroller or a programmable logic device, or do you need a sin that maps fixed-point values (e.g. from an ADC) to floating-point values? What algorithm are you building with it?

But are you /sure/ a sine table is the best option for your specific problem?
I'm not an overly big fan of uniform sine tables (they make a lot of sense on e.g. microcontrollers that lack advanced math functions, and when you don't need the accuracy), but if you look at VOLK, you'll find routines that are comparably fast or, in my case, even faster. This is what a benchmarking stub I had lying around reports (built without any compiler optimization flags, i.e. gcc does not optimize):
Doing 100000000 operations.
fixed point
 0.781710s wall, 0.780000s user + 0.000000s system = 0.780000s CPU (99.8%)
standard libc float32 sin
 2.700463s wall, 2.700000s user + 0.000000s system = 2.700000s CPU (100.0%)
VOLK float32 sin
Using Volk machine: avx2_64_mmx_orc
 0.331708s wall, 0.330000s user + 0.000000s system = 0.330000s CPU (99.5%)
dummy memory bandwidth test: copy out- to input
 0.404707s wall, 0.400000s user + 0.000000s system = 0.400000s CPU (98.8%)
dummy memory bandwidth test: copy in- to output
 0.406990s wall, 0.410000s user + 0.000000s system = 0.410000s CPU (100.7%)
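Dividing the wall-clock times above by the 10^8 operations gives the per-call costs, and the speed ratios that the observations below refer to:

```latex
\begin{aligned}
t_{\text{fixed}} &= \frac{0.7817\,\text{s}}{10^{8}} \approx 7.8\,\text{ns/op}, &
t_{\text{libc}}  &= \frac{2.7005\,\text{s}}{10^{8}} \approx 27.0\,\text{ns/op},\\
t_{\text{VOLK}}  &= \frac{0.3317\,\text{s}}{10^{8}} \approx 3.3\,\text{ns/op}, &
t_{\text{copy}}  &= \frac{0.4047\ \text{to}\ 0.4070\,\text{s}}{10^{8}} \approx 4.05\ \text{to}\ 4.07\,\text{ns/op},\\
\frac{t_{\text{libc}}}{t_{\text{fixed}}} &\approx 3.45, &
\frac{t_{\text{fixed}}}{t_{\text{VOLK}}} &\approx 2.36.
\end{aligned}
```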
VOLK, of course, only makes sense if you can arrange your algorithm so that it produces many sin input values lying contiguously in memory.

Four observations:

This sine-table implementation is only about three and a half times faster than the standard libc sin, and that doesn't even count the work of first scaling your input into the table's phase format. Unless your program's run time is really dominated by sin(), this might not even be worth considering. A general hint: run "perf record -a yourprogram" followed by "perf report" to find out where your PC spent its time. All of this, mind you, is without compiler optimizations.

The VOLK routine is more than twice as fast as the fixed-point implementation and, being a six-term Taylor series approximation, probably more accurate.

Enabling compiler optimizations (CFLAGS=-Ofast make) will, in my experience, roughly double the speed of sin, and severely cut the time the fixed-point implementation takes, probably to slightly below the time of VOLK (which will not change measurably). That's because the compiler will inline everything in the fixed-point routine. Whether that slight advantage is worth the accuracy loss is up to you.

VOLK's sin is faster than a float-by-float copy (again, without compiler optimizations). What seems paradoxical shows that making extensive use of memory alignment and SIMD brings you much closer to the memory bandwidth limit. Knowing my machine, I now have a guess for the performance of the fixed-point sine table approach under heavy compiler optimization: it will take around ¼ of the time one of the dummy copies takes, because that's the speedup 4-wide float32 SIMD gives you here, assuming the workload really is purely bandwidth-limited. Trying this confirms my suspicion!

As you can see, which approach is fastest really depends on what your compiler does, which SIMD instructions you can make use of (VOLK's sin only has optimizations for SSE4.1, I think), and how your data is laid out in memory.

Best regards,
Marcus

On 07.04.2016 05:26, Trek Liu wrote:
What is the purpose of this file? There is zero documentation in it; is it ever used?

I am looking for a sin/cos table for speed optimization, is there one inside gnuradio?

Thanks.
_______________________________________________
Discuss-gnuradio mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

From:	Marcus Müller
Subject:	Re: [Discuss-gnuradio] sine_table.h
Date:	Tue, 12 Apr 2016 11:55:18 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0