Re: [Discuss-gnuradio] Speed Optimization and Application for ATSC Receivers
Mon, 07 Mar 2016 04:37:19 +0000 (GMT)
Andy and Nathan,
I really appreciate the insight, resources, and direction you have provided. I am starting to look into adding the constants to the blocks. When I run into questions, I will check in on IRC or send another email.
Thanks again for all of the help!
On Mar 06, 2016, at 01:08 PM, Andy Walls <address@hidden> wrote:
On Sun, 2016-03-06 at 08:49 -0500, address@hidden
Date: Sun, 06 Mar 2016 06:45:13 +0000 (GMT)
From: Joshua Lilly
My name is Josh and I am interested in getting involved in GNU Radio.
Specifically, I would like to work on the above project idea for
Google Summer of Code 2016 by implementing Viterbi and demux
algorithms in volk and testing the speed improvements. I have
experience with Python, C/C++, Boost, and profiling with Valgrind. So
far I have read the getting involved page and compiled the code, I am
working my way through some of the tutorials, and I have read through
the code in volk. Even if I don't get accepted to Google Summer of
Code, I would still like to get involved in fixing bugs or something,
since this seems like a really awesome project.
I'm only a kibitzer when it comes to the project, so I can't say
anything about GSoC acceptance.
If it isn't too much to ask could someone point me to a nice beginner
bug to fix in order to get my hands in the code?
However I can give you (and anyone who wants it) a relevant beginner
+intermediate thing to get your hands in the code. The "intermediate"
part comes from your request to play in volk, which I don't consider
stuff for beginners.
So we'll start with a very conceptually simple thing to improve: adding
constant(s) to a sample stream. Specifically measuring and improving
the performance of the add_const_vXX and add_const_XX blocks in
See the attached GRC flowgraph and hand-tweaked add_const_performance.py
1. Measure the baseline performance of both the add_const_vss and
add_const_ss blocks at the high sample rate of 160 Msps.
$ ps -eLo pcpu,pid,tid,cls,rtprio,pcpu,comm
shows the add_const_vss or add_const_ss thread hovering around 70% and
For meaningful measurements you must run the flowgraph at RT priority.
2. For an immediate performance increase for most users, add a new
gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that allows
users to select the faster, non-vector version of the add const block
from the GUI.
3. Measure the baseline of where the most CPU is being consumed in these
You can use perf tools or oprofile tools or whatever works for you.
For meaningful measurements you must run the flowgraph at RT priority.
Odds are, it's the block's work() function that is consuming most of the
4. Create volk kernels to replace the main operations in the work()
functions of these blocks, if you can. Since adding a constant is so
simple, and ORC is very good about optimizing simple things, the volk
implementations should include an ORC implementation if possible. Odds
are the ORC implementation will beat hand-written SIMD versions for x86
processors. Use volk_profile to prove my guess about ORC right or wrong.
5. Create volk-ized versions of the add_const blocks and remeasure their
performance. How much improvement did you get?
6. Don't forget to add QA tests for the new volk functions.
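For step 6, the core of a QA test is a comparison between a plain reference implementation and the optimized kernel on the same input. The sketch below shows that idea in plain Python only. It is not the volk QA harness and not GNU Radio code: the function names are hypothetical, the "candidate" is a stand-in for whatever optimized kernel is under test, and the 16-bit wraparound on overflow is an assumption about how a short-based add behaves.

```python
import random


def add_const_ref(samples, k):
    """Reference: add k to every sample, one at a time.

    Assumption: results wrap to signed 16 bits, as a C short would
    on typical platforms.
    """
    out = []
    for s in samples:
        v = (s + k) & 0xFFFF                 # keep the low 16 bits
        out.append(v - 0x10000 if v >= 0x8000 else v)  # reinterpret as signed
    return out


def add_const_candidate(samples, k):
    """Hypothetical stand-in for the optimized kernel under test."""
    return [((s + k + 0x8000) & 0xFFFF) - 0x8000 for s in samples]


def check_equivalent(n=1000, seed=0):
    """Return True if reference and candidate agree on random and edge inputs."""
    rng = random.Random(seed)
    samples = [rng.randint(-32768, 32767) for _ in range(n)]
    samples += [-32768, -1, 0, 1, 32767]     # edge cases around the limits
    for k in (0, 1, -1, 100, 32767, -32768):
        if add_const_ref(samples, k) != add_const_candidate(samples, k):
            return False
    return True
```

Fixing the random seed keeps a failure reproducible, and the explicit edge values exercise the overflow cases that random input might miss.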
As an alternative to the above:
1. Improve the performance of the nlog10_ff block by using log2,
algebra, volk, and skipping the add of k at the end, if k == 0.0.
2. Create a new approx_nlog10_ff block by taking advantage of the fact
that the log2 exponent in IEEE floats can be obtained with a mask and
shift operation. Don't forget to add a GRC .xml file for the block and
QA test code.
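The two ideas above can be sketched in plain Python, assuming the block computes n * log10(x) + k. The first function uses the identity log10(x) = log10(2) * log2(x), so n and log10(2) fold into a single scale factor, and it skips the add when k == 0.0. The second pulls the unbiased exponent out of an IEEE 754 single-precision float with a shift and a mask, which gives floor(log2(x)) for positive normal values. All function names here are hypothetical, and this illustrates the arithmetic only, not the block or a volk kernel.

```python
import math
import struct

LOG10_OF_2 = math.log10(2.0)


def nlog10_via_log2(x, n=1.0, k=0.0):
    """n * log10(x) + k, computed as (n * log10(2)) * log2(x) + k."""
    scale = n * LOG10_OF_2        # in a block this would be computed once
    y = scale * math.log2(x)
    return y if k == 0.0 else y + k   # skip the add when k is zero


def float32_exponent(x):
    """Unbiased exponent of x stored as an IEEE 754 single.

    Assumes x is a positive, normal float32 value (no zero, denormal,
    infinity or NaN handling).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits >> 23) & 0xFF) - 127    # shift out mantissa, mask, remove bias


def approx_nlog10(x, n=1.0, k=0.0):
    """Coarse approximation that uses only floor(log2(x))."""
    return n * LOG10_OF_2 * float32_exponent(x) + k
```

With only the exponent, the approximation is exact at powers of two and otherwise underestimates by less than n * log10(2), which is about 3 dB when n = 10. Using some mantissa bits as well would tighten that, at the cost of more work per sample.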