Re: [Discuss-gnuradio] Speed Optimization and Application for ATSC Receivers
Fri, 11 Mar 2016 10:20:57 -0500
I misread your question. See my additional answer below.
On Fri, 2016-03-11 at 02:34 +0000, Joshua Lilly wrote:
> Hey Andy,
> Just had a quick question about item number two on this list.
> 2. For an immediate performance increase for most users, add a new
> gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that
> allows users to select the faster, non-vector version of the add const
> block from the GUI.
> After reading through the tweaked python script it looked like the
> add_const_xx block should consist of the add_const_ss block? However,
> if that is the case, isn't this already taken care of with the add_xx
> block?
No. add_xx adds multiple input streams together. add_const_vxx adds a
constant to the input stream.
Drop both types of add blocks in the flowgraph within the GRC GUI, and
you will immediately see the difference.
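As a rough illustration of that difference, here is a minimal plain-Python model of the three behaviours. It is only a sketch: it does not use GNU Radio, the function names are made up for this example, and the vector variant assumes that the block adds a constant vector element-wise to each vector-sized item.

```python
# Plain-Python model of the block semantics discussed above.
# The helper names are illustrative only; they are not GNU Radio APIs.

def add_streams(*streams):
    """Model of add_xx: sum several input streams sample by sample."""
    return [sum(samples) for samples in zip(*streams)]


def add_const(stream, k):
    """Model of add_const_xx: add one scalar constant to every sample."""
    return [x + k for x in stream]


def add_const_v(stream, k_vec):
    """Model of add_const_vxx (assumed behaviour): the stream is treated as
    items of len(k_vec) samples, and k_vec is added element-wise to each item."""
    vlen = len(k_vec)
    if vlen == 0 or len(stream) % vlen != 0:
        raise ValueError("stream length must be a multiple of the vector length")
    out = []
    for start in range(0, len(stream), vlen):
        item = stream[start:start + vlen]
        out.extend(x + k for x, k in zip(item, k_vec))
    return out


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(add_streams(a, b))            # [11, 22, 33, 44]
print(add_const(a, 5))              # [6, 7, 8, 9]
print(add_const_v(a, [100, 200]))   # [101, 202, 103, 204]
```

With a vector length of 1 the vector model reduces to the scalar one, which is why the two blocks are interchangeable for the common case and only their cost differs.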
> Thanks for your help,
> On Mar 06, 2016, at 01:08 PM, Andy Walls <address@hidden>
> > On Sun, 2016-03-06 at 08:49 -0500, address@hidden
> > wrote:
> > > Message: 5
> > > Date: Sun, 06 Mar 2016 06:45:13 +0000 (GMT)
> > > From: Joshua Lilly
> > > Hello,
> > > My name is Josh and I am interested in getting involved in GNU Radio.
> > > Specifically, I would like to work on the above project idea for
> > > Google Summer of Code 2016 by implementing Viterbi and demux
> > > algorithms in VOLK and testing the speed improvements. I have
> > > experience with Python, C/C++, Boost, and profiling with Valgrind. I
> > > have read the getting involved page and compiled the code, I am
> > > working my way through some of the tutorials, and I have read through
> > > the code in VOLK. Even if I don't get accepted to Google Summer of
> > > Code, I would still like to get involved in fixing bugs or something,
> > > since this seems like a really awesome project.
> > Hi Josh:
> > I'm only a kibitzer when it comes to the project, so I can't say
> > anything about GSoC acceptance.
> > > If it isn't too much to ask, could someone point me to a nice
> > > beginner bug to fix in order to get my hands in the code?
> > However, I can give you (and anyone who wants it) a relevant
> > beginner+intermediate thing to get your hands in the code. The
> > "intermediate" part comes from your request to play in volk, which I
> > don't consider stuff for beginners.
> > So we'll start with a very conceptually simple thing to improve: adding
> > constant(s) to a sample stream. Specifically, measuring and improving
> > the performance of the add_const_vXX and add_const_XX blocks in
> > gnuradio/gr-blocks/lib.
> > See the attached GRC flowgraph and hand-tweaked
> > add_const_performance.py python script.
> > 1. Measure the baseline performance of both the add_const_vss and
> > add_const_ss blocks at the high sample rate of 160 Msps.
> > $ ps -eLo pcpu,pid,tid,cls,rtprio,pcpu,comm
> > shows the add_const_vss or add_const_ss thread hovering around 70% and
> > 57% respectively.
> > For meaningful measurements you must run the flowgraph at RT priority.
> > 2. For an immediate performance increase for most users, add a new
> > gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that
> > allows users to select the faster, non-vector version of the add const
> > block from the GUI.
> > 3. Measure the baseline of where the most CPU is being consumed in
> > these blocks.
> > You can use perf tools or oprofile tools or whatever works for you.
> > For meaningful measurements you must run the flowgraph at RT priority.
> > Odds are, it's the block's work() function that is consuming most of
> > the CPU.
> > 4. Create volk kernels to replace the main operations in the work()
> > functions of these blocks, if you can. Since adding a constant is so
> > simple, and ORC is very good about optimizing simple things, the volk
> > implementations should include an ORC implementation if possible. Odds
> > are the ORC implementation will beat hand-written SIMD versions for x86
> > processors. Use volk_profile to prove my guess about ORC right or
> > wrong. :)
> > 5. Create volk-ized versions of the add_const blocks and remeasure
> > their performance. How much improvement did you get?
> > 6. Don't forget to add QA tests for the new volk functions.
> > As an alternative to the above:
> > 1. Improve the performance of the nlog10_ff block by using log2,
> > algebra, volk, and skipping the add of k at the end, if k == 0.0.
> > 2. Create a new approx_nlog10_ff block by taking advantage of the fact
> > that the log2 exponent in IEEE floats can be obtained with a mask and
> > shift operation. Don't forget to add a GRC .xml file for the block and
> > QA test code.
> > > Thank you,
> > > Josh
> > Regards,
> > Andy
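For the alternative nlog10_ff exercise quoted above, the two ideas can be sketched in plain Python. This is a hedged illustration, not the GNU Radio implementation: it assumes the block computes n*log10(x) + k (as the name and the mention of k suggest), the function names are hypothetical, and the approximation handles only positive, normal single-precision inputs.

```python
import math
import struct

LOG10_OF_2 = math.log10(2.0)


def nlog10_via_log2(x, n=10.0, k=0.0):
    """n*log10(x) + k computed through log2, using
    log10(x) = log10(2) * log2(x), and skipping the add when k == 0.0."""
    scaled = (n * LOG10_OF_2) * math.log2(x)
    return scaled if k == 0.0 else scaled + k


def approx_log2(x):
    """Coarse log2 of a positive, normal float32: reinterpret the value as a
    32-bit integer, shift out the 23 mantissa bits, mask the 8 exponent bits,
    and remove the IEEE 754 bias of 127. The result is floor(log2(x))."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits >> 23) & 0xFF) - 127


def approx_nlog10(x, n=10.0, k=0.0):
    """Approximate n*log10(x) + k using only the float's exponent field."""
    scaled = (n * LOG10_OF_2) * approx_log2(x)
    return scaled if k == 0.0 else scaled + k


print(approx_log2(8.0))      # 3
print(approx_log2(0.5))      # -1
print(approx_log2(1000.0))   # 9
for value in (0.001, 1.0, 3.0, 1000.0):
    print(value, nlog10_via_log2(value), approx_nlog10(value))
```

Because the exponent-only estimate discards the mantissa, it underestimates log2(x) by less than 1, so for n = 10 the result is low by less than about 3.01. A real block would presumably refine this with the mantissa bits if more accuracy is needed.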