discuss-gnuradio

Re: [Discuss-gnuradio] Speed Optimization and Application for ATSC Receivers


From: Andy Walls
Subject: Re: [Discuss-gnuradio] Speed Optimization and Application for ATSC Receivers
Date: Fri, 11 Mar 2016 10:03:11 -0500

Hi Josh:

Please keep conversations on the mailing list.  Thanks.

On Fri, 2016-03-11 at 02:34 +0000, Joshua Lilly wrote:
> Hey Andy,
> 
> Just had a quick question about item number two on this list.
> 
> 
> 
> 2. For an immediate performance increase for most users, add a new
> gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that
> allows users to select the faster, non-vector version of the add
> const block from the GUI.
> 
> 
> After reading through the tweaked python script it looked like the
> add_const_xx block should consist of the add_const_ss block? However,
> if that is the case isn't this already taken care of with the add_xx
> block?

The problem/bug in number 2 is that the GRC GUI only lets you add an
add_const_vxx block - which appears to be markedly slower than the
corresponding add_const_xx block - and provides no option to add an
add_const_xx block to a flowgraph.  Number 2 is simply to correct that
problem.

The tweaked python script just has a single line I threw in manually
after I had GRC generate the python script from the flowgraph:

        [...]
        self.blocks_throttle_0 = blocks.throttle(gr.sizeof_short*1, samp_rate, True)
        self.blocks_null_source_0 = blocks.null_source(gr.sizeof_short*1)
        self.blocks_null_sink_0 = blocks.null_sink(gr.sizeof_short*1)
        self.blocks_add_const_vxx_0 = blocks.add_const_vss((515, ))
        #self.blocks_add_const_vxx_0 = blocks.add_const_ss(515)
        [...]

GRC did not generate that last line; I did.

I can switch the comment character on those last two lines, to choose
which type of add_const block (_vss or _ss) I want to use.
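
The same toggle can also be written as a small helper instead of a
commented-out line. This is only a sketch: the helper name
make_add_const and its use_vector flag are made up for illustration,
and the blocks module is passed in as an argument so the selection
logic can be exercised without GNU Radio installed. The two
constructor calls are the ones from the excerpt above.

```python
def make_add_const(blocks, k=515, use_vector=False):
    """Return an add-const block for short samples.

    Hypothetical helper, not part of GNU Radio or of the script GRC
    generates. `blocks` is expected to be the gnuradio.blocks module.
    """
    if use_vector:
        # What GRC generates today: the vector flavour takes a tuple.
        return blocks.add_const_vss((k,))
    # The scalar flavour takes a single constant.
    return blocks.add_const_ss(k)
```

In the generated script this would be used roughly as
self.blocks_add_const_vxx_0 = make_add_const(blocks, 515, use_vector=False).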

The GRC GUI doesn't offer this choice, likely because of an oversight.
Unfortunately, that oversight pushes all GRC GUI users onto the slower
_vxx versions of add_const all the time.

Fix that. :)

This task is meant to acquaint you with how the GRC GUI creates
gnuradio user applications, and with how the GUI wants XML templates
for new blocks defined.
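
As a rough idea of what such a template might contain, here is a
Python sketch that assembles one with the standard library and prints
it. Everything in it is an assumption to be checked against the
existing blocks_add_const_vxx.xml: the element names follow my
recollection of the GRC XML block format, the display name
"Add Const (scalar)" is made up, and only the short (ss) type is
listed.

```python
import xml.etree.ElementTree as ET


def sub(parent, tag, text=None):
    """Append a child element with optional text and return it."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def build_add_const_xx_template():
    """Build a GRC-style template for the scalar add_const block.

    Assumption: the layout mirrors blocks_add_const_vxx.xml minus the
    vector-length parameter. Only the short type is included.
    """
    block = ET.Element("block")
    sub(block, "name", "Add Const (scalar)")  # hypothetical display name
    sub(block, "key", "blocks_add_const_xx")
    sub(block, "import", "from gnuradio import blocks")
    # For the short option this expands to blocks.add_const_ss(<const>).
    sub(block, "make", "blocks.add_const_$(type.fcn)($const)")

    io_type = sub(block, "param")
    sub(io_type, "name", "IO Type")
    sub(io_type, "key", "type")
    sub(io_type, "type", "enum")
    short_opt = sub(io_type, "option")
    sub(short_opt, "name", "Short")
    sub(short_opt, "key", "short")
    sub(short_opt, "opt", "const_type:int")
    sub(short_opt, "opt", "fcn:ss")

    const = sub(block, "param")
    sub(const, "name", "Constant")
    sub(const, "key", "const")
    sub(const, "value", "0")
    sub(const, "type", "$type.const_type")

    # One input port and one output port, both of the selected type.
    for tag, port_name in (("sink", "in"), ("source", "out")):
        port = sub(block, tag)
        sub(port, "name", port_name)
        sub(port, "type", "$type")
    return block


if __name__ == "__main__":
    print(ET.tostring(build_add_const_xx_template(), encoding="unicode"))
```

The real file would list the remaining sample types as further
option entries and would need to be added to the gr-blocks/grc build
so that GRC installs it.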

> Thanks for your help,

> Josh

Regards,
Andy

> 
> On Mar 06, 2016, at 01:08 PM, Andy Walls <address@hidden>
> wrote:
> 
> > On Sun, 2016-03-06 at 08:49 -0500, address@hidden
> > wrote:
> > > Message: 5
> > > Date: Sun, 06 Mar 2016 06:45:13 +0000 (GMT)
> > > From: Joshua Lilly
> > 
> > 
> > > Hello,
> > > My name is Josh and I am interested in getting involved in GNU
> > > radio. Specifically, I would like to work on the above project
> > > idea for google summer of code 2016 by implementing Viterbi and
> > > demux algorithms in volk and testing the speed improvements. I
> > > have experience with python, c/c++, boost, and profiling with
> > > valgrind. I currently have read the getting involved page,
> > > compiled the code, I am working my way through some of the
> > > tutorials, and I have read through the code in volk. Even if I
> > > don't get accepted to google summer of code, I would still like to
> > > get involved in fixing bugs, or something since this seems like a
> > > really awesome project.
> > 
> > Hi Josh:
> > 
> > I'm only a kibitzer when it comes to the project, so I can't say
> > anything about GSoC acceptance.
> > 
> > 
> > > If it isn't too much to ask could someone point me to a nice
> > > beginner
> > > bug to fix in order to get my hands in the code?
> > 
> > However I can give you (and anyone who wants it) a relevant
> > beginner+intermediate thing to get your hands in the code. The
> > "intermediate" part comes from your request to play in volk, which
> > I don't consider stuff for beginners.
> > 
> > So we'll start with a very conceptually simple thing to improve:
> > adding constant(s) to a sample stream. Specifically measuring and
> > improving the performance of the add_const_vXX and add_const_XX
> > blocks in gnuradio/gr-blocks/lib.
> > 
> > See the attached GRC flowgraph and hand-tweaked
> > add_const_performance.py python script.
> > 
> > 
> > 1. Measure the baseline performance of both the add_const_vss and
> > add_const_ss blocks at the high sample rate of 160 Msps.
> > 
> > $ ps -eLo pcpu,pid,tid,cls,rtprio,pcpu,comm
> > 
> > shows the add_const_vss or add_const_ss thread hovering around 70%
> > and 57% respectively.
> > 
> > For meaningful measurements you must run the flowgraph at RT priority.
> > 
> > 
> > 2. For an immediate performance increase for most users, add a new
> > gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that
> > allows users to select the faster, non-vector version of the add
> > const block from the GUI.
> > 
> > 
> > 3. Measure the baseline of where the most CPU is being consumed in
> > these blocks.
> > You can use perf tools or oprofile tools or whatever works for you.
> > For meaningful measurements you must run the flowgraph at RT priority.
> > Odds are, it's the block's work() function that is consuming most of
> > the CPU.
> > 
> > 
> > 4. Create volk kernels to replace the main operations in the work()
> > functions of these blocks, if you can. Since adding a constant is so
> > simple, and ORC is very good about optimizing simple things, the
> > volk implementations should include an ORC implementation if
> > possible. Odds are the ORC implementation will beat hand-written
> > SIMD versions for x86 processors. Use volk_profile to prove my guess
> > about ORC right or wrong. :)
> > 
> > 
> > 5. Create volk-ized versions of the add_const blocks and remeasure
> > their performance. How much improvement did you get?
> > 
> > 
> > 6. Don't forget to add QA tests for the new volk functions.
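
For the measurements in steps 1 and 5, the per-thread CPU figures can
be pulled out of the ps output with a few lines of Python. This is a
sketch under some assumptions: the function names are made up, the
sample text is invented to exercise the parser (it is not a
measurement), and the block's thread is assumed to be named after the
block so that a prefix such as "add_const" finds it.

```python
import subprocess

# The same ps invocation as in step 1.
PS_CMD = ["ps", "-eLo", "pcpu,pid,tid,cls,rtprio,pcpu,comm"]

# Made-up sample output, only here to exercise the parser.
SAMPLE = """\
%CPU   PID   TID CLS RTPRIO %CPU COMMAND
 0.3  4242  4242  TS      -  0.3 python
12.5  4242  4250  RR     50 12.5 add_const_ss1
"""


def parse_ps_threads(ps_output, name_prefix):
    """Return (tid, pcpu, comm) for threads whose name starts with name_prefix."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 6)  # comm is last and may contain spaces
        if len(fields) < 7:
            continue
        pcpu, _pid, tid, _cls, _rtprio, _pcpu_again, comm = fields
        comm = comm.strip()
        if comm.startswith(name_prefix):
            rows.append((int(tid), float(pcpu), comm))
    return rows


def sample_threads(name_prefix):
    """Run ps once and return the matching threads."""
    out = subprocess.run(PS_CMD, capture_output=True, text=True,
                         check=True).stdout
    return parse_ps_threads(out, name_prefix)


if __name__ == "__main__":
    for tid, pcpu, comm in parse_ps_threads(SAMPLE, "add_const"):
        print(tid, pcpu, comm)
```

Note that the %CPU column from ps is CPU time divided by the time the
thread has existed, so let the flowgraph run for a while before
sampling and compare runs of similar length.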
> > 
> > 
> > As an alternate to the above:
> > 
> > 1. Improve the performance of the nlog10_ff block by using log2,
> > algebra, volk, and skipping the add of k at the end, if k == 0.0.
> > 
> > 2. Create a new approx_nlog10_ff block by taking advantage of the
> > fact that the log2 exponent in IEEE floats can be obtained with a
> > mask and shift operation. Don't forget to add a GRC .xml file for
> > the block and QA test code.
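
The arithmetic behind both alternate tasks can be sketched in plain
Python before touching any C++ or volk code. The algebra is
n*log10(x) + k = (n*log10(2)) * log2(x) + k, and the exponent trick
reads the biased exponent field of a 32-bit IEEE float. The function
names below are made up, and the mantissa term in approx_log2 is an
extra refinement I am assuming here, not something the task list
asks for. It only handles positive, normal floats.

```python
import math
import struct

LOG10_OF_2 = math.log10(2.0)


def nlog10_via_log2(x, n=1.0, k=0.0):
    """n*log10(x) + k computed through log2, skipping the add when k == 0."""
    y = (n * LOG10_OF_2) * math.log2(x)
    return y if k == 0.0 else y + k


def approx_log2(x):
    """Rough log2 of a positive, normal float32 using mask and shift."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bits 23..30 hold the exponent with a bias of 127.
    exponent = ((bits >> 23) & 0xFF) - 127
    # Bits 0..22 hold the fraction m of the significand 1.m, in [0, 1).
    mantissa = (bits & 0x7FFFFF) / float(1 << 23)
    # log2(2**e * (1 + m)) = e + log2(1 + m), approximated here by e + m.
    return exponent + mantissa


def approx_nlog10(x, n=1.0, k=0.0):
    """Approximate n*log10(x) + k built on approx_log2."""
    y = (n * LOG10_OF_2) * approx_log2(x)
    return y if k == 0.0 else y + k


if __name__ == "__main__":
    for value in (0.5, 1.0, 3.0, 1000.0):
        print(value, nlog10_via_log2(value, 10.0), approx_nlog10(value, 10.0))
```

Replacing log2(1 + m) with m underestimates log2 by at most about
0.086, which for n = 10 is roughly a quarter of a dB. Whether that is
acceptable depends on what the block is used for.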
> > 
> > > Thank you,
> > > Josh
> > 
> > 
> > Regards,
> > Andy
> > 




