discuss-gnuradio

Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimizati


From: Bogdan Diaconescu
Subject: Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
Date: Tue, 25 Feb 2014 07:29:58 -0800 (PST)

Hi Tom,

Thanks for your answer. The point I was making was that when I wrote the 
Viterbi code, I tried to use the available VOLK functions (multiplications, 
subtractions, etc.) and the code was slower than using intrinsics directly. 
Implementing a new kernel for the Viterbi decoder (with intrinsics, of course, 
like the others) was just the next step in the process.

So, I totally agree that it is worth creating a kernel that completely solves 
a problem such as a convolutional decoder, as it will make it faster. The 
downside, though, is that the next time you want to do something slightly 
different you'll need to create another kernel. But that is the tradeoff 
between flexibility and speed.
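
To make the tradeoff concrete, here is a minimal sketch of the add-compare-select step of a Viterbi decoder written two ways: as a chain of generic element-wise passes (the way separate vector kernels would be combined) and as one fused loop. The function names and the assumption that the predecessor path metrics have already been gathered per state are hypothetical, and nothing here is the VOLK API. Python is used only to show the data flow; any speed difference would show up only in compiled SIMD code, where the fused form avoids writing and re-reading the temporary buffers.

```python
def acs_composed(pm0, pm1, bm0, bm1):
    """Four separate element-wise passes, each producing a full temporary
    buffer, as a chain of generic vector kernels would."""
    cand0 = [p + b for p, b in zip(pm0, bm0)]            # pass 1: add
    cand1 = [p + b for p, b in zip(pm1, bm1)]            # pass 2: add
    new_pm = [min(a, b) for a, b in zip(cand0, cand1)]   # pass 3: select
    decisions = [1 if b < a else 0
                 for a, b in zip(cand0, cand1)]          # pass 4: compare
    return new_pm, decisions


def acs_fused(pm0, pm1, bm0, bm1):
    """Same result in a single pass: both candidates stay in local
    variables (registers, in a compiled kernel) until the decision is made."""
    new_pm, decisions = [], []
    for p0, p1, b0, b1 in zip(pm0, pm1, bm0, bm1):
        c0 = p0 + b0
        c1 = p1 + b1
        if c1 < c0:
            new_pm.append(c1)
            decisions.append(1)
        else:
            new_pm.append(c0)
            decisions.append(0)
    return new_pm, decisions
```

The fused version is tied to this one algorithm, which is exactly the loss of flexibility described above: a slightly different decoder needs its own loop.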

I see that your code uses the Spiral implementation. I will look at what speed 
it gives, as for me this is one of the biggest challenges. I still believe 
someone will create a convolutional decoder implementation that is both 
readable and fast :). I know, I am speaking from the perspective of an open 
source software guy :) who inherently needs to understand all the code.

Bogdan

BTW, from my experience, to speed up depuncturing it is worth making the 
depuncturer part of the decoder, or at least making the decoder aware of it.
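
For readers unfamiliar with the step, the following is a minimal sketch of a standalone depuncturer for a rate-1/2 mother code. The pattern shown is assumed to be the DVB-T rate 7/8 pattern and should be checked against the standard before use; the names `depuncture`, `PATTERN_7_8` and `ERASURE`, and the choice of 0 as a neutral value for signed soft decisions, are hypothetical and not taken from gr-dvbt.

```python
ERASURE = 0  # assumed neutral soft value: no information about the bit

# Assumed rate 7/8 puncturing pattern (1 = transmitted, 0 = punctured).
# Verify against the DVB-T specification before relying on it.
PATTERN_7_8 = ((1, 0, 0, 0, 1, 0, 1),   # X output of the mother code
               (1, 1, 1, 1, 0, 1, 0))   # Y output of the mother code


def depuncture(received, pattern=PATTERN_7_8, erasure=ERASURE):
    """Re-insert erasures so the decoder sees one X and one Y value
    per trellis step."""
    x_pat, y_pat = pattern
    kept = sum(x_pat) + sum(y_pat)      # values transmitted per period
    if len(received) % kept != 0:
        raise ValueError("input length must be a multiple of %d" % kept)
    out = []
    pos = 0
    for _ in range(len(received) // kept):
        for x_keep, y_keep in zip(x_pat, y_pat):
            for keep in (x_keep, y_keep):
                if keep:
                    out.append(received[pos])
                    pos += 1
                else:
                    out.append(erasure)
    return out
```

With this pattern every 8 received values become 14 decoder inputs, i.e. two inputs per decoded bit, which is where the "more than 60" figure for a 32 Mbps output quoted later in this thread comes from. A decoder that is aware of the pattern could skip the branch-metric contribution of the punctured positions instead of materializing this larger buffer.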


--------------------------------------------
On Tue, 2/25/14, Tom Rondeau <address@hidden> wrote:

 Subject: Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
 To: "Bogdan Diaconescu" <address@hidden>
 Cc: "GNURadio Discussion List" <address@hidden>, "Abhishek Bhowmick" <address@hidden>
 Date: Tuesday, February 25, 2014, 4:09 PM
 
 On Tue, Feb 25, 2014 at 8:21 AM, Bogdan Diaconescu <address@hidden> wrote:
 > Hi Abhishek,
 >
 > When I implemented gr-dvbt (https://github.com/BogdanDIA/gr-dvbt) I used
 > VOLK in many places to speed up the processing. However, a great deal of
 > speed-up still needs to be achieved on both Tx and Rx in order to lower
 > CPU cycle consumption, so there are a lot of challenges in the project
 > from this point of view.
 >
 > For example, the Viterbi implementation is done using intrinsics instead
 > of VOLK just because when I used VOLK it was quite slow, achieving only
 > 16 Mbps of processing per single thread (7-8 Mbps with a plain C
 > implementation).
 > Using intrinsics raised the speed to 32-37 Mbps per thread, which is
 > quite good, but the code is not directly portable. So, a good Viterbi
 > decoder that easily achieves over 60 Mbps at the input is still needed,
 > probably not only in the DVB-T implementation but perhaps in other
 > applications too. Just to add more to the challenge, one may want
 > readable code besides the necessary speed (the Spiral Viterbi
 > implementation is on the opposite side).
 
 
 Bogdan,
 
 Good advice, generally. Just a few issues to point out. First, I think
 there's a misconception about "VOLK" versus "using intrinsics." VOLK
 uses intrinsics, so whatever code you wrote with intrinsics could be
 done in VOLK. For instance, the fecapi that we are working to bring
 into GNU Radio has a convolutional decoder defined as a single VOLK
 kernel:
 
 
https://github.com/namccart/fecapi/blob/master/volk_fecapi/kernels/volk_fecapi/volk_fecapi_8u_x4_conv_k7_r2_f2048_8u.h
 
 This is actually Spiral code that was wrapped up into a kernel to make
 it portable and usable.
 
 Basically, I'm trying to convey that there is no limit to what we can
 define as a kernel in VOLK. In fact, the more complex the kernel, the
 better the speedup, because you can keep the data inside the registers
 and more tightly control the algorithm. We just want a kernel to
 represent some operation that would be usable in other situations,
 like a convolutional decoder.
 
 
 > The OFDM synchronization code is also very time consuming, and although
 > it already uses VOLK, it could benefit greatly from the new AVX2
 > instructions. Actually, many of the blocks can use new instructions to
 > speed up the data processing.
 
 Yes, certainly. The synchronization part is a good place for
 optimization.
 
 Tom
 
 
 
 > Basically, for DVB-T at its maximum speed, with OFDM FFT 8k, QAM-64 and
 > puncturing rate 7/8, the video output is 32 Mbps, which means more than
 > 60 Mbps of processing speed after de-puncturing. A bigger challenge
 > would be implementing a real-life DVB-S receiver, where the data rate is
 > over 50 Mbps at video output :).
 >
 > This is just my short insight into the challenges one may face when
 > dealing with speed optimizations in a modern communication project.
 >
 > Bogdan
 >
 >
 > --------------------------------------------
 > On Sun, 2/23/14, Abhishek Bhowmick <address@hidden> wrote:
 >
 >  Subject: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
 >  To: address@hidden
 >  Date: Sunday, February 23, 2014, 8:52 AM
 >
 >  Hello,
 >  I have completed a Bachelor's degree in Electrical Engineering from
 >  IIT Bombay, India and will be joining a master's program in Computer
 >  Science in August. For the summer, I am interested in participating in
 >  GSoC 2014, and GNU Radio is an organization where my background fits
 >  nicely.
 >
 >
 >  I went through the ideas page and was particularly interested in doing
 >  performance optimization with VOLK. After going through some online
 >  documentation about the library and the SDR'12 paper, I realised that
 >  the following areas need work:
 >
 >  1. Profiling GNU Radio code to identify new kernels and implement them
 >  for existing Intel SIMD extensions, also porting kernels to other ISA
 >  extensions.
 >  2. Better testing of the effects of more complex scheduler logic on
 >  larger environments (beyond simple kernels)
 >  3. Exploring extension of VOLK to GPU ISAs, to leverage chips such as
 >  AMD Fusion (however, this seems to be more research than software
 >  development)
 >
 >  According to the GSoC proposal, point (1) seems to be the expectation.
 >  Given this, I would like some advice on how to go about looking for
 >  potential ideas (and some feedback on the feasibility of the other
 >  ideas as well).
 >
 >
 >  My background: C++, Python, Signal Processing, Computer Architecture
 >
 >  Thanks,
 >  Abhishek Bhowmick
 >
 >
 >  -----Inline Attachment Follows-----
 >
 >  _______________________________________________
 >  Discuss-gnuradio mailing list
 >  address@hidden
 >  https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
 >
 >



