Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK


From: Tom Rondeau
Subject: Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
Date: Tue, 25 Feb 2014 09:09:37 -0500

On Tue, Feb 25, 2014 at 8:21 AM, Bogdan Diaconescu
<address@hidden> wrote:
> Hi  Abhishek,
>
> When I implemented gr-dvbt (https://github.com/BogdanDIA/gr-dvbt) I used VOLK
> in many places to speed up the processing. However, a great deal of
> speed-up still needs to be achieved on both Tx and Rx to lower CPU
> cycle consumption, so the project has a lot of challenges from this
> point of view.
>
> For example, the Viterbi implementation uses intrinsics instead of
> VOLK just because it was quite slow when I used VOLK, achieving only
> 16 Mbps of processing per single thread (7-8 Mbps with a plain C implementation).
> Using intrinsics raised the speed to 32-37 Mbps per thread, which is quite
> good, but the code is not directly portable. So a good Viterbi decoder that
> easily achieves over 60 Mbps at its input is still needed, probably not
> only in the dvb-t implementation but in other applications too. To add
> to the challenge, one may want readable code alongside the
> necessary speed (the Spiral Viterbi implementation is at the opposite extreme).


Bogdan,

Good advice, generally. Just a few issues to point out. First, I think
there's a misconception in treating "VOLK" and "using intrinsics" as
alternatives. VOLK uses intrinsics, so whatever code you wrote with
intrinsics could be done in VOLK. For instance, the fecapi that we are
working to bring into GNU Radio has a convolutional decoder defined as
a single VOLK kernel:

https://github.com/namccart/fecapi/blob/master/volk_fecapi/kernels/volk_fecapi/volk_fecapi_8u_x4_conv_k7_r2_f2048_8u.h

This is actually Spiral code that was wrapped up into a kernel to make
it portable and usable.
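
To illustrate the point that a kernel is just intrinsics code behind a portable entry point, here is a minimal sketch of the pattern. It is not VOLK's actual machinery: the names my_add_32f_generic, my_add_32f_sse and my_add_32f_select are made up for this example, and the selection is done at compile time for brevity, whereas VOLK picks an implementation at runtime based on the CPU.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical kernel signature: out[i] = a[i] + b[i] for i in [0, n). */
typedef void (*my_add_32f_fn)(float *, const float *, const float *, size_t);

/* Portable fallback in plain C; always available. */
void my_add_32f_generic(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#if defined(__SSE__)
/* Same operation written with SSE intrinsics, four floats per iteration.
 * Unaligned loads/stores are used so callers need no special alignment. */
void my_add_32f_sse(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++) /* leftover elements when n is not a multiple of 4 */
        out[i] = a[i] + b[i];
}
#endif

/* Stand-in for the dispatcher: returns the best implementation that was
 * compiled in. Callers only ever see the function pointer. */
my_add_32f_fn my_add_32f_select(void)
{
#if defined(__SSE__)
    return my_add_32f_sse;
#else
    return my_add_32f_generic;
#endif
}
```

Code that calls my_add_32f_select()(out, a, b, n) stays portable: on a machine without SSE it still builds and runs, just through the generic path.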

Basically, I'm trying to convey that there is no limit to what we can
define as a kernel in VOLK. In fact, the more complex the kernel, the
better the speedup, because you can keep the data inside the registers
and control the algorithm more tightly. We just want a kernel to
represent an operation that would be usable in other situations,
like a convolutional decoder.
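
A small sketch of why a larger kernel can win, using a dot product as a stand-in for something as involved as a decoder. The function names are hypothetical and the code is plain C rather than intrinsics. The two-pass version writes every product to a scratch buffer and reads it back, while the fused version keeps the running sum in a local variable that the compiler can hold in a register.

```c
#include <stddef.h>

/* Pass 1 of the unfused version: element-wise product, stored to memory. */
void mul_32f(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Pass 2 of the unfused version: read the products back and add them up. */
float sum_32f(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Two simple kernels chained together: n extra stores and n extra loads
 * through the caller-provided scratch buffer. */
float dot_two_pass(const float *a, const float *b, float *scratch, size_t n)
{
    mul_32f(scratch, a, b, n);
    return sum_32f(scratch, n);
}

/* One fused kernel: each product is consumed immediately, so no
 * intermediate buffer is touched at all. */
float dot_fused(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

Both functions return the same value for the same inputs; the difference is only in memory traffic, which is where a SIMD version of the fused form gains the most.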


> The OFDM synchronization code is also very time consuming, and although it
> already uses VOLK, it could benefit greatly from the new AVX2 instructions.
> In fact, many of the blocks could use the new instructions to speed up data
> processing.

Yes, certainly. The synchronization part is a good place for optimization.

Tom



> Basically, for dvb-t at its maximum speed, with OFDM FFT 8k, QAM-64 and
> puncturing rate 7/8, the video output is 32 Mbps, which means more than
> 60 Mbps of processing speed after de-puncturing. A bigger challenge would be
> implementing a real-life DVB-S receiver, where the data rate is over 50 Mbps at
> the video output :)
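
The rates in the paragraph above can be checked with a little arithmetic. This sketch assumes the DVB-T mother convolutional code is rate 1/2, so de-puncturing restores two soft symbols per decoded bit, and it ignores Reed-Solomon overhead. The function names are made up for illustration.

```c
/* Coded bits per second arriving from the demapper, before de-puncturing,
 * for a punctured code of rate punct_num/punct_den. */
double punctured_rate(double decoded_bps, int punct_num, int punct_den)
{
    return decoded_bps * punct_den / punct_num;
}

/* Soft symbols per second entering the Viterbi decoder after de-puncturing
 * back to the mother code of rate mother_num/mother_den. */
double viterbi_input_rate(double decoded_bps, int mother_num, int mother_den)
{
    return decoded_bps * mother_den / mother_num;
}
```

With 32 Mbps of decoded output, viterbi_input_rate(32e6, 1, 2) gives 64e6 soft symbols per second at the decoder input, in line with the figure of more than 60 Mbps, while punctured_rate(32e6, 7, 8) is roughly 36.6e6 coded bits per second before de-puncturing.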
>
> This is just my brief insight into the challenges one may face when dealing
> with speed optimizations in a modern communication project.
>
> Bogdan
>
>
> --------------------------------------------
> On Sun, 2/23/14, Abhishek Bhowmick <address@hidden> wrote:
>
>  Subject: [Discuss-gnuradio] Google Summer of Code 2014 applicant : 
> Optimization with VOLK
>  To: address@hidden
>  Date: Sunday, February 23, 2014, 8:52 AM
>
>  Hello,
>  I have completed a Bachelor's degree in
>  Electrical Engineering from IIT Bombay, India and will be
>  joining a masters program in Computer Science in August. For
>  the summer, I am interested in participating in GSoC 2014, and
>  GNU Radio is an organization where my background fits
>  nicely.
>
>
>  I went through the ideas page and was
>  particularly interested in doing performance optimization
>  with VOLK. After going through some online documentation
>  about the library and the SDR'12 paper, I realised that
>  the following areas need work :
>
>  1. Profiling GNU Radio code to identify new
>  kernels and implement them for existing Intel SIMD
>  extensions, also porting kernels to other ISA extensions.
>  2. Better testing of the effects of more complex
>  scheduler logic on larger environments (beyond simple
>  kernels)
>
>  3. Exploring extension of VOLK to GPU ISAs, to
>  leverage chips such as AMD Fusion (However, this seems to
>  be more research than software development)
>
>  According to the GSoC proposal, point (1) seems
>  to be the expectation. Given this, I would like some advice
>  on how to go ahead looking for potential ideas (and some
>  feedback on feasibility of the other ideas as well)
>
>
>  My background : C++, Python, Signal Processing,
>  Computer Architecture
>
>  Thanks,
>  Abhishek Bhowmick
>
>
>  _______________________________________________
>  Discuss-gnuradio mailing list
>  address@hidden
>  https://lists.gnu.org/mailman/listinfo/discuss-gnuradio


