From: Abhishek Bhowmick
Subject: Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
Date: Wed, 26 Feb 2014 12:19:15 +0530

Thanks, everyone. That is quite a few pointers; I will spend some time
digesting them all.

So there are really two approaches: larger, more complex kernels on
one hand and AVX2/AVX/FMA on the other, or a combination of the two.

I guess I should propose identifying and implementing larger, more
complex kernels and then accelerating them further using AVX2/FMA etc.
Doing both will of course limit the number of applications/algorithms
I can feasibly target. What's your take on this?

Abhishek

On Wed, Feb 26, 2014 at 5:03 AM, West, Nathan
<address@hidden> wrote:
> On Tue, Feb 25, 2014 at 4:37 PM, West, Nathan
> <address@hidden> wrote:
>>>  > On Sun, 2/23/14, Abhishek Bhowmick <address@hidden>
>>>  wrote:
>>>  >
>>>  >  Subject: [Discuss-gnuradio] Google Summer of Code
>>>  2014 applicant : Optimization with VOLK
>>>  >  To: address@hidden
>>>  >  Date: Sunday, February 23, 2014, 8:52 AM
>>>  >
>>>  >  Hello,
>>>  >  I have completed a Bachelor's degree in
>>>  >  Electrical Engineering from IIT Bombay, India and
>>>  will be
>>>  >  joining a masters program in Computer Science in
>>>  August. For
>>>  >  the summer, I am interested in participating in GSoC
>>>  2014 and
>>>  >  GNU Radio is an organization where my background
>>>  fits
>>>  >  nicely.
>>>  >
>>>  >  I went through the ideas page and was
>>>  >  particularly interested in doing performance
>>>  optimization
>>>  >  with VOLK. After going through some online
>>>  documentation
>>>  >  about the library and the SDR'12 paper, I
>>>  realised that
>>>  >  following areas need work :
>>>  >
>>>  >  1. Profiling GNU radio code to identify new
>>>  >  kernels and implement them for existing Intel
>>>  SIMD
>>>  >  extensions, also porting kernels to other ISA
>>>  extensions.
>>>  >  2. Better testing of the effects of more complex
>>>  >  scheduler logic on larger environments (beyond
>>>  simple
>>>  >  kernels)
>>>  >
>>>  >  3. Exploring extension of Volk to GPU ISAs, to
>>>  >  leverage chips such as AMD Fusion (However, this
>>>  seems to
>>>  >  more research than software development)
>>>  >
>>>  >  According to the GSoC proposal, point (1) seems
>>>  >  to be the expectation. Given this, I would like
>>>  some advice
>>>  >  on how to go ahead looking for potential ideas
>>>  (and some
>>>  >  feedback on feasibility of the other ideas as
>>>  well)
>>>  >
>>>  >
>>>  >  My background : C++, Python, Signal Processing,
>>>  >  Computer Architecture
>>>  >
>>>  >  Thanks,
>>>  >  Abhishek Bhowmick
>>>  >
>>
>>
>> This is a great conversation, and I'll take the opportunity to plug
>> the upcoming VOLK working group call
>> (https://plus.google.com/u/1/events/ch3jrjcvp7mdiqelpismfieg3n0).
>> Bogdan, your results aren't particularly surprising, but the feedback
>> is really good to hear.
>>
>> Back to GSoC:
>>
>> Abhishek,
>>
>>>Thanks for the pointers to gr-atsc and gr-80211. I have started
>>>looking there as a
>>>starting point. Are there similar modules which are undergoing volk
>>>speedup fixes?
>>>I am also trying to meet up with other people who have been using GNU radio
>>>to identify potential modules for acceleration. As you are now a
>>>mentor organization, I feel it's a good time for us to get into
>>>detailed discussions.
>>
>> From the previous discussion it should be apparent that how algorithms
>> are implemented will make the biggest difference, and that the new
>> acceleration is primarily going to come from larger, more complex
>> kernels. At the end of the day it's going to be your proposal. So far
>> on the list of places to look we have:
>>
>> * in-tree OFDM (contact Martin)
>> * gr-atsc (use Andrew Davis' fork)
>> * gr-dvbt
>> * gr-fecapi
>>
>> For your proposal I would recommend looking at their code, then
>> getting in contact with the author(s) of those modules to ask about
>> their thoughts on accelerating blocks they have written. The reality
>> of this project is that we are accelerating some signal processing
>> algorithm and knowledge of that algorithm is useful for acceleration.
>> Whatever application you have interest and/or knowledge in (fresh
>> out of a BS it's more likely to be interest) should guide your
>> proposal. If you know anything about error correcting codes then the
>> latter 2 would be good fits. OFDM frame detection probably has a
>> gentler learning curve since at the basic level you're looking at
>> convolution, and there are papers you can look for on more involved
>> algorithms. Other algorithms to look at might include AGC or
>> equalizers.
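
To make the "at the basic level you're looking at convolution" remark concrete, here is a minimal scalar sketch of preamble detection by sliding correlation. It assumes a real-valued signal and preamble for brevity (real OFDM samples are complex), and `find_preamble` is a hypothetical name, not an existing GNU Radio or VOLK function. The inner loop is a plain dot product, which is the kind of loop a SIMD kernel would take over.

```c
#include <stddef.h>

/* Hypothetical reference kernel (not part of VOLK or GNU Radio).
 * Slides a known preamble over the input and returns the offset where
 * the correlation is largest. Returns 0 if the preamble is empty or
 * does not fit into the input. */
size_t find_preamble(const float *in, size_t n_in,
                     const float *preamble, size_t n_pre)
{
    size_t best_off = 0;
    float best_val = 0.0f;
    int have_best = 0;

    if (n_pre == 0 || n_in < n_pre)
        return 0;

    for (size_t off = 0; off + n_pre <= n_in; ++off) {
        /* Dot product of the preamble with the current input window. */
        float acc = 0.0f;
        for (size_t k = 0; k < n_pre; ++k)
            acc += in[off + k] * preamble[k];

        if (!have_best || acc > best_val) {
            best_val = acc;
            best_off = off;
            have_best = 1;
        }
    }
    return best_off;
}
```

Calling `find_preamble(in, 7, pre, 3)` with `in = {0, 0, 1, -1, 1, 0, 0}` and `pre = {1, -1, 1}` picks the window starting at index 2, where the input matches the preamble exactly.
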
>>
>> If you're interested in GPU programming don't forget to check out gr-gpu.
>>
>>>
>>>>
>>>> At the moment the only mainstream ISA not being targeted is probably
>>>> AVX2, which has
>>>> some nice features for the type of kernels we're doing.  If you went
>>>> that route it would likely mean adding
>>>> protokernels to a pretty large number of kernels.
>>>>
>>>> Nathan
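
For readers new to the term, the idea behind a protokernel can be sketched roughly as follows: one kernel has several implementations, and one of them is selected at run time. Everything here is hypothetical (`add_generic`, `add_wide8`, `select_add`, and the `has_wide` flag are illustration names, not VOLK's actual API or dispatch mechanism), and the "wide" variant is plain C standing in for what would be intrinsics in a real ISA-specific version.

```c
#include <stddef.h>

/* Common signature shared by every implementation of the kernel. */
typedef void (*add_fn)(float *out, const float *a, const float *b, size_t n);

/* Portable fallback: one element per iteration. */
static void add_generic(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Stand-in for an ISA-specific variant: processes blocks of 8 floats
 * (the width of a 256-bit register) and hands the tail to the generic
 * version. A real variant would use intrinsics in the block loop. */
static void add_wide8(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        for (size_t j = 0; j < 8; ++j)
            out[i + j] = a[i + j] + b[i + j];
    add_generic(out + i, a + i, b + i, n - i);
}

/* Hypothetical selector: has_wide would come from a CPU feature check. */
add_fn select_add(int has_wide)
{
    return has_wide ? add_wide8 : add_generic;
}
```

Under this scheme, adding AVX2 support means writing one more variant per kernel and registering it with the selector, which is why the work scales with the number of kernels.
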
>>>
>>>This also seems to be promising, though I guess it would require me to
>>>come up to speed with AVX2 (which I would love to do). Could you
>>>please elaborate
>>>a little on the kind of beneficial features you have in mind? I am
>>>concerned that the
>>>job of adding proto-kernels might turn out to be mundane/tedious. Is
>>>that a valid concern?
>>
>> Right, so as Martin mentioned the answer is sort of relative. I
>> wouldn't go so far as to say it's mundane, especially if you have
>> little experience with using intrinsics and SIMD instructions. One
>> reason AVX isn't so prominently featured (I suspect) is that the
>> instructions are almost the same as SSE instructions, but the vectors
>> are twice as long, so that is actually mundane. AVX2/FMA extensions
>> introduce some new features to the amd64 instruction set. The most
>> obvious is that it looks like Intel and AMD finally settled on
>> the same fused multiply-add (there's also a multiply-subtract that's
>> good for complex numbers) implementation. That will likely be able to
>> speed things up a bit, but I'm also looking forward to seeing gains
>> from the various load_gathers that have been introduced. They allow
>> you to do a single load operation that gathers vector elements that
>> span pretty large ranges. VOLK won't be so interested in the large
>> ranges (except maybe decimators), but it could be useful for loading
>> complex vectors. There are some other math functions we may be able to
>> leverage, but those are two features that I think would be widely
>> applicable.
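
The two features mentioned above can be illustrated with a scalar sketch; this is not intrinsics code and the function names are hypothetical. `gather_strided` emulates a gather-style load that picks spaced-out elements (stride 2 splits interleaved I/Q data into real or imaginary parts), and `cmul_interleaved` shows why a complex multiply maps well onto fused operations: each output is one multiply followed by one multiply-subtract or multiply-add. Whether a compiler actually emits fused instructions for these expressions depends on the target and flags, so treat that as an assumption.

```c
#include <stddef.h>

/* Scalar emulation of a gather-style load (hypothetical helper):
 * copies `count` elements from src, starting at `first` and stepping
 * by `stride`. With interleaved complex data, stride 2 and first 0
 * yields the real parts, first 1 the imaginary parts. */
void gather_strided(float *dst, const float *src,
                    size_t first, size_t stride, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[first + i * stride];
}

/* out = a * b for n complex numbers stored as interleaved (re, im)
 * floats. Hypothetical reference kernel, not a VOLK function. */
void cmul_interleaved(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];

        /* Real part: a multiply followed by a multiply-subtract. */
        out[2 * i] = ar * br - ai * bi;
        /* Imaginary part: a multiply followed by a multiply-add. */
        out[2 * i + 1] = ar * bi + ai * br;
    }
}
```

For example, multiplying (1 + 2i) by (3 + 4i) with `cmul_interleaved` gives (-5 + 10i), and `gather_strided(re, a, 0, 2, n)` extracts the real parts of `a`.
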
>>
>> In your proposal you should definitely include what ISAs you intend to
>> use, and if there are features specific to that instruction set then
>> point out why it's a good choice. This is mostly important for
>> choosing between SSE and friends, AVX, AVX2/FMA. It would be good to
>> see plans that include NEON support for anything you'd add to amd64
>> platforms, but that's not a requirement.
>>
>>
>> Nathan
>
> I also see that GNSS-SDR made it to GSoC and they have a VOLK related project.
> http://gnss-sdr.org/documentation/google-summer-code-2014-ideas-list

Yeah, I noticed that too. I might submit a proposal to them as well.

Abhishek


