|Subject:||Re: [Discuss-gnuradio] Using volk|
|Date:||Wed, 08 Oct 2014 11:56:46 +0200|
Hi Mostafa,
VOLK is but an accelerated Library of Vector Optimized Kernels.
What you want is basically three operations:
a) finding maximum absolute
b) finding average absolute
c) dividing these two values
Now, looking closer at a) and b), one notices that both require the samples to be converted to their magnitudes first. And because we're in the business of optimizing things, let's just use the squared magnitude, because that usually saves one sqrt per sample (what you end up with is then a peak-to-mean power ratio rather than an amplitude ratio). So this boils down to
a) take mag_squared of input (length N)
b1) find maximum of a)
b2) find sum of a)
As you can see, c), the final division, operates on two scalars. It is not a vector operation, and thus not a case for VOLK.
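To make the steps above concrete, here is a minimal scalar reference sketch in plain C++. It deliberately does not use VOLK; the function name `peak_to_mean_power` and the choice to return 0 for an all-zero input are assumptions of this sketch, not anything from GNU Radio or VOLK.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <vector>

// Scalar reference (no VOLK): peak-to-mean ratio of the squared magnitude.
// The function name is hypothetical and only serves this illustration.
float peak_to_mean_power(const std::vector<std::complex<float>>& x)
{
    assert(!x.empty());
    float peak = 0.0f;
    float sum = 0.0f;
    for (const auto& s : x) {
        const float p = std::norm(s); // a)  squared magnitude |s|^2
        peak = std::max(peak, p);     // b1) running maximum
        sum += p;                     // b2) running sum
    }
    if (sum == 0.0f)
        return 0.0f; // all-zero input: ratio is undefined, 0 is this sketch's convention
    // c) scalar division: peak / mean, with mean = sum / N
    return peak / (sum / static_cast<float>(x.size()));
}
```

A vectorized version would replace each of the three lines in the loop body by one kernel call over the whole buffer, and keep the last line as it is.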
For a) ("Complex to Mag ^ 2") there is a GNU Radio block that uses VOLK. That's the example for using VOLK that I would have recommended to read, anyway :)
In other terms: If you don't have to write your own highly optimized block, don't use VOLK directly, use the standard GNU Radio blockset. It's rather optimized ;)
Now, for the maximum search b1), things are a bit more complicated. Searching for a maximum is not *easily* vectorizable, because it is an inherently sequential operation (think of it as the first step of a bubble sort).
Now, you can achieve *awesome* performance by basically turning your linear search into an N-ary tree, with N being the order of parallelism you can achieve by using a maximum-finding SIMD instruction. But that requires the size of the problem to be a power of N. That just doesn't fly well with the usually more "multiple of 64 bit"-typey alignment restrictions.
You are, however, highly encouraged to try just that: use the existing volk_32f_x2_max_32f, which compares two vectors and stores the element-wise maximum in a third one, to compare the first half of your mag_squared vector with the second half, then repeat the same with the first and second half of the result (and so on) until you have a single maximum value. That's the comparison tree from above for the N=2 case. You can employ clever overlapping, i.e. using some input values twice, to virtually extend your input's length to a power of two, and then just waltz on.
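A rough sketch of that halving scheme, in plain C++ so it runs without VOLK installed: `elementwise_max` is a hypothetical scalar stand-in for volk_32f_x2_max_32f (same idea: out[i] = max(a[i], b[i])), and `tree_max` is a name made up for this example. The overlap used here is just one simple variant: when the current length is odd, the two "halves" share the middle element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for volk_32f_x2_max_32f: out[i] = max(a[i], b[i]).
static void elementwise_max(float* out, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::max(a[i], b[i]);
}

// Reduce a non-empty vector to its maximum by repeatedly comparing the
// first half with the second half (binary comparison tree, N = 2).
float tree_max(std::vector<float> data) // taken by value: used as work buffer
{
    assert(!data.empty());
    std::vector<float> scratch((data.size() + 1) / 2);
    std::size_t n = data.size();
    while (n > 1) {
        // Round up, so that for odd n the halves overlap by one element
        // and every input value takes part in at least one comparison.
        const std::size_t half = (n + 1) / 2;
        elementwise_max(scratch.data(), data.data(), data.data() + (n - half), half);
        std::copy_n(scratch.begin(), half, data.begin());
        n = half;
    }
    return data[0];
}
```

The separate scratch buffer keeps input and output from aliasing, which is the safe assumption when the stand-in is swapped for a real SIMD kernel. The per-round copy is only there for readability; swapping two buffers would avoid it.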
For b2) you can simply use the "integrate" block, which is not VOLK optimized (possibly because it's a gengen template and these are *so much fun* to specialize). But seeing as it is simply an accumulating for loop, I kind of expect your compiler to make the best of the situation. However, you can also use the volk_32f_accumulator_s32f VOLK kernel. I kind of want to use that in integrate, because for my machine, the SSE VOLK kernel is 4 times as fast as the generic implementation, which nicely matches the 4-operand SSE SIMD instruction behind it.
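To illustrate why a 4-wide SIMD accumulator can come out roughly four times as fast, here is a plain C++ sketch that keeps four independent partial sums, the way four SIMD lanes would. This is not the VOLK kernel's actual code; `sum_four_lanes` is a hypothetical name, and whether a compiler turns this into SIMD instructions depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration of a 4-lane accumulator (not VOLK's implementation).
float sum_four_lanes(const float* in, std::size_t n)
{
    assert(in != nullptr || n == 0);
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    const std::size_t n4 = n - n % 4; // largest multiple of 4 not above n
    for (std::size_t i = 0; i < n4; i += 4) {
        // Four independent additions per step: one per lane.
        for (std::size_t k = 0; k < 4; ++k)
            lane[k] += in[i + k];
    }
    // Combine the lanes, then handle the up to three leftover samples.
    float total = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (std::size_t i = n4; i < n; ++i)
        total += in[i];
    return total;
}
```

Note that the summation order differs from a simple left-to-right loop, so for general float data the result can differ in the last bits.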
On 07.10.2014 21:49, Mostafa Alizadeh wrote:
Hello all, I wondered about volk. I want it to compute mean to peak value of a complex array. How could I do this? Besides, I really need to know is there any example of using volk? The code itself, doesn't reflect input and output parameters explicitly. Best, Mostafa