Hi Federico,

I don't know if that will help much, but:

    volk_32fc_magnitude_squared_32f(&mag_sq_b[0], &b[0], N); // mag_sq_b = |b|^2

Maybe doing it in-place, i.e.

    volk_32fc_magnitude_squared_32f(&b[0], &b[0], N); // b = |b|^2

might be even faster; just don't forget that you're then treating the first half of b as floats instead of complexes.

I just realized there's the _mm_rcp_ps SSE1 intrinsic... maybe that complex/complex VOLK kernel is closer than I thought.

Cheers,
Marcus

On 13.05.2016 20:59, Federico Larroca wrote:

Thank you Andy. However, I only need the division, although this would indeed be a good idea if more operations were needed.

So far, I've applied the following lines with some significant savings (w.r.t. a loop):

    volk_32fc_x2_multiply_conjugate_32fc(&c[0], &a[0], &b[0], N); // c = a*conj(b)
    volk_32fc_magnitude_squared_32f(&mag_sq_b[0], &b[0], N); // mag_sq_b = |b|^2
    volk_32f_x2_divide_32f(&inv_mag_sq_b[0], &ones[0], &mag_sq_b[0], N); // inv_mag_sq_b = 1/|b|^2, since I've previously defined ones as an array containing N ones.
    volk_32fc_32f_multiply_32fc(&out[0], &c[0], &inv_mag_sq_b[0], N); // out = c*inv_mag_sq_b = a*conj(b)/|b|^2 = a/b

Using VOLK's pow operator instead is significantly slower.

I've also seen interesting performance improvements from simplifying some for loops not amenable to VOLK, as suggested by Marcus.

On the other hand, I'm crazy enough to try to implement a VOLK kernel that performs the division. I've just started and don't know if I'll be successful, but I guess I'll learn something anyhow.

best
Federico

2016-05-13 15:14 GMT-03:00 Andy Walls:

On Thu, 2016-05-12 at 16:24 -0400, address@hidden wrote:

> Date: Wed, 11 May 2016 16:09:56 -0300
> From: Federico Larroca
> To: address@hidden
> Subject: [Discuss-gnuradio] VOLK division between complexes
>
> Hello everyone,
> We are at the stage of optimizing our project (gr-isdbt). One of the
> most time-consuming blocks is OFDM synchronization, and in particular the
> equalization phase.
> This is simply the division between the input
> signal and the estimated channel gains (two modestly big arrays of
> ~5000 complexes for each OFDM symbol).
> Until now, this was performed by a for loop, so my plan was to change
> it for a volk function. However, there is no complex division in VOLK.
> So I've done a rather indirect operation using the property that a/b =
> a*conj(b)/|b|^2, resulting in six lines of code (a multiply conjugate,
> a magnitude squared, a deinterleave, a couple of float divisions and
> an interleave). Obviously the performance gain (measured with the
> Performance Monitor) is marginal (to be optimistic)...
> Does anyone have a better idea?

I have a different idea, but I doubt it is better.

The transformation

    w = Log(z) = ln|z| + j Arg(z)

transforms multiplication, division, power and root operations into addition, subtraction, multiplication and division operations respectively.

So if c = Log(a), d = Log(b), then a/b = Exp(c-d).

If, along with your complex division, you also have a lot of additional complex multiplication, power, and/or (real) root operations to perform, then the transform *might* give you a savings. A savings would also be more likely if you don't need to invert the transformation at the end (i.e. no need for z = Exp(w)).

Regards,
Andy

> Implementing a new kernel is simply out of my knowledge scope.
> Best
> Federico
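
For reference, the two identities discussed in this thread can be checked against a direct division with a small scalar sketch. This is plain C++ and does not call VOLK; the helper names (`divide_via_conjugate`, `divide_via_log`, `max_error_vs_direct`) are made up for illustration, and it assumes that no element of b is zero and, for the Log variant, that no element of a is zero either.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical helper (not a VOLK function): element-wise a/b computed as
// a*conj(b) / |b|^2, mirroring the four steps Federico lists.
std::vector<cfloat> divide_via_conjugate(const std::vector<cfloat>& a,
                                         const std::vector<cfloat>& b)
{
    assert(a.size() == b.size());
    std::vector<cfloat> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const cfloat c = a[i] * std::conj(b[i]); // c = a*conj(b)
        const float mag_sq = std::norm(b[i]);    // |b|^2 (assumed non-zero)
        const float inv_mag_sq = 1.0f / mag_sq;  // 1/|b|^2
        out[i] = c * inv_mag_sq;                 // a*conj(b)/|b|^2 = a/b
    }
    return out;
}

// Hypothetical helper: element-wise a/b computed as Exp(Log(a) - Log(b)),
// as in Andy's suggestion. Assumes a and b contain no zeros.
std::vector<cfloat> divide_via_log(const std::vector<cfloat>& a,
                                   const std::vector<cfloat>& b)
{
    assert(a.size() == b.size());
    std::vector<cfloat> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = std::exp(std::log(a[i]) - std::log(b[i]));
    }
    return out;
}

// Largest absolute deviation of q from the directly computed quotient a/b.
float max_error_vs_direct(const std::vector<cfloat>& q,
                          const std::vector<cfloat>& a,
                          const std::vector<cfloat>& b)
{
    assert(q.size() == a.size() && a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < q.size(); ++i) {
        worst = std::max(worst, std::abs(q[i] - a[i] / b[i]));
    }
    return worst;
}
```

Calling `max_error_vs_direct(divide_via_conjugate(a, b), a, b)` on a few test vectors should give an error on the order of single-precision rounding. The sketch only checks correctness; it says nothing about speed, which is what the VOLK kernels are for.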