Thank you, Andy. I only need the division, though your suggestion would indeed be a good idea if more operations were needed.
So far, I've applied the following lines with some significant savings (w.r.t. a loop):
volk_32fc_x2_multiply_conjugate_32fc(&c, &a, &b, N); // c = a*conj(b)
volk_32fc_magnitude_squared_32f(&mag_sq_b, &b, N); // mag_sq_b = |b|^2
volk_32f_x2_divide_32f(&inv_mag_sq_b, &ones, &mag_sq_b, N); // inv_mag_sq_b = 1/|b|^2, since I've previously defined ones as an array containing N ones.
volk_32fc_32f_multiply_32fc(&out, &c, &inv_mag_sq_b, N); // out = c*inv_mag_sq_b = a*conj(b)/|b|^2 = a/b
The alternative of using VOLK's pow operator turned out to be significantly slower.
I've also seen interesting performance improvements from simplifying some for loops that are not amenable to VOLK, as suggested by Marcus. On the other hand, I'm crazy enough to try to implement a VOLK kernel that performs the division. I've just started and don't know if I'll be successful, but I guess I'll learn something anyhow.