I attempted to "increase" the GPU workload by passing very large floating-point numbers as parameters to cuda.multiply_const_ff, and also by experimenting with the taps, which are declared by:
My assumption was that I would be passing in "more work," so the GPU should come out ahead, but it does not: the CPU still finishes in a fraction of a second (even with the large floating-point values), while the GPU takes a little over one second.
- Following this thread:
http://lists.gnu.org/archive/html/discuss-gnuradio/2009-01/msg00378.html
I would like to approach the problem by increasing the computational
intensity, which is why I am changing the benchmark parameters, but it
doesn't seem to work. Am I approaching this correctly?
- From this thread:
http://lists.gnu.org/archive/html/discuss-gnuradio/2008-11/msg00292.html
If I benchmark a single block with a big output_multiple then I do see
performance increases.
How do I do the above? How have the experts (Martin, Achilleas) been able to tweak the performance of CUDA-enabled GNU Radio to show that GPU computing can indeed be faster?
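For what it's worth, here is my current understanding of how a block might request a large output_multiple, so that each work() call processes a big chunk and the per-call kernel-launch and memcpy overhead is amortized. This is only a sketch: the block name my_cuda_multiply is hypothetical, the header/namespace style is that of recent GNU Radio (the CUDA branch from 2009 used gr_sync_block / gr_make_io_signature instead), and the chunk size of 16384 is an arbitrary guess.

```cpp
#include <gnuradio/sync_block.h>
#include <gnuradio/io_signature.h>

// Hypothetical CUDA-backed multiply block; only the constructor matters here.
class my_cuda_multiply : public gr::sync_block {
public:
    my_cuda_multiply(float k)
        : gr::sync_block("my_cuda_multiply",
                         gr::io_signature::make(1, 1, sizeof(float)),
                         gr::io_signature::make(1, 1, sizeof(float))),
          d_k(k)
    {
        // Ask the scheduler to call work() only with noutput_items that is a
        // multiple of 16384, so each host<->device round trip and kernel
        // launch covers many samples instead of a handful.
        set_output_multiple(16384);
    }

    // work() would then: copy noutput_items floats to the device,
    // launch the multiply kernel, and copy the results back.

private:
    float d_k; // the constant being multiplied in
};
```

If this is roughly right, then the "big output_multiple" experiments in the thread amount to raising that one number and re-running the benchmark. Is that all there is to it, or is there more tuning involved?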
- Is there any way to measure the time spent on memory transfers between the CPU and the GPU? That would tell us exactly where the overhead lies.
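One approach I am considering for this (a sketch, not something taken from the GNU Radio code): bracket the cudaMemcpy calls with CUDA events and read back the elapsed time. The buffer size below is arbitrary, and this requires an NVIDIA GPU and the CUDA toolkit to run.

```cpp
// Standalone sketch: time a host->device copy with CUDA events.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const size_t n = 1 << 20;                       // 1M floats (arbitrary)
    float *h = (float *)malloc(n * sizeof(float));
    float *d = 0;
    cudaMalloc((void **)&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);                     // wait for the copy to finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);         // milliseconds between events
    printf("host->device copy of %zu bytes: %.3f ms\n", n * sizeof(float), ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return 0;
}
```

Doing the same around the device->host copy and the kernel launch should separate transfer overhead from compute time. (The CUDA Visual Profiler presumably reports the same numbers without instrumenting the code, if that is easier.) Would this be the right way to isolate the overhead?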
Please help!!