Re: [Discuss-gnuradio] [VOLK] GPU acceleration -> OpenCL integration?

From: Stefan Wunsch
Subject: Re: [Discuss-gnuradio] [VOLK] GPU acceleration -> OpenCL integration?
Date: Fri, 18 Dec 2015 01:10:39 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0


You are completely right, that's the point. The matrix is 1000x1000, and
the OpenCL version only becomes faster than the generic implementation
above roughly 500x500 (just a rough estimate). Most use cases in GNU Radio
do not exploit this.
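
To make the crossover concrete, here is a minimal sketch of how a caller
might choose between the two implementations by problem size. The names
pick_kernel, generic_matmul and CROSSOVER_DIM are hypothetical and not part
of VOLK, and the threshold of 500 is only the rough estimate given above,
so it would have to be measured on each machine.

```python
# Illustrative sketch only: these names are hypothetical and not part of VOLK.

CROSSOVER_DIM = 500  # rough estimate from this thread, not a tuned value


def pick_kernel(n, crossover=CROSSOVER_DIM):
    """Return the name of the implementation to use for an n x n multiply."""
    # At or below the crossover the generic path is assumed to win,
    # presumably because the OpenCL setup cost is not yet amortized.
    return "a_opencl" if n > crossover else "generic"


def generic_matmul(a, b):
    """Reference n x n matrix product on lists of lists (the CPU fallback)."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for n in (64, 500, 1000):
        print(n, pick_kernel(n))
```

With such a dispatch, the small buffers that are typical in a flowgraph
would stay on the CPU path, and only large matrices would go to the GPU.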

But if you want to promote VOLK outside the GNU Radio context, this
feature is quite unique. As far as I know, the SIMD support of OpenCL is
pretty bad (I am talking about the CPU frontend), and VOLK could combine
proper SIMD use with GPU acceleration.

Nevertheless, I think there are some efficient encoder/decoder
algorithms for GPUs that could make use of such an integration.


On 12/17/2015 07:14 PM, Sylvain Munaut wrote:
> Hi,
>> RUN_VOLK_TESTS: volk_32f_x2_matrix_nxn_multiply_puppet_32f(1000000,10)
>> generic completed in 28482ms
>> a_opencl completed in 13364.3ms
> Question is how does that number change for smaller problem sizes ?
> And what would be the average problem size encountered in real env.
> For SIMD optimization the result of "who's the fastest" doesn't vary
> too much depending on problem size because they don't have much setup
> / teardown size.
> For OpenCL I very much doubt that would be the case and if you end up
> with an app making a lot of "smallish" (and given the default buffer
> size of GR, I feel the calls to volk aren't processing millions of
> samples at a time in a single call)
> Cheers,
>     Sylvain
