Re: [Discuss-gnuradio] Tesla C2000 series and CUDA and Gnu Radio

From: Thomas Hobiger
Subject: Re: [Discuss-gnuradio] Tesla C2000 series and CUDA and Gnu Radio
Date: Thu, 02 Dec 2010 09:00:13 +0900
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20101027 Fedora/3.0.10-1.fc12 Mnenhy/0.8.3 Thunderbird/3.0.10

Hi Marcus,

Actually, we are doing all of our processing on the GPU. We use GNURADIO to bring in the data, then run everything on the GPU and copy back only the results. As you wrote in your mail, CPU-to-GPU and GPU-to-CPU copies can be bottlenecks and should be avoided or reduced whenever possible. With the current blocks, it only makes sense to use the GPU if ALL blocks in the chain are also available for the GPU. For example, if you do a cross-correlation (as we do), it does not make sense to run the FFTs on the GPU but the complex multiplication on the CPU, because that forces a GPU-to-CPU-to-GPU round trip before you can run the IFFTs.

For us, the GPU is the device that enables our application to run in real time, which we could hardly achieve with the CPU. But getting the GPU into GNURADIO is another story, I guess...
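To make the transfer argument concrete, here is a minimal sketch that counts the host/device copies in a linear chain of blocks, given the device each block runs on. It assumes that a copy happens exactly when two adjacent blocks run on different devices; `count_transfers` and the block names are hypothetical illustrations, not GNU Radio or CUDA API.

```python
# Hypothetical helper (not GNU Radio API): counts host<->device copies in a
# linear flowgraph. Assumption: a copy occurs exactly when two adjacent
# blocks run on different devices.


def count_transfers(placements):
    """placements: list of (block_name, device) tuples, device in {"cpu", "gpu"}."""
    transfers = []
    for (src, dev_a), (dst, dev_b) in zip(placements, placements[1:]):
        if dev_a != dev_b:
            direction = "CPU->GPU" if dev_a == "cpu" else "GPU->CPU"
            transfers.append((src, dst, direction))
    return transfers


# Cross-correlation with every processing stage on the GPU:
# data goes in once and results come back once.
all_gpu = [
    ("source", "cpu"),
    ("fft", "gpu"),
    ("conj_multiply", "gpu"),
    ("ifft", "gpu"),
    ("sink", "cpu"),
]

# The same chain with only the complex multiplication moved to the CPU.
split = [
    ("source", "cpu"),
    ("fft", "gpu"),
    ("conj_multiply", "cpu"),
    ("ifft", "gpu"),
    ("sink", "cpu"),
]

for name, chain in (("all_gpu", all_gpu), ("split", split)):
    hops = count_transfers(chain)
    print(f"{name}: {len(hops)} transfers")
    for src, dst, direction in hops:
        print(f"  {src} -> {dst}: {direction}")
```

Running it reports 2 transfers for the all-GPU chain and 4 for the split chain; the two extra copies are the GPU-to-CPU-to-GPU round trip in front of the IFFT.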


On 12/01/2010 03:40 PM, Marcus D. Leech wrote:
Is anyone out there taking another look at CUDA + Gnu Radio?

Some of the couple-of-years-old charts I've looked at suggest that speedups for some of the most important transforms we use vary between modest and disappointing.

The cross-over points for things like FFTs are usually up at atmospheric FFT sizes before a CUDA-based transform would win even slightly against, for example, a multi-threaded CPU-based FFTW. But that was a couple of years ago. Anything new along those lines?
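One way to reason about such cross-over points is a simple cost model in which the GPU pays for copying the input over and the output back, while the CPU does not. The sketch below is purely illustrative: the per-operation costs, per-sample transfer cost, and launch overhead are made-up placeholder parameters, not measurements of any real CPU, GPU, CUDA, or FFTW setup; substitute your own benchmark numbers.

```python
import math

# Hypothetical cost model. All parameters are placeholders, not measurements;
# times are in arbitrary units.


def cpu_fft_time(n, cpu_cost_per_op):
    # A radix-2 FFT does on the order of n * log2(n) operations.
    return cpu_cost_per_op * n * math.log2(n)


def gpu_fft_time(n, gpu_cost_per_op, transfer_cost_per_sample, launch_overhead):
    # Copy n samples to the device and n results back, plus a fixed launch cost.
    transfer = 2 * n * transfer_cost_per_sample
    return launch_overhead + transfer + gpu_cost_per_op * n * math.log2(n)


def crossover_size(cpu_cost_per_op, gpu_cost_per_op, transfer_cost_per_sample,
                   launch_overhead, max_log2=30):
    """Smallest power-of-two n where the GPU model beats the CPU model, or None."""
    for k in range(1, max_log2 + 1):
        n = 2 ** k
        gpu = gpu_fft_time(n, gpu_cost_per_op, transfer_cost_per_sample,
                           launch_overhead)
        if gpu < cpu_fft_time(n, cpu_cost_per_op):
            return n
    return None


for transfer in (1.0, 10.0, 100.0):
    print(transfer, crossover_size(1.0, 0.1, transfer, 1000.0))
```

With these placeholder numbers the cross-over is 256 for a per-sample transfer cost of 1.0, 2**23 for 10.0, and none below 2**30 for 100.0. Because the copies grow linearly in n while the compute saving grows as n·log2(n), a high per-sample copy cost pushes the cross-over out to very large FFT sizes.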

It seems like the kinds of things that do well on a GPU are ones that take a small amount of input data, compute ferociously, and produce modest amounts of output data. Another candidate is schemes that consume deluges of input data but produce output only occasionally: for example, a flow that computes a bunch of FFTs and emits averaged magnitude-squared outputs only "once in a while" might fare well on a GPU.
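The "average many FFTs, emit rarely" flow can be sketched as follows. A naive DFT stands in for the FFT so the example needs only the standard library, and the function names are hypothetical. On a GPU, the per-frame transforms and the accumulation would stay on the device, and only each averaged vector would be copied back.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) DFT; stands in for a real FFT to keep the sketch self-contained."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def averaged_power_spectra(samples, fft_size, n_avg):
    """Yield one averaged |X[k]|^2 vector per n_avg consecutive, non-overlapping frames."""
    acc = [0.0] * fft_size
    count = 0
    for start in range(0, len(samples) - fft_size + 1, fft_size):
        spectrum = dft(samples[start:start + fft_size])
        for k, x in enumerate(spectrum):
            acc[k] += abs(x) ** 2  # accumulate magnitude-squared per bin
        count += 1
        if count == n_avg:
            # Only this averaged vector would leave the device.
            yield [p / n_avg for p in acc]
            acc = [0.0] * fft_size
            count = 0


# Demo: a complex tone at bin 3, 64 frames of 16 samples, averaged 16 frames at a time.
fft_size, n_avg = 16, 16
tone = [cmath.exp(2j * math.pi * 3 * t / fft_size) for t in range(fft_size * 64)]
outputs = list(averaged_power_spectra(tone, fft_size, n_avg))
print(len(tone), "input samples ->", len(outputs) * fft_size, "output values")
```

Each output vector summarizes fft_size * n_avg input samples, so the volume of data coming back shrinks by a factor of n_avg (16 in the demo, where 1024 input samples become 64 output values).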

On a related note, has anyone looked at enabling the multi-threaded FFTW support? The cross-over points there (between single-threaded and multi-threaded FFTW) seem to be lower down on the FFT-size curve.

Dr. Thomas Hobiger
Space-Time Measurement Project
Space-Time Standards Group
New Generation Network Research Center
National Institute of Information and Communications Technology
4-2-1 Nukui-Kitamachi, Koganei
184-8795 Tokyo
email:  address@hidden
phone:  ++81-042-327-7561
fax:    ++81-042-327-6664
homepage (priv.): http://www.hobiger.org
