Re: [Discuss-gnuradio] Bidirectional communication between attached blocks
Mon, 20 Apr 2015 16:21:32 +0200
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
If I may recommend something: have a look at VOLK.
It's the optimization library that ships with GNU Radio.
If you could implement some of these algorithms in CUDA, then every
block currently using VOLK (which is the majority of the
arithmetically challenging blocks at the moment) could automatically
make use of your accelerations, without having to change anything!
Also, VOLK comes with volk_profile, which it uses to test the
different implementations that work on your hardware, looking for
the fastest one. That would be the ultimate benchmark for your
kernels, as it directly compares the efficiency of the "general C"
and CPU-SIMD implementations to your CUDA kernels.
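To illustrate the idea: a VOLK kernel such as volk_32f_x2_multiply_32f ships several implementations ("protokernels") of the same operation -- generic C, SSE, AVX, and potentially a CUDA one -- and volk_profile picks the fastest on the machine at hand. A minimal sketch of what the "generic" protokernel computes, in plain C++ with no VOLK dependency (just the semantics any faster implementation must reproduce):

```cpp
#include <cstddef>
#include <vector>

// Semantics of a generic VOLK protokernel (modeled on
// volk_32f_x2_multiply_32f): element-wise product of two float vectors.
// A SIMD or CUDA protokernel must produce exactly this result, only faster.
void multiply_32f_generic(float* out, const float* a, const float* b,
                          std::size_t num_points) {
    for (std::size_t i = 0; i < num_points; ++i)
        out[i] = a[i] * b[i];
}
```

Since volk_profile times every available implementation of each kernel on the actual hardware and records the winner, a correct CUDA protokernel would be selected automatically wherever it wins -- without changing any GNU Radio block.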
Furthermore, gr-theano is worth a visit, because it actually uses
CUDA to accelerate channel models. The point here is that GPUs,
with their high memcpy latency (and CPU cost), aren't practical for
all problems. If I just want to add a small number of samples, doing
it on the CPU might simply pay off better; gr-theano, for example,
offers an FFT, which is one of the algorithms that typically works
on large vectors, where crossing the CPU/GPU boundary might be worth it.
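That boundary-crossing trade-off can be captured in a back-of-the-envelope cost model (all numbers illustrative assumptions, not measurements): offloading only pays off when the per-sample time saved on the GPU exceeds the fixed launch overhead plus the two host-device copies.

```cpp
#include <cstddef>

// Toy cost model: is it worth offloading an n-sample kernel to the GPU?
// Every parameter here is an illustrative assumption, not a measured value.
bool worth_offloading(std::size_t n,
                      double cpu_ns_per_sample,    // CPU cost per sample
                      double gpu_ns_per_sample,    // GPU cost per sample
                      double transfer_ns_per_byte, // PCIe cost per byte
                      double launch_overhead_ns,   // fixed kernel-launch cost
                      std::size_t bytes_per_sample) {
    double cpu = n * cpu_ns_per_sample;
    double gpu = n * gpu_ns_per_sample + launch_overhead_ns
               + 2.0 * n * bytes_per_sample * transfer_ns_per_byte; // in + out
    return gpu < cpu;
}
```

With such a model, adding a handful of samples stays on the CPU (the copies and launch dominate), while a large FFT amortizes the transfers over enough work to win.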
For my thesis, I'm trying to bring various parts of GNU Radio
over to the GPU. My idea is to rewrite already existing blocks with
CUDA, ideally without breaking compatibility with the current
implementation of GNU Radio, so that a normal user can use
these blocks without problems.
For the moment, I've become more familiar with GNU Radio, built
a CUDA FM receiver, and started to port some blocks to CUDA.
Minimizing host-device memcpy is mandatory.
My current approach is: each block loads its code and
communicates with its neighbours using async transfers, streams, and
so on (so I need to pass addresses of memory locations, locks, ...).
My next step will be: at the beginning, each block will send
down its device code and parameters; the block at the end of
the chain will then perform a dynamic compilation (CUDA 7). If I
have additional time, I'll also use warp parallelism (reducing ...).
Thanks in any case,
On Mon, 20 Apr 2015 at 12:48 Marcus Müller <address@hidden> wrote:
I just realized: things might be much easier than that.
What you do sounds like a job for a hierarchical block;
if you're not used to that concept: it's just a
"subflowgraph", represented as a block with in- and output ports.
If you put both your blocks inside, you'll always have
them together. And: in the constructor of your
hierarchical block, you can, for example, first construct
your CUDA block, and then give your "downstream" block
the pointer to it in its constructor.
To the user, this will look like one block, though there
are two (or more) inside.
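The constructor pattern Marcus describes can be sketched in plain C++ (no GNU Radio dependency; the class names here are made up for illustration): the hierarchical block constructs the CUDA block first, then hands its shared pointer to the downstream block.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for GNU Radio blocks, showing only the wiring.
struct cuda_block {
    // In a real block this would expose GPU state (device buffers, streams).
};

struct downstream_block {
    // The downstream block receives a pointer to its upstream peer at
    // construction time, so the two can share device state directly.
    explicit downstream_block(std::shared_ptr<cuda_block> up)
        : upstream(std::move(up)) {}
    std::shared_ptr<cuda_block> upstream;
};

// The "hierarchical block": to the user it is one unit, but it owns both.
struct hier_cuda_block {
    hier_cuda_block()
        : cuda(std::make_shared<cuda_block>()),
          down(std::make_shared<downstream_block>(cuda)) {}
    std::shared_ptr<cuda_block> cuda;
    std::shared_ptr<downstream_block> down;
};
```

In actual GNU Radio code the outer class would derive from gr::hier_block2 and connect() the two inner blocks to its own ports; the shared_ptr here plays the role of GNU Radio's sptr idiom.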
On 04/20/2015 12:29 PM, marco Ribero wrote:
Thank you very much. Your solution is much cleaner.
Have a good day,
On Mon, 20 Apr 2015 at 09:29 Marcus Müller <address@hidden> wrote:
What you describe as an ID already exists: every
block has a function alias(), giving it a
string "name", which can be used for lookup.
You will need to wrap your alias in a
pmt::intern to get it into a stream tag;
use that with block_lookup, and cast the
result to your_block_type::sptr.
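The alias() / block_lookup mechanism boils down to a registry of base-class pointers keyed by a string name, followed by a downcast. A plain-C++ analogy (a std::map standing in for GNU Radio's block registry, with dynamic_pointer_cast playing the role of the cast to your_block_type::sptr):

```cpp
#include <map>
#include <memory>
#include <string>

struct basic_block {                 // stand-in for gr::basic_block
    virtual ~basic_block() = default;
};
struct my_cuda_block : basic_block { // stand-in for your_block_type
    int device_id = 0;
};

// Stand-in for the global block registry, keyed by each block's alias.
std::map<std::string, std::shared_ptr<basic_block>> registry;

// Look a block up by its alias string and downcast to the concrete type;
// returns nullptr if the alias is unknown or the type does not match.
std::shared_ptr<my_cuda_block> lookup_cuda(const std::string& alias) {
    auto it = registry.find(alias);
    if (it == registry.end()) return nullptr;
    return std::dynamic_pointer_cast<my_cuda_block>(it->second);
}
```

In GNU Radio itself the alias travels inside a stream tag as a PMT symbol, which is why the pmt::intern wrapping step Marcus mentions is needed before the lookup.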