Subject: Re: [Discuss-gnuradio] Which blocks do you like?
Date: Mon, 18 May 2015 09:52:26 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
> I've looked at the correlate_and_sync method (that's the one you're
> talking about, isn't it?)

Pretty much, yeah :)
> Here the problem is not only the data coming out of the block, but the
> associated tags, also because for the moment I'm not supporting tags on
> the flow above CUDA. If you have a good idea about how to manage them,
> feel free to speak up.

I don't know exactly what your blocks look like, but assuming you take a gr::sync_block and let it handle the whole CUDA thing, GNU Radio would automatically make this work.
If you instead have a block that "sends" data to the CUDA GPU (a sink from GNU Radio's perspective), and another one that "receives" data from CUDA (a source from GNU Radio's perspective), you could simply call get_tags_in_range on the sink block, take these tags, and send them as a message to your source block (before you start the CUDA processing). You would then write a message handler in your source block that just takes these tags and calls add_item_tag for each of them.
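Not GNU Radio code, just a sketch of the bookkeeping involved, in plain Python: the sink side collects the tags in the window it consumed (the role get_tags_in_range plays), ships them across, and the source side re-anchors each absolute offset into its own output stream before re-adding them. The names (`Tag`, `collect_tags`, `rebase_tags`) and the rate-1 assumption are mine, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    offset: int    # absolute item index in the stream the tag rides on
    key: str
    value: object


def collect_tags(tags, abs_start, abs_end):
    """Sink side: what get_tags_in_range would hand you for [abs_start, abs_end)."""
    return [t for t in tags if abs_start <= t.offset < abs_end]


def rebase_tags(tags, sink_abs_start, source_abs_start):
    """Source side: translate sink-stream offsets into source-stream offsets
    before re-adding each tag (the add_item_tag step).  Assumes the CUDA
    processing is rate-1 (one output item per input item); a resampling
    kernel would have to scale the offsets as well."""
    delta = source_abs_start - sink_abs_start
    return [Tag(t.offset + delta, t.key, t.value) for t in tags]


# Example: the sink consumed items 1000..1999, the source starts producing at item 0.
stream_tags = [Tag(1003, "burst_start", True), Tag(1500, "freq", 1.2), Tag(2400, "late", 0)]
in_window = collect_tags(stream_tags, 1000, 2000)        # drops the tag at 2400
rebased = rebase_tags(in_window, sink_abs_start=1000, source_abs_start=0)
```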
If you want to add tags by detecting things in CUDA, that'll be a bit more complicated: it sounds like your CUDA threads would need to coordinate through a lot of barriers just to write the index of a tag you want to add into a specific memory position... I don't know how efficient that would be compared to doing the same on a GHz CPU.
I agree, relying on the signal to limit execution of this loop feels a bit "dangerous", but I also agree that as long as your "known symbol" has a sane shape, this won't be a problem. The problem is that this is indeed a CPU-hungry bit of code, so every additional condition might slow things down; I think it should be possible to come up with a nearly-as-effective implementation that is more robust, but I can't think of one right now. Any hints?
At any rate, the outer `while` shouldn't run up to `noutput_items`, but only up to `noutput_items - 1`, because of the inner `while(corr_mag[i] < corr_mag[i+1])`. Thanks for spotting that! If you don't mind, I'd like to ask you to fix that line (`while(i<noutput_items)` -> `while(i<noutput_items-1)`) and submit a pull request on GitHub (if possible, base it off the "maint" branch).
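To make the off-by-one concrete, here's a stripped-down Python model of that kind of peak search (my paraphrase of the loop's structure, not the actual block code): the inner comparison reads `corr_mag[i+1]`, so both loops must stop at `n - 1`, i.e. `noutput_items - 1`.

```python
def find_peaks(corr_mag, threshold):
    """Walk the correlation magnitudes; on crossing the threshold, climb to
    the local maximum via the corr_mag[i] < corr_mag[i+1] test.  The bound
    is len(corr_mag) - 1 because the comparison looks one item ahead:
    allowing i == len(corr_mag) - 1 would read past the end of the buffer."""
    n = len(corr_mag)
    peaks = []
    i = 0
    while i < n - 1:                   # NOT `while i < n` -- the fix discussed above
        if corr_mag[i] > threshold:
            while i < n - 1 and corr_mag[i] < corr_mag[i + 1]:
                i += 1                 # still rising: keep climbing
            peaks.append(i)            # local maximum (or buffer edge)
            while i < n and corr_mag[i] > threshold:
                i += 1                 # skip the rest of this burst
        else:
            i += 1
    return peaks
```

With the unguarded `i < n` bound, a correlation that is still rising at the end of the buffer would trigger an out-of-range read on the last comparison.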
Hard to properly optimize; intuitively, it's not clear whether a highly optimized "precompute the complete corr_mag[0:end-d_sps] - corr_mag[d_sps:end] and sign(corr_mag[:end-1] - corr_mag[1:end])" would be faster than selectively computing only the relevant parts (like this branch-intensive algorithm does) on a CPU. Pretty sure that on CUDA, you'd just precompute the whole arrays.
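For a feel of the "precompute everything" variant, a small Python sketch (plain lists, no branching on the data during the scan; `precompute_peaks` and the threshold test are my additions, only the two precomputed arrays come from the text above):

```python
def precompute_peaks(corr_mag, d_sps, threshold):
    """Precompute the neighbor-sign array and the d_sps-spaced differences
    over the whole buffer, then find peaks as rising -> non-rising
    transitions above threshold.  Every element is computed, even the ones
    the branching algorithm would have skipped -- which is exactly the
    trade-off in question (and the natural formulation for CUDA)."""
    n = len(corr_mag)
    # True where corr_mag rises toward the next sample: the
    # sign(corr_mag[:end-1] - corr_mag[1:end]) array, kept as booleans.
    rising = [corr_mag[i] < corr_mag[i + 1] for i in range(n - 1)]
    # The corr_mag[0:end-d_sps] - corr_mag[d_sps:end] array, in full.
    spaced_diff = [corr_mag[i] - corr_mag[i + d_sps] for i in range(n - d_sps)]
    peaks = [i for i in range(1, n - 1)
             if rising[i - 1] and not rising[i] and corr_mag[i] > threshold]
    return peaks, spaced_diff
```

Each precompute pass is an embarrassingly parallel elementwise map, so on a GPU every element lands in its own thread with no coordination; on a CPU the question is whether the wasted work on skipped regions outweighs the branch mispredictions it avoids.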
On 05/17/2015 10:58 PM, marco Ribero wrote: