On Wed, 27 May 2015 at 16:29, Marcus Müller <address@hidden> wrote:
> Yes. The Thread-per-Block scheduler gets its name from the fact that
> every block gets its own thread.
> You should really use it -- using STS will probably kill the performance
> you can gain by accelerating stuff on a GPU, because nearly no one uses
> single-core CPUs any more, and to my knowledge, only the FFT blocks are
I hope to bring the more time-consuming blocks onto CUDA; for the moment, the FIR and IIR filters, and the FFT.
> I'm a bit surprised that CUDA requires you to run everything in one
> thread -- doesn't using cudaSetDevice in every thread (== in every
> block's work() method on the first call) suffice?
> NVidia claims CUDA is thread safe, i.e. worst case your multi-threading
> performance is as bad as doing everything in a single thread.
>> I'd like that other blocks (not related to CUDA) to be able to run in parallel
> That's really awesome because it scales so well :)
My fault -- you're right: since CUDA 4.0 the same GPU can be shared between different threads/processes. I got my information from older books :) Thank you!!
Now I need to change my code a little. I wasn't too concerned about this capability because my blocks just launch kernels/memcpys asynchronously and exit.
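For what it's worth, the per-thread pattern Marcus suggests (calling cudaSetDevice once, on the first work() call in each block's own thread) could look roughly like this. This is only a sketch under the thread-per-block scheduler; the class and member names (cuda_fir_ff, d_device, d_stream) are illustrative, not from any actual implementation:

```cuda
// Hypothetical GNU Radio-style block that binds its thread to a GPU lazily.
#include <cuda_runtime.h>

class cuda_fir_ff {
    int d_device = 0;            // which GPU this block uses (assumption)
    bool d_bound = false;        // has this thread bound the device yet?
    cudaStream_t d_stream;       // per-block stream for async launches
    float *d_dev_buf = nullptr;  // device-side working buffer

public:
    int work(int noutput_items, const float *in, float *out) {
        if (!d_bound) {
            // With the thread-per-block scheduler, work() always runs in the
            // same thread, so binding the device once on first call suffices.
            cudaSetDevice(d_device);
            cudaStreamCreate(&d_stream);
            cudaMalloc(&d_dev_buf, noutput_items * sizeof(float));
            d_bound = true;
        }
        // Launch asynchronously and return immediately, as described above;
        // downstream blocks synchronize via events, not on the host.
        cudaMemcpyAsync(d_dev_buf, in, noutput_items * sizeof(float),
                        cudaMemcpyHostToDevice, d_stream);
        // my_fir_kernel<<<grid, block, 0, d_stream>>>(d_dev_buf, ...);
        return noutput_items;
    }
};
```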
It seems that with multithreading I'm able to reduce the "pause time" between kernel executions of different blocks from 1-3 microseconds to 0.3 microseconds -- not bad!! (Each block waits for the previous one, with streams attached to events.)
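The "streams attached to events" chaining mentioned above can be sketched like this: block B's stream waits on an event recorded in block A's stream, so the kernels serialize on the GPU with no host-side synchronization. Kernel names and the two-stream layout are assumptions for illustration:

```cuda
// Sketch: GPU-side ordering between two blocks' streams via an event.
#include <cuda_runtime.h>

cudaStream_t stream_a, stream_b;  // one stream per block (assumption)
cudaEvent_t a_done;

void setup() {
    cudaStreamCreate(&stream_a);
    cudaStreamCreate(&stream_b);
    // Timing disabled: lighter-weight event used purely for synchronization.
    cudaEventCreateWithFlags(&a_done, cudaEventDisableTiming);
}

void run_chain() {
    // kernel_a<<<grid, block, 0, stream_a>>>(...);   // block A's work
    cudaEventRecord(a_done, stream_a);        // mark A's kernel as finished
    cudaStreamWaitEvent(stream_b, a_done, 0); // B waits on the GPU, not host
    // kernel_b<<<grid, block, 0, stream_b>>>(...);   // runs after A's kernel
}
```

Because the wait happens inside the GPU's command queues, the host thread never blocks, which is consistent with the sub-microsecond gaps reported above.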
The drawback is that each kernel execution must be bigger, because otherwise its execution wouldn't hide the larger overhead of "reassigning the device" to a different thread.
Thanks for your replies,