Subject: Re: [Bug-apl] Experimental OpenMP patch
Date: Wed, 12 Mar 2014 12:18:12 +0100
I believe we should first find out how big the thread dispatch effort actually is,
because coalescing can also backfire by creating unequally distributed intermediate results.
For scalar functions you have a parallel execution time of:
a + b×⌈N÷P⌉, where a = startup time (thread dispatch and clean-up), b = cost per cell, N = data size, and P = core count.
In e.g. A + B + C, coalescing would reduce the time from 2×(a + b×⌈N÷P⌉) to a + 2×(b×⌈N÷P⌉), i.e. it saves one startup cost a.
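
To make the comparison concrete, here is a rough sketch of that cost model in Python. The function names and the numbers for a, b, N and P are made up for illustration only; they are not measurements from GNU APL, and the real dispatch cost is exactly what we still have to find out.

```python
def ceil_div(n, p):
    """Integer ceiling of n / p, i.e. the APL expression ⌈N÷P."""
    return (n + p - 1) // p


def parallel_time(a, b, n, p):
    """Model time of one parallel scalar function: a + b×⌈N÷P."""
    return a + b * ceil_div(n, p)


def separate_time(a, b, n, p, ops):
    """ops scalar functions, each dispatched on its own."""
    return ops * parallel_time(a, b, n, p)


def coalesced_time(a, b, n, p, ops):
    """ops scalar functions coalesced into a single dispatch."""
    return a + ops * b * ceil_div(n, p)


# Hypothetical values: startup 1000, cost per cell 1, 10000 cells, 4 cores.
a, b, n, p = 1000, 1, 10000, 4

# A + B + C contains two scalar additions.
separate = separate_time(a, b, n, p, ops=2)    # 2 × (a + b×⌈N÷P)
coalesced = coalesced_time(a, b, n, p, ops=2)  # a + 2 × (b×⌈N÷P)
saving = separate - coalesced                  # (ops - 1) × a in this model

print(separate, coalesced, saving)
```

In this model the saving from coalescing is always (ops - 1) × a, independent of N and P, so whether it is worth the effort depends entirely on how large a turns out to be relative to b×⌈N÷P⌉.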
On the other hand, in A + B ⍴ C things could be completely different, because ⍴ can create a very unevenly sized right
argument of +.
I guess we have to look into the details of every function and operator to see what can be done in terms of parallel execution.
Starting with scalar functions seems to be a good strategy, and I believe we should finish that first before looking into
more complex scenarios.
On 03/11/2014 04:24 PM, Elias Mårtenson wrote: