From: Juergen Sauermann
Date: Tue, 11 Mar 2014 16:07:33 +0100
Hi David,

Looks good! Some comments, though.

1. You could adapt src/testcases/Performance.pt with some longer
scalar functions in order to get some performance figures. You can start it like this:

./apl -T testcases/Performance.pt

2. I believe we should not bother the user with specifying parallelization parameters in ⎕SYL. I would rather use ./configure CORES=n, with n=1 meaning no parallel execution, CORES=auto meaning the number of cores on the build machine, and an explicit n>1 meaning that n cores shall be used. This would generate slightly faster code than computing array bounds
at runtime. It's a bit more hassle for the user, but may pay off soon.

3. Yes, GNU APL throws many exceptions (almost every APL error is thrown from somewhere), and I was expecting that we would have to catch them on the throwing processor. Not too difficult if
we do it at the top level.

4. It would be good to understand how the OpenMP loops work. I could imagine one of two strategies:

- in loop(j, MAX), thread j executes iterations j, j+CORES, ...
- thread j executes iterations j*MAX/CORES ... (j+1)*MAX/CORES

The first strategy interleaves the data and is more intuitive,
while the second uses blocks of data, is more cache-friendly, and therefore probably
gives better performance.

5. I'm not sure your earlier comment on letting the scheduler decide is correct. I have done pthread programming in the past and have seen cases where the scheduler fooled itself, so that the same problem took more than double the capacity compared to explicit affinity on a 4-core CPU. I would expect APL to generate very fine-grained and short-lived pieces of execution, and the scheduler may not be optimized for that. I guess we have to try that out.
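Regarding point 2, a minimal sketch of how a configure-time core count could reach the code, assuming configure passes it as a preprocessor macro. The -DCORES=n mechanism, kCores, block_begin and apply_scalar are all hypothetical here, not existing GNU APL code. Since kCores is a compile-time constant, the compiler can drop the parallel branch for CORES=1 and simplify the block bounds; CORES=auto would have to be turned into a number by configure itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: configure would add -DCORES=n to CXXFLAGS.
// Without it we fall back to serial execution (CORES=1).
#ifndef CORES
#define CORES 1
#endif

constexpr std::size_t kCores = CORES;
static_assert(kCores >= 1, "CORES must be at least 1");

// First index of the contiguous block handled by core j.
constexpr std::size_t block_begin(std::size_t j, std::size_t max) {
    return j * max / kCores;
}

// Apply a monadic scalar function element-wise. With kCores == 1 the
// condition is a constant, so the parallel branch is dead code.
template <typename Fun>
void apply_scalar(const std::vector<double>& a, std::vector<double>& z, Fun fun) {
    const std::size_t max = a.size();
    z.resize(max);
    if (kCores == 1) {
        for (std::size_t i = 0; i < max; ++i) z[i] = fun(a[i]);
    } else {
        // One iteration (= one contiguous block) per thread.
        #pragma omp parallel for num_threads(kCores)
        for (long j = 0; j < static_cast<long>(kCores); ++j) {
            const std::size_t jj = static_cast<std::size_t>(j);
            const std::size_t end = block_begin(jj + 1, max);
            for (std::size_t i = block_begin(jj, max); i < end; ++i) z[i] = fun(a[i]);
        }
    }
}
```

Built without -fopenmp the pragma is ignored and the blocks simply run one after another, so the result is the same either way.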
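Regarding point 4, the two strategies can be written down as index sets. A minimal sketch; cores and max play the roles of CORES and MAX above, and the helper names are made up for illustration. For reference, the OpenMP specification defines schedule(static) without a chunk size as approximately equal contiguous chunks with at most one per thread (the second strategy), while schedule(static, 1) hands out single iterations round-robin by thread number (the first strategy).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strategy 1 (interleaved): thread j gets j, j+cores, j+2*cores, ...
std::vector<std::size_t> interleaved(std::size_t j, std::size_t cores, std::size_t max) {
    assert(cores > 0 && j < cores);
    std::vector<std::size_t> idx;
    for (std::size_t i = j; i < max; i += cores) idx.push_back(i);
    return idx;
}

// Strategy 2 (blocked): thread j gets [j*max/cores, (j+1)*max/cores).
std::vector<std::size_t> blocked(std::size_t j, std::size_t cores, std::size_t max) {
    assert(cores > 0 && j < cores);
    std::vector<std::size_t> idx;
    const std::size_t end = (j + 1) * max / cores;
    for (std::size_t i = j * max / cores; i < end; ++i) idx.push_back(i);
    return idx;
}
```

With 4 cores and 10 elements, thread 1 touches elements 1, 5, 9 under the first strategy and 2, 3, 4 under the second. In the interleaved case neighbouring elements, which usually share a cache line, are written by different cores, which is where the cache penalty comes from.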
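Regarding point 5, for the affinity comparison a thread can be pinned explicitly on Linux with the glibc call sched_setaffinity. A minimal sketch, Linux-only; pin_current_thread is a hypothetical helper. With a recent enough OpenMP runtime, the OMP_PROC_BIND and OMP_PLACES environment variables can request similar binding without code changes.

```cpp
#include <cassert>
#include <sched.h>  // sched_setaffinity, sched_getcpu, CPU_* macros (Linux/glibc)

// Pin the calling thread to a single CPU. Returns false if the kernel
// refuses, e.g. because the CPU is not in the process's allowed set.
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread" for sched_setaffinity on Linux.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Inside an OpenMP parallel region, each thread could call pin_current_thread(omp_get_thread_num()) to get the explicit-affinity variant, assuming thread numbers map sensibly onto CPU numbers on the test machine, and the timings could then be compared against the unpinned run.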

/// Jürgen

On 03/11/2014 08:02 AM, David B. Lamkins wrote:
Juergen's suggestion prompted me to attempt an implementation using
OpenMP rather than the by-hand coding that I had been anticipating.
Attached is a quick-and-dirty patch to enable GNU APL to be built with
OpenMP support.

./configure --with-openmp

There are many rough edges, both in the Makefile and the code.

--with-openmp would ideally check to see whether the compiler supports
OpenMP. It may be necessary to check the compiler version, as different
compilers support different versions of OpenMP. Also, I've assumed
compilation on/for Linux despite the fact that GNU APL and OpenMP should
be buildable with the right Windows compiler.
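One way the compiler check could work is a tiny probe that configure tries to compile with the OpenMP flag (-fopenmp for GCC). Compilers with OpenMP enabled define the _OPENMP macro as the yyyymm date of the specification they implement, which also answers the version question. A sketch; the helper names are hypothetical and the mapping covers only the releases listed.

```cpp
#include <cassert>

// Returns the OpenMP specification date (yyyymm) the compiler implements,
// or 0 if OpenMP is not enabled for this compilation.
long openmp_spec_date() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Map a spec date to version*10 (e.g. 31 for 3.1); 40 means "4.0 or later",
// 0 means "older than 2.5 or no OpenMP".
int openmp_version(long date) {
    if (date >= 201307) return 40;
    if (date >= 201107) return 31;
    if (date >= 200805) return 30;
    if (date >= 200505) return 25;
    return 0;
}
```

If the probe does not compile with the flag, or openmp_spec_date() still returns 0, configure could reject --with-openmp with a clear message instead of failing later in the build.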

As one might expect, OpenMP requires that any throw from a worker thread
must be caught by the same thread. I'm almost certain that this
restriction could be violated by GNU APL code as currently written.
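A common way to respect that rule is to catch everything inside the loop body, keep the first exception in a std::exception_ptr, and rethrow it on the calling thread after the parallel region, where the normal top-level error handling can take over. A minimal sketch, assuming a plain parallel for over an index range; parallel_for_rethrow is a hypothetical helper, not existing GNU APL code.

```cpp
#include <cassert>
#include <exception>

// Hypothetical helper: runs body(i) for i in [0, n), possibly in parallel,
// and rethrows on the calling thread the first exception caught by any worker.
template <typename Body>
void parallel_for_rethrow(long n, Body body) {
    std::exception_ptr first_error = nullptr;

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        try {
            body(i);
        } catch (...) {
            // The exception never leaves the worker thread. Only the first
            // one to reach the critical section is kept; later ones are dropped.
            #pragma omp critical
            {
                if (!first_error) first_error = std::current_exception();
            }
        }
    }

    // Back on the calling thread after the loop's implicit barrier.
    if (first_error) std::rethrow_exception(first_error);
}
```

An OpenMP for loop cannot be left early with break, so the remaining iterations still run after an error; for APL that only costs time, since the result is discarded when the error is rethrown.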

The good news, though, is that the changes are benign; in the absence of
--with-openmp, GNU APL's behavior is unchanged.

With OpenMP support, ⎕syl is extended to access some of OpenMP's

I've done only trivial testing at this point; just enough to verify that
compiling OpenMP support doesn't obviously break GNU APL.

I haven't confirmed that the OpenMP #pragmas on the key loops in
SkalarFunction.cc have any effect on execution time or processor core
utilization. I hope to do more testing later this week.
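To see whether the pragmas change anything, it may be enough to time the same element-wise loop in a build with and without -fopenmp. A minimal sketch using std::chrono::steady_clock; add_arrays stands in for the real loops in SkalarFunction.cc, which are not reproduced here, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Element-wise A + B, standing in for a dyadic scalar function.
// Assumes a and b have the same length.
void add_arrays(const std::vector<double>& a, const std::vector<double>& b,
                std::vector<double>& z) {
    const long n = static_cast<long>(a.size());
    z.resize(a.size());
    // Ignored (serial baseline) when compiled without -fopenmp.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) z[i] = a[i] + b[i];
}

// Wall-clock seconds for one call of f.
template <typename F>
double seconds(F f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Repeating the measurement over array sizes from a few elements up to millions should indicate where thread start-up cost stops dominating, which is exactly the short-lived-work question for APL.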

Best wishes,

