My current interpretation of various benchmarks that Elias and
I did some years ago is that the bandwidth
between the CPUs (or cores) and the memory is the limiting factor. No
matter how efficient the APL interpreter is, this bottleneck will
limit the speedup that can be achieved.
Makes sense. My understanding is that CPUs are so much faster than main memory that memory can't even keep a single CPU busy. We only see speed improvements in small loops whose working set fits in cache. A long sequence, like a large array, has to be streamed from memory, and memory can't deliver it fast enough for even one CPU. I guess machine architecture will have to catch up.
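
To put rough numbers on that, here is a back-of-the-envelope sketch in Python (roofline-style arithmetic, not a benchmark). Everything in it is an assumption for illustration: the 10 GFLOP/s per core, the 20 GB/s of memory bandwidth shared by all cores, and the function names are made up, and it takes the simplified view that one core alone can already saturate the memory bus. The operation modelled is an elementwise C = A + B on 8-byte floats, which moves 24 bytes (two reads, one write) per addition.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_element, bytes_per_element):
    """Upper bound on throughput for a streaming array operation.

    The result is whichever is smaller: what the cores can compute,
    or what the memory system can feed them.
    """
    intensity = flops_per_element / bytes_per_element  # flops per byte moved
    memory_bound = bandwidth_gb_s * intensity          # GFLOP/s the memory can sustain
    return min(peak_gflops, memory_bound)


# Hypothetical machine, numbers chosen only for illustration.
PEAK_PER_CORE_GFLOPS = 10.0
SHARED_BANDWIDTH_GB_S = 20.0


def speedup_over_one_core(cores, flops_per_element=1, bytes_per_element=24):
    """Best-case speedup from using `cores` cores instead of one."""
    one = attainable_gflops(PEAK_PER_CORE_GFLOPS, SHARED_BANDWIDTH_GB_S,
                            flops_per_element, bytes_per_element)
    many = attainable_gflops(PEAK_PER_CORE_GFLOPS * cores, SHARED_BANDWIDTH_GB_S,
                             flops_per_element, bytes_per_element)
    return many / one


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(cores, "cores ->", speedup_over_one_core(cores), "x")
```

Under these assumptions the memory-bound rate is far below what even one core could compute, so adding cores changes nothing for a large array that has to come from main memory. Raising the flops done per byte moved (more work on data that is already in cache) is the only thing that moves the bound in this model.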