I made a test implementation of a performance optimisation for GNU APL. The short story is that it seems to work for limited tests, but there may be issues with my approach and I don't want to spend any more time on it unless Jürgen agrees it's a good idea. After all, I may have completely misunderstood the architecture of GNU APL and my approach may be fundamentally broken.
When doing some benchmarking I noticed a lot of time was spent allocating, initialising and freeing Value instances. I realised that many such allocations are not actually needed. Take the following _expression_ for example:
In this case, the following things happen:
- The multiplication function (implemented by eval_skalar_AB) creates a new array and fills it with the result of the operation (in this case, bif_multiply) on each cell.
- The addition function is then called, resulting in an identical flow as in the multiplication above.
- The ravel function is then called (Bif_COMMA::ravel), resulting in yet another array being created which is simply copied from B. Note that the ravel is identical between Z and B in this case.
From a performance perspective, I saw a particular low-hanging fruit in the ravel call in the last point. Since the value that came from the addition is never used anywhere else, one can simply change the shape of B in-place and return it immediately.
My experiment involved creating a new VF (value flag) value; VF_temp that indicates that the value may be reused. This flag is then set for all values that are only used once (for example, the return value from a primitive or user function). If a value needs to be preserved, it is copied, and the flag is not set.
What all of this means is that values that are returned from a function, can be reused if it makes sense. There are several cases where this is the case. For example:
- The comma function (as discussed above)
- Reshape, especially when the shape of Z matches that of B
- Scalar functions. If the sizes of the arguments of plus, for example, matches, then A or B can be reused if they are marked as temporary.
- The each operator can (maybe?) push the result straight into the source array.
I've only done very basic testing, but the ravel operator on a large array went from taking a second or so to pretty much zero. Same should be true for reshape, although that one crashes for me right now. :-) Scalar functions still have to process the elements, so their timing can't go to zero, but at least the should save the time spent setting up the result array and creating new values.
Please let me know what you think.