If you want to look at the code, all arrays are implemented as instances of Value. You will see references to Value_P as well, which is just a refcounted pointer to a Value. A Value has a Shape which defines its dimensions.
Code that processes array content does so using the method .get_ravel() and its various overloads.
I'm thinking that perhaps it would be possible to create an optimised primitive array by subclassing Value (or rather, by extracting a common superclass from it) into something that can handle certain arrays much faster. Whether that is actually faster depends on where the time is actually spent. If you still have to waste time boxing the values returned from .get_ravel() then there is little point.
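To make the boxing concern concrete, here is a rough C++ sketch. None of it is taken from the GNU APL sources: BoxedCell, sum_boxed, sum_packed and box_all are hypothetical names, and the real cell layout behind .get_ravel() may look quite different. It only illustrates why a packed array of doubles helps when consumers can read it directly, and why the gain disappears if every element has to be boxed again first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generic tagged cell (NOT GNU APL's real cell type).
struct BoxedCell {
    enum class Tag { Int, Float } tag;
    union { long i; double f; };
};

// Generic path: every element access has to inspect the tag first.
double sum_boxed(const std::vector<BoxedCell>& ravel) {
    double total = 0.0;
    for (const BoxedCell& c : ravel)
        total += (c.tag == BoxedCell::Tag::Int) ? static_cast<double>(c.i) : c.f;
    return total;
}

// Specialised path: a homogeneous array stored as packed doubles.
double sum_packed(const std::vector<double>& ravel) {
    double total = 0.0;
    for (double v : ravel) total += v;
    return total;
}

// The boxing step: if consumers only understand BoxedCell, the packed
// array has to be converted like this first, which costs a full pass
// and a full allocation before any real work is done.
std::vector<BoxedCell> box_all(const std::vector<double>& packed) {
    std::vector<BoxedCell> out;
    out.reserve(packed.size());
    for (double v : packed) {
        BoxedCell c;
        c.tag = BoxedCell::Tag::Float;
        c.f = v;
        out.push_back(c);
    }
    assert(out.size() == packed.size());
    return out;
}
```

Calling sum_packed(p) touches only the raw doubles, whereas sum_boxed(box_all(p)) gives the same answer but pays for the conversion on top of the tag checks.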
The first step is to run your tests through cachegrind to determine exactly where the bottlenecks lie.
I have done a cachegrind analysis before, and I determined that most of the time was actually spent copying the arrays. The problem is that in an expression such as A←A+1 where A is a large array, the entire content is copied and then 1 is added to each element. However, since A is overwritten in the assignment, a lot of time can be saved if the addition is done in-place. With an expression such as A←⍉1+2×A you end up with three unnecessary copies and initialisations of the entire array. The time spent here is much larger than the time taken to actually perform the mathematical operations. I even implemented an optimisation for this which was rolled back because it was severely broken. However, when it worked it provided an order of magnitude performance improvement or more.
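As a rough illustration of the idea (and not of how the rolled-back optimisation was actually written), here is a small C++ sketch. ArrayPtr is a hypothetical stand-in for Value_P, built on std::shared_ptr, and add_scalar_copy and add_scalar_reuse are made-up names. The assumption is simply that if the operand is the only reference to the array, nobody else can observe the old content, so the result can overwrite it.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a refcounted array pointer such as Value_P.
using ArrayPtr = std::shared_ptr<std::vector<double>>;

// What A+1 does without the optimisation: allocate a new array,
// copy every element, then add the scalar.
ArrayPtr add_scalar_copy(const ArrayPtr& a, double s) {
    assert(a != nullptr);
    auto result = std::make_shared<std::vector<double>>(*a);
    for (double& v : *result) v += s;
    return result;
}

// In-place variant: reuse the operand's storage when this is the only
// reference to it, otherwise fall back to the copying version.
ArrayPtr add_scalar_reuse(ArrayPtr a, double s) {
    assert(a != nullptr);
    if (a.use_count() > 1) return add_scalar_copy(a, s);
    for (double& v : *a) v += s;
    return a;
}
```

A caller that is about to overwrite A would hand over its reference, as in A = add_scalar_reuse(std::move(A), 1.0), so the storage is reused. If something else still holds the array, the function silently copies instead. Getting that ownership test right in every code path is the hard part.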
You might want to read up in the mailing list archives from a year or so ago where all of this was discussed in depth.
Regards,
Elias