bug-apl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-apl] Boolean Packing


From: Juergen Sauermann
Subject: Re: [Bug-apl] Boolean Packing
Date: Fri, 21 Aug 2015 18:40:56 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

Hi,

in the good old days (= long before APL2) the data type was a property of the value and memory was
always too small. At that time all values were homogeneous.

Then APL2 came and most vendors of old APL simply added a new thing called a mixed value which would
take care of the non-homogeneous cases (and hoping there would not be that many).

GNU APL came long after APL and when thinking about using homogeneous arrays, I came to the
conclusion that they have no place in a new, object oriented design. In other words, GNU APL considers
mixed arrays as the normal case rather than the exception.

I would also argue that the speed advantage is very minor:
- it only applies to ∧ ∨ ⍲ ⍱ and monadic ∼ and only if the arrays are homogeneous (for ∧ and ∨ they do
 not have to be).
- most of the time spent in APL execution is for checking various things as opposed to the pure computation
 of results (read: new ravel elements).
- homogeneous arrays often trigger re-computation if the entire value; For example:
  Z←1000000⍴0 ◊ Z[2]←2 ◊ Z←4]←'a'

And even though you may gain a few microseconds when computing the above functions, you pay them
back when checking for the homogeneous case all over the place. In other words, adding homogeneous
arrays will most likely fire back in terms of performance.

/// Jürgen



On 08/21/2015 07:06 AM, Elias Mårtenson wrote:
If you want to look at the code, all arrays are implemented as instances of Value. You will see references to Value_P as well, which is just a refcounted pointer to a Value. A Value has a Shape which defines its dimensions.

Code that processes array content does so using the method .get_ravel() and its various overloads.

I'm thinking that perhaps it would be possible to create an optimised primitive array by subclassing (or rather, extracting a common superclass) Value into something that can handle certain arrays much faster. Where that is actually faster depends on where the time is actually spent. If you still have to waste time boxing the values returned from .get_ravel() then there is little point.

The first step is to run your tests through cachegrind to determine exactly where the bottlenecks lie.

I have done a cachegrind analysis before, and I determined that most of the time was actually spent copying the arrays. The problem is that in an _expression_ such as A←A+1 where A is a large array, the entire content is copied, 1 is added to each element. However, since A is overwritten in the assignment a lot of time can be saved if the addition is done in-place. With an _expression_ such as A←⍉1+2×A you end up with three unnecessary copies and initialisations of the entire array. The time spent here is much larger than the time taken to actually perform the mathematical operations. I even implemented an optimisation for this which was rolled back because it was severely broken. However, when it worked it provided an order of magnitude performance improvement or more.

You might want to read up in the mailing list archives from a year or so ago where all of this was discussed in depth.

Regards,
Elias

On 21 August 2015 at 12:49, Mike Duvos <address@hidden> wrote:
Hi Elias,

It seems that there would be a substantial performance hit using C++ object management to construct your workspace.  I haven't run any benchmarks before, but perhaps a quick comparison with APL2 would be useful at this point.

Let's make a little function that does something 100 times.

      ∇DO100[⎕]∇
    ∇
[0]   DO100 X;I
[1]   I←¯1
[2]  L1:→((I←I+1)≥100)/0
[3]   ⍎X
[4]   →L1

    ∇ 2015-08-20 21.00.29.850 (GMT-7)

      

And another little function that prints out the number of seconds something takes.

      ∇TIME[⎕]∇
    ∇
[0]   TIME X;TS
[1]   TS←⎕TS
[2]   ⍎X
[3]   (⍕(24 60 60 1000⊥¯4↑⎕TS-TS)÷1000),' Seconds.'

    ∇ 2015-08-20 21.01.09.470 (GMT-7)

      

And an array of a million floats.

      A←1000 1000⍴⍳1E6

And a 100x100 boolean identity matrix

      B←100 100⍴101↑1=1

[IBM APL2]

      TIME  'DO100 ''A←⍉A'''
2.391 Seconds.

      TIME 'DO100 ''B←B∨.∧B'''
7.591 Seconds.

      TIME 'DO100 ''C←≠\B'''
0.197 Seconds.
     
[GNU APL]

      TIME    'DO100 ''A←⍉A'''
54.987 Seconds.

      TIME 'DO100 ''B←B∨.∧B'''
13.325  Seconds

      TIME 'DO100 ''C←≠\B'''
59.361 Seconds.
     
That's actually not too horrible, considering what it's having to do. 

Regards,

Mike





reply via email to

[Prev in Thread] Current Thread [Next in Thread]