bug-apl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-apl] Boolean Packing


From: Elias Mårtenson
Subject: Re: [Bug-apl] Boolean Packing
Date: Fri, 21 Aug 2015 13:06:04 +0800

If you want to look at the code, all arrays are implemented as instances of Value. You will see references to Value_P as well, which is just a refcounted pointer to a Value. A Value has a Shape which defines its dimensions.

Code that processes array content does so using the method .get_ravel() and its various overloads.

I'm thinking that perhaps it would be possible to create an optimised primitive array by subclassing (or rather, extracting a common superclass) Value into something that can handle certain arrays much faster. Where that is actually faster depends on where the time is actually spent. If you still have to waste time boxing the values returned from .get_ravel() then there is little point.

The first step is to run your tests through cachegrind to determine exactly where the bottlenecks lie.

I have done a cachegrind analysis before, and I determined that most of the time was actually spent copying the arrays. The problem is that in an _expression_ such as A←A+1 where A is a large array, the entire content is copied, 1 is added to each element. However, since A is overwritten in the assignment a lot of time can be saved if the addition is done in-place. With an _expression_ such as A←⍉1+2×A you end up with three unnecessary copies and initialisations of the entire array. The time spent here is much larger than the time taken to actually perform the mathematical operations. I even implemented an optimisation for this which was rolled back because it was severely broken. However, when it worked it provided an order of magnitude performance improvement or more.

You might want to read up in the mailing list archives from a year or so ago where all of this was discussed in depth.

Regards,
Elias

On 21 August 2015 at 12:49, Mike Duvos <address@hidden> wrote:
Hi Elias,

It seems that there would be a substantial performance hit using C++ object management to construct your workspace.  I haven't run any benchmarks before, but perhaps a quick comparison with APL2 would be useful at this point.

Let's make a little function that does something 100 times.

      ∇DO100[⎕]∇
    ∇
[0]   DO100 X;I
[1]   I←¯1
[2]  L1:→((I←I+1)≥100)/0
[3]   ⍎X
[4]   →L1

    ∇ 2015-08-20 21.00.29.850 (GMT-7)

      

And another little function that prints out the number of seconds something takes.

      ∇TIME[⎕]∇
    ∇
[0]   TIME X;TS
[1]   TS←⎕TS
[2]   ⍎X
[3]   (⍕(24 60 60 1000⊥¯4↑⎕TS-TS)÷1000),' Seconds.'

    ∇ 2015-08-20 21.01.09.470 (GMT-7)

      

And an array of a million floats.

      A←1000 1000⍴⍳1E6

And a 100x100 boolean identity matrix

      B←100 100⍴101↑1=1

[IBM APL2]

      TIME  'DO100 ''A←⍉A'''
2.391 Seconds.

      TIME 'DO100 ''B←B∨.∧B'''
7.591 Seconds.

      TIME 'DO100 ''C←≠\B'''
0.197 Seconds.
     
[GNU APL]

      TIME    'DO100 ''A←⍉A'''
54.987 Seconds.

      TIME 'DO100 ''B←B∨.∧B'''
13.325  Seconds

      TIME 'DO100 ''C←≠\B'''
59.361 Seconds.
     
That's actually not too horrible, considering what it's having to do. 

Regards,

Mike




reply via email to

[Prev in Thread] Current Thread [Next in Thread]