bug-apl
[Top][All Lists]
 Hi, I can confirm that something is strange with the macro based ¨ but I can't yet see what. I wrote a simple measurement function to test A+B in different ways: A+B        (direct scalar) A+¨B       (uses built-in each) A {⍺+⍵} B  (uses macro each) The measurement program is this:       )clear ∇MEASURE B;A;D;R;T  A←⍳B ◊ T←7⍴0  T[1]←⎕FIO ¯1 ◊ ⊣ A + A              ◊ T[2]←⎕FIO ¯1 ◊ ⊣ A + A  T[3]←⎕FIO ¯1 ◊ ⊣ A +¨ A             ◊ T[4]←⎕FIO ¯1 ◊ ⊣ A +¨ A  T[5]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A         ◊ T[6]←⎕FIO ¯1 ◊ ⊣ A {⍺+⍵}¨ A  T[7]←⎕FIO ¯1 ◊ D←T-1⌽T ◊ R←¯1↓D÷D[3]  ⍉2 6⍴(,2/'A + B' 'A +¨ B' 'A {⍺+⍵}¨ B') , (⊂8 2)⍕¨R ∇ ∇Z←A (LO EACH_1) B;rho_Z;N;N_max rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0 LOOP: Z[N]←⊂(⊃A[N]) LO ⊃B[N] ◊ →(N_max>N←N+1)⍴LOOP Z←rho_Z⍴Z ∇ ∇Z←A (LO EACH_2) B;rho_Z;N;N_max rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0 LOOP: (N⌷Z)←⊂(N⌷A) LO N⌷B ◊ →(N_max>N←N+1)⍴LOOP ∇ Every case is measured twice to see fluctuations, caching and the like. The numbers are CPU cycles relative to the third line The results are quite puzzling:             MEASURE 100  A + B           .18  A + B           .14  A +¨ B         1.00  A +¨ B          .90  A {⍺+⍵}¨ B    17.81  A {⍺+⍵}¨ B    11.81             MEASURE 1000  A + B           .13  A + B           .12  A +¨ B         1.00  A +¨ B         1.01  A {⍺+⍵}¨ B    18.03  A {⍺+⍵}¨ B    11.67             MEASURE 10000  A + B           .18  A + B           .17  A +¨ B         1.00  A +¨ B          .29  A {⍺+⍵}¨ B   111.45  A {⍺+⍵}¨ B   111.85 So the macro based EACH becomes slower when the values grow. But why? The macro code for the EACH macro is quite simple (and ⎕FXing it gives the same behavior as the built-in macro.       ⎕CR 'μ-Z__vA_LO_EACH_vB' Z←A (LO Z__vA_LO_EACH_vB) B;rho_Z;N;N_max rho_Z←⍴B ◊ N←⎕IO ◊ A←,A ◊ N_max←N+⍴B←,B ◊ Z←N_max⍴0 LOOP: Z[N]←⊂(⊃A[N]) LO ⊃B[N] ◊ →(N_max>N←N+1)⍴LOOP Z←rho_Z⍴Z I can't see anything in that code which would explain why the macro performance drops when the values get larger. The difference between A +¨ B andA {⍺+⍵}¨ B is caused by two independent factors: one uses the built-in + while the other uses the user defined lambda {⍺+⍵}, and one uses the built-in ¨ while the other uses the user defined macro Z__vA_LO_EACH_vB The impact of the second factor can be determined by running the same measurement with an older version of GNU APL:             MEASURE 100   A + B           .20  A + B           .15  A +¨ B         1.00  A +¨ B          .94  A {⍺+⍵}¨ B     5.53  A {⍺+⍵}¨ B     5.36             MEASURE 1000  A + B           .14  A + B           .10  A +¨ B         1.00  A +¨ B          .98  A {⍺+⍵}¨ B     4.87  A {⍺+⍵}¨ B     4.49             MEASURE 10000  A + B           .26  A + B           .25  A +¨ B         1.00  A +¨ B          .42  A {⍺+⍵}¨ B     2.42  A {⍺+⍵}¨ B     2.41 Here the time decreases rather than increases with the vector lengths. Thus macros are about 3 times slower at vectors of length 100 and 30 times slower at vectors of length 10000. And the LOOP line in the macro does the same as the corresponding C code in the built-in EACH case. I have no idea yet what is going on here. The memory consumption is probably a hint, but I can't see where the memory goes. The A, B and Z variables are allocated only once per macro call and the variables in the loop are small (which means that GNU APL is recycling them internally rather than malloc()ing them). Any ideas welcome! /// Jürgen On 06/17/2016 12:46 AM, Xiao-Yong Jin wrote: ```Hi, In my script I was using ⍎¨ on a vector of character vector from each line of a file. After some testing, I guess it is not really the ⍎, that is slow. The time scaling is strange. In the following, the first row and the third row (using ¨ and ⍣ respectively) scales like n*3, while the problem size is only n*2. My script ends up taking more than a minute to read a 1000×1000 numeric matrix from a text file. On the other hand, the naive {⍵+1⊣⍎v}⍣n 1 indeed scales like n*2. In addition, gnu apl is taking nearly a gigabytes of memory to run the following code, while dyalog is on the order of megabytes. te¨200×⍳3 ⍎¨u 536 ⍎¨u 4129 ⍎¨u 13962 ⍎ v 0 ⍎ v 0 ⍎ v 0 ⍎u⍣ 518 ⍎u⍣ 4114 ⍎u⍣ 13889 ⍎v⍣ 15 ⍎v⍣ 58 ⍎v⍣ 131 ∇te[⎕]∇ ∇ [0] r←te n;s;u;v;t [1] s←1E¯18×?n n⍴1E18 [2] u←⍕¨⊂[2]s [3] v←↑u [4] t←⎕AI◊⊣⍎¨u◊r←(⊂'⍎¨u'),2⌷⎕AI-t [5] t←⎕AI◊⊣⍎v◊r←r,[.5](⊂'⍎ v'),2⌷⎕AI-t [6] t←⎕AI◊⊣{⍵+1⊣⍎⊃u[⍵]}⍣n 1◊r←r,[1](⊂'⍎u⍣'),2⌷⎕AI-t [7] t←⎕AI◊⊣{⍵+1⊣⍎v}⍣n 1◊r←r,[1](⊂'⍎v⍣'),2⌷⎕AI-t ∇ Best, Xiao-Yong ```