When running ScalarBenchmark.apl, I'm seeing zero start-up costs
(in fact, zero cycles in every column) for inner and outer products:
===================== Mat1_IRC +.× Mat1_IRC ===============================
Benchmarking start-up cost for Mat1_IRC +.× Mat1_IRC ...
Length Sequ Cycles Para Cycles Linear Sequ Linear Para
====== =========== =========== =========== ===========
25 0 0 0 0
25 0 0 0 0
25 0 0 0 0
25 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
9 0 0 0 0
9 0 0 0 0
9 0 0 0 0
9 0 0 0 0
9 0 0 0 0
4 0 0 0 0
4 0 0 0 0
4 0 0 0 0
1 0 0 0 0
regression line sequential: 0 + 0×N cycles
regression line parallel: 0 + 0×N cycles
===================== Vec1_IRC ∘.× Vec1_IRC ===============================
Benchmarking start-up cost for Vec1_IRC ∘.× Vec1_IRC ...
Length Sequ Cycles Para Cycles Linear Sequ Linear Para
====== =========== =========== =========== ===========
25 0 0 0 0
25 0 0 0 0
25 0 0 0 0
25 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
16 0 0 0 0
9 0 0 0 0
9 0 0 0 0
9 0 0 0 0
9 0 0 0 0
9 0 0 0 0
4 0 0 0 0
4 0 0 0 0
4 0 0 0 0
1 0 0 0 0
regression line sequential: 0 + 0×N cycles
regression line parallel: 0 + 0×N cycles
But then in the summary section -- just above ]PSTAT -- I see:
-------------- Mat1_IRC +.× Mat1_IRC --------------
average sequential startup cost: 359 cycles
average parallel startup cost: 832 cycles
per item cost sequential: 0 cycles
per item cost parallel: 0 cycles
parallel break-even length: not reached
-------------- Vec1_IRC ∘.× Vec1_IRC --------------
average sequential startup cost: 359 cycles
average parallel startup cost: 832 cycles
per item cost sequential: 0 cycles
per item cost parallel: 0 cycles
parallel break-even length: not reached
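Here the startup costs are nonzero (and identical for both
operators), but the per-item costs are all zero, and neither agrees
with the all-zero regression lines printed earlier.
This doesn't look right... Or am I missing something?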
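
For reference, this is how I would expect the numbers to relate: a least-squares line through the (length, cycles) samples gives the start-up cost as its intercept and the per-item cost as its slope, and the break-even length is where the sequential and parallel lines cross. The Python sketch below is only an illustration under that assumption. The names fit_line and break_even are hypothetical, and it is not taken from ScalarBenchmark.apl or the interpreter.

```python
def fit_line(lengths, cycles):
    """Ordinary least-squares fit: cycles ~ startup + per_item * length."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(cycles) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, cycles))
    per_item = sxy / sxx if sxx else 0.0   # slope
    startup = mean_y - per_item * mean_x   # intercept
    return startup, per_item


def break_even(seq, par):
    """Length where the two fitted lines cross, or None if parallel
    never becomes cheaper per item ("not reached")."""
    (seq_startup, seq_item), (par_startup, par_item) = seq, par
    if par_item >= seq_item:
        return None
    return (par_startup - seq_startup) / (seq_item - par_item)


lengths = [25, 16, 9, 4, 1]
# All-zero samples, as in the tables above, can only give 0 + 0*N.
print(fit_line(lengths, [0, 0, 0, 0, 0]))
# With zero per-item cost on both sides there is no crossing point.
print(break_even((359, 0), (832, 0)))
```

If the fit really is done this way, all-zero samples cannot yield the 359 and 832 cycle start-up costs shown in the summary, so the summary would have to be computed from different data than the tables.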
In case it sheds some additional light, here's the final
section of the ]PSTAT output. The rest looks reasonable except for
epsilon-underbar (⍷), which reports all zeroes.
╔═════════════════╦════════════╤══════════╤══════════╤══════════╤══════════╗
║ Function ║ │ N │ ⌀ VLEN │ ⌀ cycles │ cyc÷VLEN ║
╟─────────────────╫────────────┼──────────┼──────────┼──────────┼──────────╢
║ f B overhead ║ 18446744003448130869 │ 283 │ 1993 │ 34818579233229 │ 17466187239 ║
║ A f B overhead ║ 18446743954621671206 │ 1114 │ 84 │ 1447585256996 │ 17221844259 ║
║ scalar B ║ 130198460 │ 283 │ 3873 │ 460065 │ 118 ║
║ A scalar B ║ 91680403 │ 1114 │ 949 │ 82298 │ 86 ║
║ clone B ║ 233950109373 │ 75391125 │ 131 │ 3103 │ 23 ║
║ A f.g B ║ 911702656227 │ 40046 │ 163 │ 22766385 │ 139671 ║
║ A ∘.g B ║ 9809803882 │ 121 │ 1000000 │ 81072759 │ 81 ║
║ A ⍴ B ║ 9071 │ 3 │ 27 │ 3023 │ 111 ║
║ PrintBuffer(B) ║ 135760049 │ 1168 │ 25 │ 116232 │ 4649 ║
╚═════════════════╩════════════╧══════════╧══════════╧══════════╧══════════╝
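
A possibly related observation on the two overhead rows: their totals sit just below 2^64, which is what a small negative cycle count would look like if it were printed as an unsigned 64-bit integer. That is only a guess about the cause, not something confirmed from the source. The short Python check below just does the arithmetic; as_signed_64 is a hypothetical helper name.

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    assert 0 <= value < 2**64
    return value - 2**64 if value >= 2**63 else value


# Totals copied from the "f B overhead" and "A f B overhead" rows.
print(as_signed_64(18446744003448130869))  # -70261420747
print(as_signed_64(18446743954621671206))  # -119087880410
```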