On 05/23/2017 07:42 PM, John W. Eaton wrote:
On 05/23/2017 01:12 PM, Mike Miller wrote:
On Tue, May 23, 2017 at 12:16:07 -0400, John W. Eaton wrote:
On 05/23/2017 04:37 AM, Mike Miller wrote:
Confirmed here. I bisected and found a lot of performance loss
starting with c452180ab672. Its immediate predecessor f4d4d83f15c5
has about the same performance as 4.2.1. If you can compare those
two revisions and confirm, that's a good place to start looking for
a cause.
What was your test for performance here?
I recall timing "make check" when I made those changes and did not
see a significant change in performance.
If I have something to test, I'll take a look at it.
I ran Dmitri's test case a handful of times at each build revision.
I get a distinct difference between f4d4d83f15c5 and c452180ab672,
all other things being equal. I'm using OpenBLAS instead of ATLAS.
I ran multiple Octave sessions with -cli -W, built without Qt to
speed up bisecting, using the test case
"x = rand(4000); tic; x'*x; toc".
f4d4d83f15c5: mean is 0.63071 seconds, std dev is 0.0024187.
c452180ab672: mean is 1.1713 seconds, std dev is 0.11803.
This is the test case that I used to bisect and the results stayed
consistent and converged on this revision.
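A minimal sketch of how the timing above could be repeated and summarized as a mean and standard deviation. The thread ran separate Octave sessions per measurement; this version loops inside a single session, which is an assumption made for brevity. The names n, nruns, and t are illustrative and not from the original test, and n is kept small so the sketch runs quickly (set n = 4000 to match the test case above).

```octave
% Sketch only: repeat the x'*x timing and summarize it.
n = 1000;                 % illustrative size; the thread used 4000
nruns = 5;                % illustrative number of repetitions
t = zeros (nruns, 1);     % elapsed wall-clock time of each run

for k = 1:nruns
  x = rand (n);
  id = tic;               % start a timer for this run only
  y = x' * x;             % the operation being measured
  t(k) = toc (id);        % elapsed seconds since the matching tic
end

printf ("mean is %g seconds, std dev is %g\n", mean (t), std (t));
```

Running the same script against builds of the two revisions would give figures comparable in form to the ones quoted above, though in-session repeats may behave differently from fresh sessions.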
Thanks, it should be fixed now with the latest two changesets that
I pushed.
The implementation of the compound binary expression object is a
bit tricky and I made a mistake when I translated the rvalue1
operation to a tree_evaluator::visit* function.
I'm sure the reason that I didn't see anything significant in my
tests was that I only looked at the overall performance of running
the test suite, not any one operation individually. I wasn't
expecting much difference in performance in each evaluation step.
I was more concerned with whether using stack objects to hold
function results would perform worse than returning values from
the rvalue functions.
jwe
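For readers who want to look at a single operation rather than the whole test suite, here is a hedged sketch that times x'*x written as one expression against an explicit transpose followed by a multiply. As I understand it, the first form is the kind of expression the compound binary expression object handles; that reading, the variable names, and the small matrix size are assumptions of this sketch and not part of the fix.

```octave
% Sketch only: compare one expression against its two-step equivalent.
n = 500;                        % illustrative size, kept small
x = rand (n);

id = tic;
a = x' * x;                     % transpose and multiply in one expression
t_compound = toc (id);

id = tic;
xt = x';                        % explicit transpose stored in a temporary
b = xt * x;                     % plain multiply of the temporary
t_split = toc (id);

% The two results should agree up to rounding.
rel_err = norm (a - b, "fro") / norm (b, "fro");

printf ("one expression: %g s, two steps: %g s, rel. error: %g\n",
        t_compound, t_split, rel_err);
```

A single pair of timings like this is noisy, so any comparison between revisions would need several repetitions.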
I have done some comparisons between 4.0.3 and the current dev tip
@ be69ea3de7a3 (also some previous devs) and typically I see:
4.3.0+
test 2: cputime used: 9.2e-01 seconds
4.0.3 /usr/bin/octave --no-gui
test 2: cputime used: 6.4e-01 seconds
Initially, I was checking Rik's conversion of the elementary
functions to C++ std (which all seem to be all right) but I noticed
the large timing difference. The code that I used spends most of
its time transforming complex-valued arrays using exp(), atanh(),
etc. Since I ran some tests prior to Rik's new code, it appears
that the cause is not the new std functions.
Michael Godfrey
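The test code behind the cputime figures above is not shown in this thread. The following is only a hypothetical sketch of that style of measurement: it applies exp() and atanh() to a complex-valued array and reports the CPU time used. The array size, the variable names, and the choice of exactly these two calls are assumptions.

```octave
% Sketch only: CPU time spent in elementary functions on complex data.
n = 1e6;                                    % illustrative array length
z = complex (randn (n, 1), randn (n, 1));   % complex-valued test data

t0 = cputime;                               % CPU seconds used so far
w1 = exp (z);
w2 = atanh (z);
elapsed = cputime - t0;                     % CPU seconds spent in the two calls

printf ("cputime used: %.1e seconds\n", elapsed);
```

Note that cputime measures processor time, not wall-clock time, so it is not directly comparable to the tic/toc figures earlier in the thread.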