On 06/19/2017 07:37 AM, Michael D Godfrey wrote:
On 05/23/2017 07:42 PM, John W. Eaton wrote:
On 05/23/2017 01:12 PM, Mike Miller wrote:
On Tue, May 23, 2017 at 12:16:07 -0400, John W. Eaton wrote:
On 05/23/2017 04:37 AM, Mike Miller wrote:
Confirmed here. I bisected and found a lot of performance loss
starting with c452180ab672. Its immediate predecessor f4d4d83f15c5
has about the same performance as 4.2.1. If you can compare those
two revisions and confirm, that's a good place to start looking for
a cause.
What was your test for performance here?
I recall timing "make check" when I made those changes and did not
see a significant change in performance.
If I have something to test, I'll take a look at it.
I ran Dmitri's test case a handful of times at each build revision.
I get a distinct difference between f4d4d83f15c5 and c452180ab672,
all other things being equal. I'm using OpenBLAS instead of ATLAS.
I ran multiple Octave sessions with -cli -W, built without Qt to
speed up bisecting, using the test case
"x = rand(4000); tic; x'*x; toc".
f4d4d83f15c5: mean is 0.63071 seconds, std dev is 0.0024187.
c452180ab672: mean is 1.1713 seconds, std dev is 0.11803.
This is the test case that I used to bisect and the results stayed
consistent and converged on this revision.
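The comparison above rests on repeating one timed operation and summarizing the runs by mean and standard deviation. A minimal C++ sketch of that kind of harness follows. The names `run_stats`, `summarize`, and `time_runs` are hypothetical and not part of Octave, and the sample (N-1) standard deviation is an assumption about how the figures above were computed.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type; none of this is Octave code.
struct run_stats
{
  double mean;     // arithmetic mean of the samples, in seconds
  double std_dev;  // sample standard deviation (N-1 normalization)
};

// Summarize a non-empty set of timings.
run_stats
summarize (const std::vector<double>& samples)
{
  assert (! samples.empty ());

  double sum = 0.0;
  for (double s : samples)
    sum += s;
  const double mean = sum / samples.size ();

  double sq = 0.0;
  for (double s : samples)
    sq += (s - mean) * (s - mean);

  // With a single sample the spread is undefined; report 0.
  const double std_dev
    = samples.size () > 1 ? std::sqrt (sq / (samples.size () - 1)) : 0.0;

  return {mean, std_dev};
}

// Call op n_runs times and return each elapsed wall-clock time in
// seconds, measured with a monotonic clock.
template <typename F>
std::vector<double>
time_runs (F op, std::size_t n_runs)
{
  std::vector<double> samples;
  samples.reserve (n_runs);

  for (std::size_t i = 0; i < n_runs; i++)
    {
      const auto t0 = std::chrono::steady_clock::now ();
      op ();
      const auto t1 = std::chrono::steady_clock::now ();
      samples.push_back (std::chrono::duration<double> (t1 - t0).count ());
    }

  return samples;
}
```

Calling `summarize (time_runs (op, n))` once per build revision gives one mean and one spread per revision. A difference between means that is large relative to both spreads, as in the figures above, is unlikely to be measurement noise.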
Thanks, it should be fixed now with the latest two changesets
that I pushed.
The implementation of the compound binary expression object is a
bit tricky and I made a mistake when I translated the rvalue1
operation to a tree_evaluator::visit* function.
I'm sure the reason that I didn't see anything significant in my
tests was that I only looked at the overall performance of
running the test suite, not any one operation individually. I
wasn't expecting much difference in performance in each
evaluation step. I was more concerned with whether using stack
objects to hold function results would perform worse than
returning values from the rvalue functions.
jwe
I have done some comparisons between 4.0.3 and the current dev
tip @ be69ea3de7a3 (also some previous devs)
and typically I see:
4.3.0+
test 2: cputime used: 9.2e-01 seconds
4.0.3 /usr/bin/octave --no-gui
test 2: cputime used: 6.4e-01 seconds
Initially, I was checking Rik's conversion of the elementary
functions to C++ std (which all seem to be all right), but I noticed
the large timing difference. The code that I used spends most of its
time transforming complex-valued arrays using exp(), atanh(), etc.
Since I ran some tests prior to Rik's new code, it appears that the
cause is not the new std functions.
Michael,
Thanks for noticing this. If the issue is a slowdown in
complex-valued arrays then maybe you can re-test in about a week?
At the moment I am converting many of the basic mapper functions
which used to dispatch to gnulib, Fortran, or even our own
hand-rolled C++ code, to instead dispatch to the C++ standard
library. Besides making the code simpler and reducing our external
dependencies during configure, this means Octave will now sit
squarely atop the standard library, which is a well-debugged and
well-coded piece of software.
My next task, after the basic functions, is to look at how the
mapper functions are implemented for complex values. Currently, we
often hand-code our own functions for complex values. However,
std::complex already includes templates for some of the basic math
functions. I would like to switch over to using the standard
templates whenever possible, which might improve performance.
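As a rough illustration of the switch described above, the sketch below applies the standard library's `std::complex` overloads of `exp` and `atanh` elementwise to an array. The helper `map_complex` and the wrappers `elem_exp` and `elem_atanh` are hypothetical names for this example, not Octave functions, and Octave's actual mapper dispatch is more involved than this.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Local alias for this sketch.
typedef std::complex<double> Complex;

// Hypothetical helper: apply fcn to every element of x and return
// the results in a new array of the same length.
template <typename F>
std::vector<Complex>
map_complex (const std::vector<Complex>& x, F fcn)
{
  std::vector<Complex> result;
  result.reserve (x.size ());

  for (const Complex& z : x)
    result.push_back (fcn (z));

  assert (result.size () == x.size ());
  return result;
}

// Elementwise exp using the function template from <complex>.
std::vector<Complex>
elem_exp (const std::vector<Complex>& x)
{
  return map_complex (x, [] (const Complex& z) { return std::exp (z); });
}

// Elementwise atanh; the complex overload requires C++11 or later.
std::vector<Complex>
elem_atanh (const std::vector<Complex>& x)
{
  return map_complex (x, [] (const Complex& z) { return std::atanh (z); });
}
```

Whether this is faster than a hand-coded version depends on the compiler and standard library in use, so it would need to be measured rather than assumed.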
--Rik