octave-maintainers

Re: octave dev slow down


From: Rik
Subject: Re: octave dev slow down
Date: Mon, 19 Jun 2017 08:32:14 -0700

On 06/19/2017 07:37 AM, Michael D Godfrey wrote:


On 05/23/2017 07:42 PM, John W. Eaton wrote:
On 05/23/2017 01:12 PM, Mike Miller wrote:
On Tue, May 23, 2017 at 12:16:07 -0400, John W. Eaton wrote:
On 05/23/2017 04:37 AM, Mike Miller wrote:

Confirmed here. I bisected and found a lot of performance loss starting
with c452180ab672. Its immediate predecessor f4d4d83f15c5 has about the
same performance as 4.2.1. If you can compare those two revisions and
confirm, that's a good place to start looking for a cause.

What was your test for performance here?

I recall timing "make check" when I made those changes and did not see a
significant change in performance.

If I have something to test, I'll take a look at it.

I ran Dmitri's test case a handful of times at each build revision. I
get a distinct difference between f4d4d83f15c5 and c452180ab672, all
other things being equal. I'm using OpenBLAS instead of ATLAS.

I ran multiple Octave sessions with -cli -W, built without Qt to speed
up bisecting, using the test case "x = rand(4000); tic; x'*x; toc".

f4d4d83f15c5: mean is 0.63071 seconds, std dev is 0.0024187.

c452180ab672: mean is 1.1713 seconds, std dev is 0.11803.

This is the test case that I used to bisect and the results stayed
consistent and converged on this revision.
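The mean and standard deviation figures quoted above can be reproduced from a list of per-run timings. The following is a minimal sketch, not the script used in this thread: the names `TimingStats`, `summarize`, and `slowdown` are hypothetical, and it assumes the sample standard deviation (n - 1 denominator), since the message does not say how the figure was computed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the two summary figures reported per revision.
struct TimingStats {
    double mean;     // arithmetic mean of the timings, in seconds
    double std_dev;  // sample standard deviation (n - 1 denominator)
};

// Summarize a set of wall-clock timings collected from repeated runs
// of the same test case at one build revision.
TimingStats summarize(const std::vector<double>& seconds) {
    assert(!seconds.empty());

    double sum = 0.0;
    for (double s : seconds) {
        sum += s;
    }
    const double mean = sum / static_cast<double>(seconds.size());

    double squared_dev = 0.0;
    for (double s : seconds) {
        squared_dev += (s - mean) * (s - mean);
    }

    // With a single sample the spread is undefined; report 0 by convention.
    const std::size_t n = seconds.size();
    const double std_dev =
        n > 1 ? std::sqrt(squared_dev / static_cast<double>(n - 1)) : 0.0;

    return {mean, std_dev};
}

// Ratio of two mean timings: values above 1 mean the later build is slower.
double slowdown(double mean_before, double mean_after) {
    return mean_after / mean_before;
}
```

With the two means reported above, `slowdown(0.63071, 1.1713)` comes out at roughly 1.86, so the later revision takes nearly twice as long on this test case.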

Thanks, it should be fixed now with the latest two changesets that I pushed.

The implementation of the compound binary expression object is a bit tricky and I made a mistake when I translated the rvalue1 operation to a tree_evaluator::visit* function.

I'm sure the reason that I didn't see anything significant in my tests was that I only looked at the overall performance of running the test suite, not any one operation individually.  I wasn't expecting much difference in performance in each evaluation step.  I was more concerned with whether using stack objects to hold function results would perform worse than returning values from the rvalue functions.

jwe

I have done some comparisons between 4.0.3 and the current dev tip, be69ea3de7a3 (and also some earlier dev builds),
and typically I see:

4.3.0+
test 2: cputime used: 9.2e-01 seconds

4.0.3   /usr/bin/octave --no-gui
test 2: cputime used: 6.4e-01 seconds

Initially, I was checking Rik's conversion of the elementary functions to C++ std (which all seem to be
fine), but I noticed the large timing difference.  The code that I used spends most of its time transforming
complex-valued arrays using exp(), atanh(), etc.  Since I also ran some tests on builds from before Rik's new code,
it appears that the cause is not the new std functions.

Michael,

Thanks for noticing this.  If the issue is a slowdown with complex-valued arrays then maybe you can re-test in about a week?  At the moment I am converting many of the basic mapper functions, which used to dispatch to gnulib, Fortran, or even our own hand-rolled C++ code, to dispatch to the C++ standard library instead.  Besides making the code simpler and reducing our external dependencies during configure, Octave will now sit squarely atop the standard library, which is a well-debugged and well-coded piece of software.

My next task, after the basic functions, is to look at how the mapper functions are implemented for complex values.  Currently, we often hand-code our own functions for complex values.  However, std::complex already provides overloads for many of the basic math functions.  I would like to switch over to the standard ones whenever possible, which might improve performance.
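As a rough illustration of the idea, the sketch below applies the `<complex>` overloads of `std::exp` and `std::atanh` element-wise to an array. It is not Octave's actual mapper code: `cplx`, `map_complex`, `exp_elementwise`, and `atanh_elementwise` are hypothetical names, and `std::vector` stands in for Octave's own array classes.

```cpp
#include <algorithm>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: apply a unary function to every element of a
// complex-valued array and return the results in a new array.
template <typename Fn>
std::vector<cplx> map_complex(const std::vector<cplx>& in, Fn fn) {
    std::vector<cplx> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), fn);
    return out;
}

// Element-wise exp(), dispatching to the std::complex overload
// rather than a hand-written complex implementation.
std::vector<cplx> exp_elementwise(const std::vector<cplx>& in) {
    return map_complex(in, [](const cplx& z) { return std::exp(z); });
}

// Element-wise atanh(); the complex overload is available since C++11.
std::vector<cplx> atanh_elementwise(const std::vector<cplx>& in) {
    return map_complex(in, [](const cplx& z) { return std::atanh(z); });
}
```

Whether the standard overloads are actually faster than hand-coded versions depends on the compiler and library, so any switch would still need to be timed.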

--Rik

