On Tue, May 23, 2017 at 12:16:07 -0400, John W. Eaton wrote:
> On 05/23/2017 04:37 AM, Mike Miller wrote:
> > Confirmed here. I bisected and found a lot of performance loss starting
> > with c452180ab672. Its immediate predecessor f4d4d83f15c5 has about the
> > same performance as 4.2.1. If you can compare those two revisions and
> > confirm, that's a good place to start looking for a cause.
>
> What was your test for performance here?
>
> I recall timing "make check" when I made those changes and did not see a
> significant change in performance.
>
> If I have something to test, I'll take a look at it.

I ran Dmitri's test case a handful of times at each build revision. I
get a distinct difference between f4d4d83f15c5 and c452180ab672, all
other things being equal. I'm using OpenBLAS instead of ATLAS.
I ran multiple Octave sessions with -cli -W, built without Qt to speed
up bisecting, using the test case "x = rand(4000); tic; x'*x; toc".
f4d4d83f15c5: mean is 0.63071 seconds, std dev is 0.0024187.
c452180ab672: mean is 1.1713 seconds, std dev is 0.11803.
This is the test case that I used to bisect and the results stayed
consistent and converged on this revision.
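
For anyone who wants to reproduce the comparison, here is a minimal sketch of that timing test as a script. It is not the exact procedure above: it repeats the multiplication inside a single session rather than across separate Octave sessions, and the helper name time_xtx, its arguments, and the repetition count of 5 are assumptions made for illustration only.

```octave
% Sketch only: time_xtx is a hypothetical helper, not part of Octave.
1;  % leading statement so this file is treated as a script, not a function file

function [m, s] = time_xtx (n, reps)
  % Time x'*x for an n-by-n random matrix, repeated reps times.
  t = zeros (reps, 1);
  for k = 1:reps
    x = rand (n);
    tic;
    y = x' * x;      % the operation being measured
    t(k) = toc;
  end
  m = mean (t);      % mean elapsed time in seconds
  s = std (t);       % sample standard deviation of the timings
end

% Same problem size as the test case; 5 repetitions is an arbitrary choice.
[m, s] = time_xtx (4000, 5);
printf ("mean is %g seconds, std dev is %g\n", m, s);
```

Running the same script against builds of f4d4d83f15c5 and c452180ab672, with the same BLAS library in both, should show whether the difference reproduces on another machine.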