execution speed in *oct files
From: John W. Eaton
Subject: execution speed in *oct files
Date: Mon, 7 Jun 1999 20:34:35 -0500 (CDT)
On 4-Jun-1999, John W. Eaton <address@hidden> wrote:
 On 3-Jun-1999, A+A Adler <address@hidden> wrote:

  I just submitted an oct file for 2D convolutions
  to octave-sources.
 
  While I was writing it, I noticed that the standard octave
  syntax seems to be very inefficient.
[code examples elided]
  Why are pointers so much more efficient than
  matrix references?

 Because the indexing operators invoke some functions to do the
 indexing calculation. I'd guess that either the functions are not all
 inlined, or the compiler is not able to convert the addition and
 multiplication of the indexing operation to a simple increment (as you
 have), or both.
I've done a few quick tests using the following code. It simply adds
two real matrices and returns the result. The first two arguments are
the matrices to add. The third argument controls the code block used
to perform the additions, and the last argument says how many times to
perform the addition (to ensure that it takes long enough to get
meaningful timing results).
#include <octave/oct.h>

DEFUN_DLD (foo, args, ,
  "")
{
  octave_value_list retval;

  int nargin = args.length ();

  int n = (nargin > 2)
    ? static_cast<int> (args(2).double_value ()) : 0;

  int max_try = (nargin == 4)
    ? static_cast<int> (args(3).double_value ()) : 5;

  if (nargin < 2)
    {
      usage ("foo (a, b [, case [, max_try]])");
      return retval;
    }

  switch (n)
    {
    case 0:
      {
        // Use Matrix::operator + (const Matrix&, const Matrix&)

        Matrix a = args(0).matrix_value ();
        Matrix b = args(1).matrix_value ();

        Matrix r;

        for (int k = 0; k < max_try; k++)
          r = a + b;

        retval(0) = r;
      }
      break;

    case 1:
      {
        // Use loops with Matrix::operator () (int, int) for indexing.

        Matrix a = args(0).matrix_value ();
        Matrix b = args(1).matrix_value ();

        int n = a.rows ();
        int m = a.cols ();

        Matrix r (n, m);

        for (int k = 0; k < max_try; k++)
          for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
              r(i,j) = a(i,j) + b(i,j);

        retval(0) = r;
      }
      break;

    case 2:
      {
        // Use loops with Matrix::operator () (int, int) const for
        // indexing elements on the RHS of the assignment operator and
        // the non-const indexing operator on the LHS.

        const Matrix a = args(0).matrix_value ();
        const Matrix b = args(1).matrix_value ();

        int n = a.rows ();
        int m = a.cols ();

        Matrix r (n, m);

        for (int k = 0; k < max_try; k++)
          for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
              r(i,j) = a(i,j) + b(i,j);

        retval(0) = r;
      }
      break;

    case 3:
      {
        // Use loops with Matrix::operator () (int, int) const for
        // indexing elements on the RHS of the assignment operator and
        // a builtin pointer for accessing elements on the LHS.

        const Matrix a = args(0).matrix_value ();
        const Matrix b = args(1).matrix_value ();

        int n = a.rows ();
        int m = a.cols ();

        Matrix r (n, m);

        for (int k = 0; k < max_try; k++)
          {
            double *pr = r.fortran_vec ();

            for (int j = 0; j < m; j++)
              for (int i = 0; i < n; i++)
                *pr++ = a(i,j) + b(i,j);
          }

        retval(0) = r;
      }
      break;

    case 4:
      {
        // Use pointers all around, and one loop instead of two.

        const Matrix a = args(0).matrix_value ();
        const Matrix b = args(1).matrix_value ();

        int n = a.rows ();
        int m = a.cols ();

        Matrix r (n, m);

        int nm = n * m;

        for (int k = 0; k < max_try; k++)
          {
            const double *pa = a.data ();
            const double *pb = b.data ();
            double *pr = r.fortran_vec ();

            for (int i = 0; i < nm; i++)
              *pr++ = *pa++ + *pb++;
          }

        retval(0) = r;
      }
      break;

    default:
      // Do nothing.  Call this case to see whether the overhead of
      // the function call itself is significant.
      break;
    }

  return retval;
}
Here are the results on my system (a 266 MHz Pentium II running Linux
and using egcs 1.1.2).
$ octave
GNU Octave, version 2.0.14 (i686-pc-linux-gnu).
Copyright (C) 1996, 1997, 1998, 1999 John W. Eaton.
This is free software with ABSOLUTELY NO WARRANTY.
For details, type `warranty'.
octave:1> foo
usage: foo (a, b [, case [, max_try]])
octave:1> a = rand (1000); b = rand(1000);
octave:2> for i = -1:4 t = cputime (); foo (a, b, i, 10); cputime () - t endfor
ans = 0 # do nothing
ans = 2.3700 # operator +
ans = 3.0800 # nonconst operator () on RHS and LHS
ans = 2.4500 # const operator () on RHS, nonconst on LHS
ans = 2.3000 # const operator () on RHS, pointer on LHS
ans = 2.2600 # pointers all around
So, if you use const where appropriate, performance should be about the
same whether you use the indexing operators or pointers. Even without
const (which forces the operators to do a bit more work to decide
whether the data needs to be copied before returning a reference), I
don't think you should see a 3x speedup just by switching to pointers.
Perhaps the problem is some missed optimization. What compiler and
options are you using?
Just to see what would happen, I tried using -fno-strength-reduce (you
don't say how your copy of Octave was compiled, but I seem to recall
that Dirk Eddelbuettel compiles the Debian distributions with this
flag). Here are the results I see when using -fno-strength-reduce to
compile the .oct file:
octave:1> foo
usage: foo (a, b [, case [, max_try]])
octave:1> a = rand (1000); b = rand(1000);
octave:2> for i = -1:4 t = cputime (); foo (a, b, i, 10); cputime () - t endfor
ans = 0 # do nothing
ans = 2.3300 # operator +
ans = 6.0800 # nonconst operator () on RHS and LHS
ans = 5.5200 # const operator () on RHS, nonconst on LHS
ans = 3.9400 # const operator () on RHS, pointer on LHS
ans = 2.3900 # pointers all around
Ouch. Strength reduction really helps here. Unless we resort to
writing low-level code that is very difficult to maintain (not my
first choice), we are going to need good optimizing compilers, or at
least need to use the optimizations that are available to us.
jwe

Octave is freely available under the terms of the GNU GPL. To ensure
that development continues, see www.che.wisc.edu/octave/giftform.html
Instructions for unsubscribing: www.che.wisc.edu/octave/archive.html
