Octave speed up (Was: execution speed in *oct files)

From: Van den Eynde Gert
Subject: Octave speed up (Was: execution speed in *oct files)
Date: Tue, 8 Jun 1999 10:09:09 +0200

Hi all,

> | There seems to be no open source optimized BLAS available,
> Not that I know of.  And since an optimized BLAS almost always means
> machine and compiler dependent code (perhaps written in assembly
> language), and because machines and compilers keep changing, looking
> for hand-coded BLAS is probably not a very good long-term solution.

There is a Pentium Pro optimized BLAS available free of charge (registration
required). Its providers claim it is also beneficial on a Pentium II. I tried
it, but it was not as fast as the other two alternatives I mention below.

> There may be some hope for a good solution to this problem though, in
> the form of the ATLAS project:  
> ATLAS may eventually provide a generic way to optimize the blas for
> any machine and compiler.  I don't think it is quite ready yet, but
> perhaps we can eventually use it to get a little bit better
> performance on all the systems where Octave can run.

I've tested ATLAS and I was really amazed. I use a Pentium II machine
running Linux. After 30 minutes of work, ATLAS produced an optimized level 3
BLAS (or part of it, but I was only really interested in DGEMM). Once the
library was generated, I removed dgemm.f from the BLAS directory in Octave
and added libatlas.a to the libraries to be linked with. I did some (very
rudimentary) benchmarking. The result is a matrix-matrix multiply that is
6 times faster than the standard F77 DGEMM implementation. If you use Octave
often, the 30 minutes of code generation are really worth it!
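
A rudimentary benchmark of this kind can be sketched as follows. This is a
minimal Python illustration under stated assumptions, not the benchmark used
above: naive_dgemm only stands in for the unblocked reference loop of the F77
DGEMM, and the helper names time_matmul, speedup and mflops are hypothetical.

```python
import time


def naive_dgemm(a, b):
    """Reference C = A * B with the textbook triple loop (no blocking)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(m):
                c[i][j] += a_ip * b[p][j]
    return c


def time_matmul(matmul, a, b, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(t_reference, t_optimized):
    """Speed-up factor: how many times faster the optimized routine is."""
    return t_reference / t_optimized


def mflops(n, seconds):
    """A square n x n multiply costs about 2*n^3 floating-point operations."""
    return 2.0 * n ** 3 / seconds / 1e6
```

To compare two libraries, time the same multiply with each one and pass both
times to speedup(); taking the best of several runs reduces the influence of
other activity on the machine.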

Another 'optimal code generator' is PHiPAC.
This package only generates SGEMM and DGEMM. Its most important option is
the length of the search: short, default or long. The short search took a
day on my machine (remember that for both ATLAS and PHiPAC it is best to
give them the machine to themselves, so it is off line for other users
during the code generation!). It generated code that was a bit slower than
the ATLAS code (a speed-up of 5, compared to 6). I also tried the 'long'
option, but after four days it was still in its first phase and I couldn't
keep the machine off line any longer. The PHiPAC people indicate that the
long search can take a week (or two)... However, they ask people who have
done these long runs to send them the library and their machine
specifications, so they can distribute as many 'precompiled' libraries as
possible.
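
The general idea behind such a search (time several candidate variants of the
kernel on the target machine and keep the fastest) can be sketched as follows.
This is only an illustration, assuming the block size is the single parameter
being tuned. It is not the actual search performed by ATLAS or PHiPAC, and
blocked_matmul and search_block_size are hypothetical names. In interpreted
Python the timings mostly reflect loop overhead rather than cache behaviour,
so only the structure of the search carries over.

```python
import time


def blocked_matmul(a, b, block):
    """C = A * B computed tile by tile; `block` is the tile edge length."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


def search_block_size(candidates, n=48, repeats=2):
    """Time every candidate block size on this machine and keep the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    # The winner is whichever candidate ran fastest here, on this machine.
    return min(timings, key=timings.get), timings
```

A longer search simply means more candidates and more parameters, which is why
the run time grows so quickly and why the machine should be otherwise idle
while the timings are taken.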

The difference between these code generators and Blitz++ and MTL (the Matrix
Template Library, whose site seems to be down at the moment) is that the
latter rely on the C++ compiler to optimize the code. Their benchmarks show
that they can achieve almost optimal performance, but they need the KAI C++
compiler. With egcs, the results are not as good. Of course, this approach
is much more machine independent than using 'code generators'.

I hope this information was useful...

Have a nice day,


| Gert Van den Eynde                      mailto:address@hidden |
| SCK-CEN             |
| Reactor Physics                                                   |
| Boeretang 200                                                     |
| B-2400 Mol               .oooO                                    |
| Belgium                  (   )   Oooo.                            |
|___________________________\ (____(   )____________________________|
                             \_)    ) /
