Re: [lmi] overview of C++ expression template libraries


From: Greg Chicares
Subject: Re: [lmi] overview of C++ expression template libraries
Date: Mon, 08 Jan 2007 10:43:46 +0000
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

On 2005-8-31 12:33 UTC, Vadim Zeitlin wrote:
> 
> 1. PETE if I can make it work as easily as you say
> 2. uBLAS
> 3. boost::lambda
> 4. std::valarray
> 5. maybe something hand-coded if I can do it quickly (< day?)

I was looking at this over the holiday season. From the evidence
produced by 'expression_template_0_test.cpp' [1], I think we can
draw conclusions clear enough to move forward.

I would rule out boost::lambda for the reason discussed here:
  http://lists.gnu.org/archive/html/lmi/2005-09/msg00000.html
that is, not because of any shortcoming of that library, but
because of the limited arity [2] of std::transform and its kin.

We can also rule out boost::uBLAS because there are better
alternatives for our particular problem domain. Its performance
in the speed tests below is the poorest of all candidates. It
overloads operators in fewer cases than the other libraries:
e.g., for scalar sN and vector vN, these expressions
  v1 = s0 - v0;
  v2 += v0 * v1;
can't be written. Also, its performance is approximately halved
in this test unless NDEBUG is defined. I consider its reliance
on that standard macro (instead of a library-specific one) to be
a drawback: some other libraries (like boost::spirit) provide
useful lightweight sanity checks that depend on the same macro,
so no single setting is suitable for all libraries.
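
For comparison, both statements are accepted by std::valarray, one
of the other candidates. The following is only a minimal sketch,
assuming C++11 (for the initializer lists); the function name
scalar_vector_demo and the sample values are made up for
illustration and do not come from lmi.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// Hypothetical helper, not taken from lmi: evaluates the two
// statements quoted above with std::valarray and returns v2.
inline std::valarray<double> scalar_vector_demo()
{
    double const s0 = 1.0;
    std::valarray<double> const v0 = {0.1, 0.2, 0.3};
    std::valarray<double> v1(3);
    std::valarray<double> v2(1.0, 3); // three elements, each 1.0

    v1 = s0 - v0;   // scalar minus vector, elementwise
    v2 += v0 * v1;  // compound assignment from a vector expression
    return v2;
}
```

Here v2 ends up holding 1.0 + v0[i] * (1.0 - v0[i]) in each element.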

I would choose PETE over std::valarray, at least as a first step,
even though std::valarray is faster: Gaby seems to have done an
excellent job with the libstdc++ implementation, which runs about
fifty percent faster than PETE in the tests below.
PETE's main advantage is that it works with std::vector, which
lmi already uses. Thus, we can change and test a function (or
even a line) at a time, without modifying any member declaration.
If this works well, we can always convert to std::valarray (or
something else, like (5) above) later. If no one disagrees with
this strategy, I'll move forward, probably importing the PETE
sources (only about 80K) into lmi from freepooma first.
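
To illustrate why an expression-template layer can sit on top of
existing std::vector members without changing their declarations,
here is a minimal hand-coded sketch, assuming C++11. It is not
PETE's interface: the namespace sketch and the names Leaf, Sum,
Expr, wrap, assign, and add_four are all hypothetical, and only
addition is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch { // hypothetical namespace, for illustration only

// Leaf node: refers to an existing std::vector without copying it.
struct Leaf
{
    std::vector<double> const& v;
    double operator[](std::size_t i) const {return v[i];}
};

// Interior node: the deferred elementwise sum of two subexpressions.
template<typename L, typename R>
struct Sum
{
    L lhs;
    R rhs;
    double operator[](std::size_t i) const {return lhs[i] + rhs[i];}
};

// Wrapper that marks a type as an expression, so that operator+
// below applies only to expressions built by this sketch.
template<typename E>
struct Expr
{
    E e;
    double operator[](std::size_t i) const {return e[i];}
};

inline Expr<Leaf> wrap(std::vector<double> const& v)
{
    return Expr<Leaf>{Leaf{v}};
}

// Builds a larger expression; no element is added yet.
template<typename A, typename B>
Expr<Sum<A,B>> operator+(Expr<A> const& a, Expr<B> const& b)
{
    return Expr<Sum<A,B>>{Sum<A,B>{a.e, b.e}};
}

// Evaluates the whole expression in one pass over the elements.
// Assumes every operand is at least as long as 'out'.
template<typename E>
void assign(std::vector<double>& out, Expr<E> const& x)
{
    for(std::size_t i = 0; i < out.size(); ++i) {out[i] = x[i];}
}

// Usage example: adds four plain vectors in a single loop.
inline std::vector<double> add_four()
{
    std::vector<double> const v0 = {1.0, 2.0};
    std::vector<double> const v1 = {10.0, 20.0};
    std::vector<double> const v2 = {100.0, 200.0};
    std::vector<double> const v3 = {1000.0, 2000.0};
    std::vector<double> r(2);
    assign(r, wrap(v0) + wrap(v1) + wrap(v2) + wrap(v3));
    return r;
}

} // namespace sketch
```

The vectors themselves remain ordinary std::vector<double> objects;
only the expression on the right-hand side uses the extra types.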

[1] Timings using MinGW gcc-3.4.4 [abridged for clarity]:

  Speed tests: array length 10
  C               : [2.358e-008] 1000000 iterations took 23 milliseconds
  valarray        : [4.696e-008] 1000000 iterations took 46 milliseconds
  PETE            : [6.266e-008] 1000000 iterations took 62 milliseconds
  uBLAS           : [4.073e-007] 100000 iterations took 40 milliseconds

  Speed tests: array length 100
  C               : [1.581e-007] 1000000 iterations took 158 milliseconds
  valarray        : [2.242e-007] 100000 iterations took 22 milliseconds
  PETE            : [3.418e-007] 100000 iterations took 34 milliseconds
  uBLAS           : [7.095e-007] 100000 iterations took 70 milliseconds

  Speed tests: array length 1000
  C               : [1.700e-006] 100000 iterations took 169 milliseconds
  valarray        : [2.034e-006] 100000 iterations took 203 milliseconds
  PETE            : [3.389e-006] 100000 iterations took 338 milliseconds
  uBLAS           : [4.640e-006] 10000 iterations took 46 milliseconds

For lmi, array lengths are typically on the order of ten to one
hundred; results for one thousand are shown to illustrate
asymptotic performance.
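
The bracketed figure in each row appears to be seconds per
iteration: for instance, 2.358e-008 s times 1000000 iterations is
about 23 ms. A measurement of that kind could be sketched as
follows, assuming C++11 and std::chrono. This is not the timer
lmi uses; Timing, time_it, and count_calls_demo are hypothetical
names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical result type, not part of lmi.
struct Timing
{
    std::size_t iterations;
    double      seconds_per_iteration;
};

// Calls f() the given number of times and reports the mean cost.
template<typename F>
Timing time_it(F f, std::size_t iterations)
{
    auto const t0 = std::chrono::steady_clock::now();
    for(std::size_t i = 0; i < iterations; ++i) {f();}
    auto const t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double> const elapsed = t1 - t0;
    return Timing
        {iterations
        ,elapsed.count() / static_cast<double>(iterations)
        };
}

// Usage example: times a trivial callable and returns how many
// times it was actually invoked.
inline std::size_t count_calls_demo()
{
    std::size_t calls = 0;
    Timing const t = time_it([&calls] {++calls;}, 1000);
    assert(0.0 <= t.seconds_per_iteration);
    return calls;
}
```

A real benchmark would also need to keep the optimizer from
discarding the work being timed, which this sketch does not address.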

[2] From 'expression_template_0_test.cpp':

/// An expression-template numeric-array class performs two jobs:
///   it agglutinates expressions, deferring their evaluation; and
///   it applies the agglutinated expression across all elements.
///
/// 'Lambda' libraries do the first job only: agglutination. Using
/// STL facilities like std::for_each or std::transform for the
/// other job, application, painfully restricts arity to two. This
/// model accommodates arbitrarily complicated operations (e.g., a
/// truncated Taylor series), but only for one or two operands: it
/// is not possible to add four vectors (v0 + v1 + v2 + v3).
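
The arity limit can be seen in a sketch like the following, which
adds four vectors using only std::transform. Because its widest
overload reads two input ranges, the sum takes three passes over
the data instead of one. The function names sum4_with_transform and
sum4_demo are hypothetical, and C++11 is assumed for the
initializer lists.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: sums four vectors of equal length using
// only the binary overload of std::transform.
inline std::vector<double> sum4_with_transform
    (std::vector<double> const& v0
    ,std::vector<double> const& v1
    ,std::vector<double> const& v2
    ,std::vector<double> const& v3
    )
{
    std::vector<double> r(v0.size());
    std::plus<double> const add;
    // Pass 1: r = v0 + v1
    std::transform(v0.begin(), v0.end(), v1.begin(), r.begin(), add);
    // Pass 2: r = r + v2 (writing in place over the first input)
    std::transform(r.begin(), r.end(), v2.begin(), r.begin(), add);
    // Pass 3: r = r + v3
    std::transform(r.begin(), r.end(), v3.begin(), r.begin(), add);
    return r;
}

// Usage example with made-up values.
inline std::vector<double> sum4_demo()
{
    std::vector<double> const v0 = {1.0, 2.0};
    std::vector<double> const v1 = {10.0, 20.0};
    std::vector<double> const v2 = {100.0, 200.0};
    std::vector<double> const v3 = {1000.0, 2000.0};
    return sum4_with_transform(v0, v1, v2, v3);
}
```

An expression-template array class would instead evaluate
v0[i] + v1[i] + v2[i] + v3[i] in a single loop.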



