On 20-Oct-2005, David Bateman wrote:
| The case causing the problem, although common, is a difficult one to
| handle with the current scheme: do_cat is called on the octave_value
| itself, so I can count 5 levels of function calls for each call to
| the do_cat operator.  That is acceptable when the insert itself takes
| significant time, as when larger matrices are being concatenated, but
| when inserting one scalar after another 180x180 times it probably
| becomes catastrophically slow.
I don't think the performance problem is due to the small number of
function calls required for each trip through the loops in the
tree_matrix::rvalue function in pt-mat.cc. Instead, I think it is the
following code from ops.h:
  #define DEFCATOP_FN(name, t1, t2, f) \
    CATOPDECL (name, a1, a2) \
    { \
      CAST_BINOP_ARGS (octave_ ## t1&, const octave_ ## t2&); \
      return octave_value (v1.t1 ## _value (). f (v2.t2 ## _value (), ra_idx)); \
    }
An expanded version of the last line of this macro would be something
like
return octave_value (v1.array_value().concat (v2.array_value (), ra_idx));
The key part is that array_value returns a copy.  The copy is made
cheaply by adjusting a reference count, but it is still a copy, so
when the concat function is called on it and concat in turn calls the
insert method, an actual deep copy is forced.  Doing that about 180^2
times could cause trouble.
We need to rethink the way this is implemented to avoid the need to
call array_value here. Until then, the optimization that I
implemented should handle the most common cases.
Note that the complex case allows a mixture of real and complex
values.  When deciding whether an object is "all complex", the check
is

  X.is_real_type () || X.is_complex_type ()

and the assumption is that if an object is real or complex, a
complex_array_value extractor will be available.  If not, you will end
up in octave_base_value::complex_array_value, which signals an error.