Parallel access to data created with Octave

From: Jarno Rajahalme
Subject: Parallel access to data created with Octave
Date: Fri, 30 Apr 2010 20:25:22 -0700

I have for quite some time now wanted to parallelize some OCT-file C++ code 
using OpenMP. My code uses some global Octave data (Cells of Cells of vectors 
or scalars). I had naively thought that even though Octave is not thread safe, 
"just reading" the global variables would be thread-safe :-)

It took some debugging, but I found out that, in order to avoid any dynamic 
memory allocation while traversing the data, I must:

- use data(), get_rep(), and mex_get_data() to get pointers to the data 
(otherwise temporaries are dynamically allocated and freed, which breaks badly 
when several threads do it at once)
- in cells, index manually using .data()[idx].get_rep() to avoid temporaries
- NOT use dims(), columns(), rows(), length(), or numel() on scalars. They are 
safe on matrices, but NOT on scalars. I have to inspect the data with 
is_scalar_type(): if it is a scalar, the length is 1, otherwise use numel() to 
read the vector length
- gather results via a pointer; parallel assignments to e.g. a RowVector seem 
to fail, even if the space is preallocated.
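
The last point can be sketched in plain C++, without any Octave headers. This 
is only an illustration under stated assumptions: std::vector stands in for 
the preallocated result, and work() and gather_parallel() are hypothetical 
names, not part of the OCT-file described here. The idea is that the raw 
pointers are fetched once, before the parallel region, so that nothing inside 
the loop can allocate or touch shared bookkeeping.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-element work done on the Octave data.
static std::uint16_t work(std::uint16_t x)
{
  return static_cast<std::uint16_t>(x * 2u + 1u);
}

// Fill a preallocated output through raw pointers taken outside the loop.
void gather_parallel(const std::vector<std::uint16_t>& in,
                     std::vector<std::uint16_t>& out)
{
  assert(in.size() == out.size());       // caller must preallocate the output
  const std::uint16_t* src = in.data();  // fetched once, before the threads start
  std::uint16_t* dst = out.data();
  const long n = static_cast<long>(in.size());

  // Without OpenMP enabled the pragma is ignored and the loop runs serially.
  #pragma omp parallel for
  for (long i = 0; i < n; ++i)
    dst[i] = work(src[i]);               // each iteration writes only its own slot
}
```

Because every iteration writes to a distinct element and reads only through 
const pointers, the loop body needs no locking.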

The data I used was all uint16 matrix and uint16 scalars.

The numel() call on a scalar type was a hard one to track down. There is a 
static dv initialization within the scalar dims() function, which was a real 
pain. At first I added dummy length(), numel() and capacity() functions (all 
returning 1), but as I wanted my code to be portable, I deleted these changes 
in ov-base-scalar.h and used is_scalar_type() instead.
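
The is_scalar_type() workaround can be illustrated with a small stand-alone 
sketch. Everything here is hypothetical: mock_base_value, mock_scalar, 
mock_matrix and safe_length() are stand-ins written for this example, not 
Octave classes. The counter only demonstrates that the scalar's numel() is 
never reached.

```cpp
#include <cassert>

// Counts calls to the scalar numel(); stands in for the unsafe shared state.
static long scalar_numel_calls = 0;

// Hypothetical stand-in for a value representation base class.
struct mock_base_value {
  virtual ~mock_base_value() = default;
  virtual bool is_scalar_type() const = 0;
  virtual long numel() const = 0;
};

struct mock_scalar : mock_base_value {
  bool is_scalar_type() const override { return true; }
  // Imagine this path touching a function-local static, as described above.
  long numel() const override { ++scalar_numel_calls; return 1; }
};

struct mock_matrix : mock_base_value {
  explicit mock_matrix(long n) : n_(n) { assert(n >= 0); }
  bool is_scalar_type() const override { return false; }
  long numel() const override { return n_; }
  long n_;
};

// Length query that never calls numel() on a scalar.
long safe_length(const mock_base_value& v)
{
  return v.is_scalar_type() ? 1 : v.numel();
}
```

The conditional short-circuits, so the scalar branch returns 1 without ever 
entering the problematic function.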

For example, this crashes when run in parallel (pCellStore is a pointer to a 
global Cell, arg1 is a 1-based Octave index):

  const Cell & su = pCellStore->xelem(arg1-1).cell_value();
  const octave_idx_type smax = su.columns();

This is a bit faster, but still crashes, when parallel:

  const Cell su(pCellStore->xelem(arg1-1).cell_value());
  const octave_idx_type smax = su.columns();

But this does not crash:

  const octave_cell * sup = static_cast<const octave_cell 
  const octave_value * suovp = static_cast<const octave_value 
  const octave_idx_type smax = sup->columns();
  const octave_idx_type srows = sup->rows();

And, this is simple, but slow and crashes when parallel: (si is a loop variable)

  const uint16NDArray & sn = su.xelem(0,si).uint16_array_value();

This is faster, but still crashes, when parallel:

  const octave_value snv(su.xelem(0,si));
  const uint16NDArray sna(snv.uint16_array_value());
  const octave_idx_type snl = sna.nelem();
  const octave_uint16 * snp = sna.fortran_vec();

But this does not crash: 

  // either octave_uint16_matrix or octave_uint16_scalar
  const octave_base_value * snvp(suovp[srows*si].internal_rep());
  const octave_idx_type snl(snvp->is_scalar_type() ? 1 : snvp->numel());
  // works for both
  const octave_uint16 * snp(static_cast<const octave_uint16 *>(snvp->mex_get_data()));

It gets a bit messy, but it is a lot faster, and works also with Octave 3.2.3, 
when compiling the oct-file with OpenMP :-)

Overall, with these changes the code became about 1.8x as fast (1 core) and 
3.4x as fast (2 cores) as my original code, in which I followed the examples 
in the Octave manual as best I could at the time. So there is a lot of 
overhead in "just accessing" data in OCT files, if one is not careful!

My original C++ code was already more than 145x faster than the equivalent, 
optimized .m version. Now the speed difference is a whopping 500x. This must 
be about the worst case of comparison for Octave, as there are no matrix 
operations, and there are 6 levels of for loops involved...



P.S. Finally, I just remembered that there is also a free OpenMP for MSVC; it 
comes with one of the SDKs... I used it last year.
