Re: Using OpenMP in Octave
Fri, 02 Apr 2010 22:55:42 +0200
Mozilla-Thunderbird 220.127.116.11 (X11/20090706)
Jaroslav Hajek wrote:
Not a chance; you've committed much more than me recently... You clearly
have more time for fun stuff like Octave than I do :'-(
I thought you would :) What if I won't like your constants, figuring
out better working ones for me? A commit war?
Mainly that I think it might be inefficient to have a set of tunable
parameters. Frankly, the more I think about it, probably the best solution
is to just say that the cases where the multithreaded code is slower than
the single-threaded code are going to be the cases where their speed isn't
an issue, and have a really stupid minimalist test and a flag to turn off
multithreading completely. Yes, I think it's a good idea to be able to turn
it off, and not just with a flag when Octave is launched, as Matlab
currently implements it.
I need to be able to turn off the multithreading at runtime and I
absolutely insist on that, so there needs to be a branch anyway.
The solution I offered earlier was to encapsulate the decision into a
routine, that will accept loop dimensions (number of cycles and the
"size" of one cycle) and decide whether it's worth parallelizing.
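The routine described above might look something like the following minimal sketch. The function name, signature, and threshold value are all illustrative assumptions, not Octave's actual API; the point is a single cheap decision gathered in one place rather than one tunable knob per operation.

```cpp
#include <cstddef>

// Hypothetical helper: decide whether a loop of `n` iterations, each
// costing roughly `cycle_size` elementary operations, is worth
// parallelizing. Below the threshold, thread startup overhead is
// likely to dominate any speedup.
bool
worth_parallelizing (std::size_t n, std::size_t cycle_size,
                     std::size_t threshold = 100000)
{
  return n * cycle_size >= threshold;
}
```

A caller would then branch once on `worth_parallelizing (len, 1)` before entering either the serial or the OpenMP version of the loop.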
A universal switch point may be a good start. But if you expect
people to play with the parallel code, I don't see why you don't want
a tunable parameter for every operation and every mapper, if it can be
<cut proposed exception handling OpenMP code>
OK, I didn't test the proposed code but threw it out for discussion.
There are also a few typos and uninitialized values. I'd thought
the setjmp/longjmp stuff might be needed for mapper functions written
in Fortran, but in fact I don't think so, as the setjmp/longjmp
exceptions will be rethrown as standard C++ exceptions within each
OpenMP thread. I'll work up something else instead.
Yes. Besides, I don't think this works.
I think you mixed together the interrupts thrown from Fortran that are
handled by setjmp/longjmp and regular C++ exceptions (which are the
case here). Here, one needs to employ try/catch. But see below, I
don't think this is necessary.
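To make the try/catch point concrete: OpenMP does not allow an exception to propagate out of a parallel region, so each thread must catch locally and the error is rethrown after the region ends. The following is a minimal sketch of that pattern (the function and error message are illustrative, not Octave's actual loop code):

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Sketch: each thread catches its own exceptions inside the parallel
// region; a shared flag records the failure, which is rethrown as a
// single exception once all threads have finished.
std::vector<double>
safe_parallel_map (const std::vector<double>& x)
{
  std::vector<double> y (x.size ());
  int failed = 0;

  #pragma omp parallel for
  for (long i = 0; i < static_cast<long> (x.size ()); i++)
    {
      try
        {
          if (x[i] < 0)
            throw std::domain_error ("negative input");
          y[i] = std::sqrt (x[i]);
        }
      catch (...)
        {
          // The exception must not escape the parallel region.
          #pragma omp atomic write
          failed = 1;
        }
    }

  if (failed)
    throw std::domain_error ("error in parallel map");

  return y;
}
```

Note the limitation discussed later in the thread: the other threads still run their iterations to completion before the error is reported.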
This question of the exception will occur even more as
we try to parallelize at a coarser grain and you probably have similar
issues in your parcellfun implementation already.
Not really. Remember that parcellfun does not use multithreading.
But what happens when one of your forked processes unexpectedly dies?
Are you sure that MKL and ACML both use OpenMP for multithreading? What
about other vendors? In any case, wrapping all of this in a function to set
the number of threads in Octave should be easier. Note that Matlab
deprecated the maxNumCompThreads function in recent versions exactly
because libraries like BLAS and FFTW are multithreaded. So we'd be going
against the decision taken by Matlab, because they consider they can't
control the number of threads of these libraries.
Unlike Matlab, Octave is open, so users can use BLAS of their choice.
works fine, or at least used to when I was last able to test it. You
might have to LD_PRELOAD the LAPACK as well if there are some unexpected
dependencies between the LAPACK and BLAS chosen by Matlab. I used to do
things like this all the time when benchmarking Octave and Matlab to
ensure a fair comparison that was independent of the external dependencies.
Frankly, I have no problem one way or the other about this; I was just
pointing out that Matlab deprecated this functionality and so we'd be
making the opposite choice. In general I like to keep Octave and Matlab
compatible, but in this case, as you say, "so what". Its inclusion in
Octave won't affect compatibility with Matlab, as it's probably not a
function that I'd think to use within a script or function, and even if I
did, wrapping it in an "if exist('OCTAVE_VERSION')" isn't a big issue.
No, they're not always multithreaded by OpenMP. For instance, GotoBLAS
is not. So what? It may be useful anyway, but we can simply say "this
function is equivalent to omp_set_num_threads, so it will only
influence libraries based on OpenMP".
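A function with those semantics could be as thin as the sketch below. The function name is hypothetical; the only real API used is OpenMP's own `omp_set_num_threads`/`omp_get_max_threads`, and as discussed it would have no effect on non-OpenMP libraries such as GotoBLAS.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical wrapper: set the OpenMP thread count and return the
// value now in effect. In a build without OpenMP the setting is a
// no-op and the effective count is 1.
int
set_computation_threads (int n)
{
#ifdef _OPENMP
  if (n > 0)
    omp_set_num_threads (n);
  return omp_get_max_threads ();
#else
  (void) n;  // unused without OpenMP
  return 1;
#endif
}
```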
No, it's a case of evangelizing Octave. Do a search for "octave" on the
Matlab newsgroup and see how many users there describe Octave as slow.
I've even worked with people in the past who wouldn't use Octave
because their code was twice as slow in Octave, even though we only
had 10 Matlab licenses for the whole group and I was running multiple
instances of Octave on many machines and getting results faster in any
case. With an attitude of "who cares", I'm sorry, but MathWorks has
already won and we only continue to develop Octave for ourselves.
I think stuff like this has a far bigger impact than parallelizing
plus and minus. Besides, Octave's got no JIT, so even if you
parallelize plus, a+b+c will still be slower than it could be if it
was done by a single serial loop.
Not that it really matters; as I already said these operations are
seldom a bottleneck, and if they are, rewriting to C++ is probably the
way to go.
Yes, but as you said yourself it's a matter of communication. If someone
does a benchmark of a+b, stupid as that may be, I'd like Octave to be just
as fast as Matlab. Kai's results showed that his quad core AMD improved
performance of the binary operators with multithreading. JIT's another
issue; yes, the cases where it makes the most difference aren't the ones
that have the largest impact on well written code. But there are a lot of
idiots out there writing bad code, benchmarking Octave against Matlab and
then badmouthing Octave. So yes, JIT would be good to have as well :-)
Maybe you place too much weight on idiots. Chasing Matlab in functionality is
useless; we can never win. There will always be idiots like that. Who
That being said, the person who has most contributed to the speed of
Octave recently is you... I agree that getting the code that "matters"
faster is more important than the gimmicks.
Hey, given what I'm working on now in my real life, doing OpenMP code
is about the best thing I can do for Octave.
If it's isolated and disabled by default, I don't see why it shouldn't go
into 3.4.0 to allow people to start playing with this code. There seem to
be quite a few people regularly building and testing Octave now, and
letting them have this functionality in an experimental form can only be a
good thing, as long of course as it doesn't put the 3.4.0 release at risk.
OK, I see you're determined :)
Yes, but a "Ctrl-C" and out-of-memory can cause an exception anywhere, so
I'd expect we need to be careful in any case. If we declare certain
functions as not throwing exceptions, that would prevent the use of
"Ctrl-C" in these functions. I suppose in most cases the memory is
allocated before the loop, and the "out-of-memory" exception can probably
be avoided. So the only effect of your solution is disabling Ctrl-C in
OpenMP loops in the elementary functions.
So, let's merge the elemental operations patch first. Even though they
scale poorly, it appears you can still get them, say, 2x faster with 4
cores. If you have 4 free cores, why not... besides, certain cases
scale well. For instance, integer division is apparently
computationally intensive enough to scale as well as the mappers.
We do not need to do the ugly magic proposed above; the logical
operations on reals are actually the only loops that can throw, and I
never liked this solution anyway. I think this can be solved by other
means, by just checking for the NaNs prior to the operation. We shall
indicate in mx-inlines.cc that loops must not throw exceptions, and
declare them as such.
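The "check for NaNs prior to the operation" idea can be sketched as follows. The function name and error message are illustrative, not the actual mx-inlines.cc code; the point is that the validation pass throws up front, so the hot loop itself never throws and can be parallelized (and declared non-throwing) safely:

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Sketch: convert reals to logical values. NaN operands must signal
// an error, but the check runs serially before the loop, so the
// parallel loop body cannot throw.
std::vector<char>
real_to_logical (const std::vector<double>& x)
{
  for (std::size_t i = 0; i < x.size (); i++)
    if (std::isnan (x[i]))
      throw std::domain_error ("NaN can't be converted to logical value");

  std::vector<char> r (x.size ());

  // This loop is now exception-free and safe to parallelize.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long> (x.size ()); i++)
    r[i] = (x[i] != 0.0);

  return r;
}
```

The cost is one extra pass over the data, which for large arrays may itself be worth parallelizing.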
If we have a solution for the exceptions from mapper functions, and if it's
not too harsh on the performance, why not use it for the elementary
functions as well? If it's costly, sure, it's better to avoid it.
Then we can address the mappers. This is going to be more difficult as
I don't think we can currently universally rely on the mappers to
never throw. Further, unlike the fast loops, the mappers can take a
long time and so are supposed to be breakable.
I think this is one of the biggest drawbacks of the OpenMP attempts.
Previously, making computations breakable required just putting
octave_quit() somewhere. Now, it should be done carefully and inside
parallel sections we simply need to resort to something more
Maybe there's a better, exception-safe method of C++ parallelism out
there? What about boost::threads? (I don't know, I'd have to check it out.)
It seems that we can't use "#pragma omp parallel for" if we want an
exception to result in all the threads exiting immediately. The effect
might be that the exception is treated in the loop that threw it, but
the other maps have to finish before going any further. Though writing
the parallel for loop manually in OpenMP is possible, and we could check
if an exception had been thrown and exit all of the threads early.
This would change the structure of your code, however, and probably make
it slower... as these threads are likely to live long enough to make the
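The hand-rolled loop with an early exit could look like the sketch below (names are illustrative). Each thread takes a static slice of the iteration space and polls a shared flag between iterations, so remaining work is abandoned soon after any thread records an exception:

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sketch: a manually partitioned parallel loop. Unlike "parallel
// for", each thread checks a shared error flag and bails out early
// once another thread has caught an exception.
std::vector<double>
breakable_parallel_map (const std::vector<double>& x)
{
  const long n = static_cast<long> (x.size ());
  std::vector<double> y (x.size ());
  int error = 0;

  #pragma omp parallel
  {
#ifdef _OPENMP
    const int nth = omp_get_num_threads ();
    const int tid = omp_get_thread_num ();
#else
    const int nth = 1, tid = 0;
#endif
    for (long i = tid; i < n; i += nth)
      {
        int err_seen;
        #pragma omp atomic read
        err_seen = error;
        if (err_seen)
          break;  // another thread failed; stop early

        try
          {
            if (x[i] < 0)
              throw std::domain_error ("negative input");
            y[i] = std::sqrt (x[i]);
          }
        catch (...)
          {
            #pragma omp atomic write
            error = 1;
          }
      }
  }

  if (error)
    throw std::domain_error ("error in parallel map");

  return y;
}
```

As noted above, the extra flag check per iteration has a cost, so this only pays off when individual iterations are expensive, as with the mappers.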
Otherwise we need to do something other than OpenMP, like what is
I'd suggest going with the simple solution using "#pragma omp parallel
for" and try-catch first, and then see if it causes any issues.