Re: Using OpenMP in Octave
Sat, 3 Apr 2010 21:27:01 +0200
On Fri, Apr 2, 2010 at 10:55 PM, David Bateman <address@hidden> wrote:
> Jaroslav Hajek wrote:
>> I thought you would :) What if I won't like your constants, figuring
>> out better working ones for me? A commit war?
> Not a chance you've committed much more than me recently ... You clearly
> have more time for fun stuff like Octave than me :'-(
I hope this will last for a while. Clouds are gathering over my
contributing times, though.
>> I need to be able to turn off the multithreading at runtime and I
>> absolutely insist on that, so there needs to be a branch anyway.
>> The solution I offered earlier was to encapsulate the decision into a
>> routine, that will accept loop dimensions (number of cycles and the
>> "size" of one cycle) and decide whether it's worth parallelizing.
>> A universal switch point may be a good start. But if you expect
>> people to play with the parallel code, I don't see why you don't want
>> a tunable parameter for every operation and every mapper, if it can be
>> done efficiently.
> Mainly that I think it might be inefficient to have a set of tunable
Oh, and I thought that *I* am the one always obsessed with performance
around here :)
No, of course, I wouldn't propose something that hurts performance.
> Frankly, the more I think about it, probably the best solution is to
> just say that the cases where the multithreaded code is slower than the
> single-threaded code are going to be the cases where speed isn't an issue,
> and have a really stupid minimalist test and a flag to turn off
> multithreading completely. Yes, I think it's a good idea to be able to turn
> it off, and not just with a flag when Octave is launched, as Matlab
> currently implements it
I thought that a tunable minimum size for parallelization was a good start.
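A minimal sketch of the decision routine discussed above: it takes the number of cycles and a rough "size" of one cycle, and consults a runtime switch and a tunable minimum size. All names (parallel_policy, global_policy, worth_parallelizing, add_arrays) and the default threshold are hypothetical and chosen only for illustration; nothing here is taken from the Octave sources.

```cpp
#include <cstddef>

// Hypothetical settings object; not part of Octave.
struct parallel_policy
{
  bool enabled = true;            // runtime switch: false forces serial code
  std::size_t min_work = 100000;  // tunable switch point (arbitrary default)
};

inline parallel_policy& global_policy ()
{
  static parallel_policy p;
  return p;
}

// n_cycles: number of loop iterations; cycle_cost: rough "size" of one cycle.
inline bool worth_parallelizing (std::size_t n_cycles, std::size_t cycle_cost)
{
  const parallel_policy& p = global_policy ();
  if (! p.enabled || cycle_cost == 0)
    return false;
  // Same test as n_cycles * cycle_cost >= min_work, written as a division
  // so that the product cannot overflow.
  return n_cycles >= (p.min_work + cycle_cost - 1) / cycle_cost;
}

void add_arrays (std::size_t n, double *r, const double *x, const double *y)
{
  const bool par = worth_parallelizing (n, 1);
  // The if clause keeps the loop serial when par is false. Without OpenMP
  // support the pragma is ignored and the loop runs serially anyway.
  #pragma omp parallel for if (par)
  for (long i = 0; i < static_cast<long> (n); i++)
    r[i] = x[i] + y[i];
}
```

A per-operation tunable would amount to giving each operation its own parallel_policy instead of sharing the global one; the branch itself stays a single comparison.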
> <cut proposed exception handling OpenMP code>
>> Yes. Besides, I don't think this works.
>> I think you mixed together the interrupts thrown from Fortran that are
>> handled by setjmp/longjmp and regular C++ exceptions (which are the
>> case here). Here, one needs to employ try/catch. But see below, I
>> don't think this is necessary.
> Ok, I didn't test the proposed code but threw it out for discussion..
> There are also a few typos and uninitialized values. I'd thought the
> setjmp/longjmp stuff might be needed for mapper functions written in
> Fortran, but in fact I don't think so, as the setjmp/longjmp exceptions will
> be rethrown as standard C++ exceptions within each OpenMP thread.. I'll work
> up something else instead.
No need regarding the elemental operations, I already removed the
infringing exception. So we can apply the previous patch.
>>> This question of the exception will occur even more as
>>> we try to parallelize at a coarser grain and you probably have similar
>>> issues in your parcellfun implementation already.
>> Not really. Remember that parcellfun does not use multithreading.
> But what happens when one of your forked processes unexpectedly dies?
You mean a crash, such that even unwind_protect_cleanups won't
execute? In that case, I think it will hang. There's currently no
mechanism to distinguish it from the case where a job simply takes
forever. Of course, proposals for improvements are welcome, but I
don't really think smart actions are needed for crashes.
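One possible way to tell a crashed worker from a slow one, sketched here with plain POSIX calls: the parent reads the pipe until end-of-file and then inspects the exit status with waitpid. This is only an illustration under the assumption of one pipe per child; run_in_child and worker_result are hypothetical names and this is not how parcellfun is implemented.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <csignal>
#include <cstddef>
#include <string>

// Hypothetical result record for one forked job.
struct worker_result
{
  bool started = false;   // false if pipe() or fork() failed
  bool crashed = false;   // true if the child was killed by a signal
  int signal_number = 0;  // the signal, when crashed
  std::string output;     // bytes received before the pipe closed
};

// Run job(write_fd) in a forked child and report how the child ended.
template <typename Job>
worker_result run_in_child (Job job)
{
  worker_result res;
  int fd[2];
  if (pipe (fd) != 0)
    return res;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fd[0]);
      close (fd[1]);
      return res;
    }
  if (pid == 0)
    {
      close (fd[0]);
      job (fd[1]);  // the job writes its result to the pipe
      _exit (0);    // skip atexit handlers inherited from the parent
    }

  res.started = true;
  close (fd[1]);    // parent keeps only the read end
  char buf[256];
  ssize_t n;
  // read() returns 0 once the child has exited or died and the write end
  // is closed, so a crashed child does not block the parent here. A child
  // that simply runs forever still does; that needs a timeout.
  while ((n = read (fd[0], buf, sizeof buf)) > 0)
    res.output.append (buf, static_cast<std::size_t> (n));
  close (fd[0]);

  int status = 0;
  if (waitpid (pid, &status, 0) == pid && WIFSIGNALED (status))
    {
      res.crashed = true;
      res.signal_number = WTERMSIG (status);
    }
  return res;
}
```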
>>> Are you sure that MKL and ACML both use OpenMP for multithreading? What
>>> about other vendors? In any case wrapping all of this in a function to
>>> the number of threads in Octave should be easier. Note that Matlab
>>> deprecated the maxNumCompThreads function
>>> in recent versions of matlab exactly because libraries like BLAS and FFTW
>>> are multithreaded. So we'd be going against the decision taken by matlab
>>> because they consider they can't control the number of threads of these
>> Unlike Matlab, Octave is open, so users can use BLAS of their choice.
> LD_PRELOAD=/usr/lib/myfavblas.so matlab
> works fine or at least used to when I was last able to test it.
Interesting. I wonder if it's legal?
> You might
> have to LD_PRELOAD the lapack as well if there are some unexpected
> dependencies between the lapack and blas chosen by matlab. I used to do
> things like this all the time when benchmarking octave and matlab to ensure
> a fair comparison that was independent of the external dependencies.
>> No, they're not always multithreaded by OpenMP. For instance, GotoBLAS
>> is not. So what? It may be useful anyway, but we can simply say "this
>> function is equivalent to omp_set_num_threads, so it will only
>> influence libraries based on OpenMP".
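A small sketch of such a wrapper, assuming it does nothing more than forward to omp_set_num_threads. The function names are hypothetical, and the _OPENMP guard lets the same code compile when OpenMP is not enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical wrapper; returns true if the request reached the OpenMP
// runtime. Libraries that do their own threading are not affected.
inline bool set_max_compute_threads (int n)
{
  if (n < 1)
    return false;
#ifdef _OPENMP
  omp_set_num_threads (n);
  return true;
#else
  return false;  // built without OpenMP: nothing to configure
#endif
}

// Upper bound on the threads a following parallel region would use.
inline int max_compute_threads ()
{
#ifdef _OPENMP
  return omp_get_max_threads ();
#else
  return 1;
#endif
}
```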
> Frankly I have no problems one way or the other about this, I was just
> pointing out that Matlab deprecated this functionality and so we'd be making
> the opposite choice.. In general I like to keep Octave and Matlab compatible,
> but in this case, as you say, "so what".. Its inclusion in Octave won't affect
> the compatibility with Matlab, as it's probably not a function that I'd think
> to use within a script or function, and even if I did, wrapping it in an "if
> exist('OCTAVE_VERSION')" isn't a big issue
>>>> I think stuff like this has a far bigger impact than parallelizing
>>>> plus and minus. Besides, Octave's got no JIT, so even if you
>>>> parallelize plus, a+b+c will still be slower than it could be if it
>>>> was done by a single serial loop.
>>>> Not that it really matters; as I already said these operations are
>>>> seldom a bottleneck, and if they are, rewriting to C++ is probably the
>>>> way to go.
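To illustrate the a+b+c point: without a JIT the expression is evaluated as two separate element-wise additions with a temporary in between, while a hand-written loop does the work in one pass. The sketch below uses std::vector and hypothetical function names, and assumes all three inputs have the same length.

```cpp
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// One element-wise addition, as an interpreter would call it per operator.
vec add (const vec& x, const vec& y)
{
  vec r (x.size ());
  for (std::size_t i = 0; i < x.size (); i++)
    r[i] = x[i] + y[i];
  return r;
}

// a+b+c evaluated operator by operator: one extra array is allocated,
// written, and read again.
vec sum3_two_pass (const vec& a, const vec& b, const vec& c)
{
  return add (add (a, b), c);
}

// The same result from a single serial loop with no temporary.
vec sum3_fused (const vec& a, const vec& b, const vec& c)
{
  vec r (a.size ());
  for (std::size_t i = 0; i < a.size (); i++)
    r[i] = (a[i] + b[i]) + c[i];
  return r;
}
```

Parallelizing add speeds up each pass but does not remove the extra memory traffic of the temporary.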
>>> Yes but as you said yourself it's a matter of communication. If someone
>>> a bench of a+b, stupid as that may be, I'd like Octave to be just as
>>> as matlab. Kai's results showed that his quad core AMD improved
>>> of the binary operators with multithreading. JIT's another issue that yes
>>> cases where it makes the most difference aren't the ones that have the
>>> largest impact on well written code. Though there are a lot of idiots out
>>> there writing bad code and benchmarking Octave against Matlab and then
>>> badmouthing Octave. So yes JIT would be good to have as well :-)
>> Maybe you place too much weight on idiots. Chasing Matlab in functionality is
>> useless; we can never win. There will always be idiots like that. Who
> No, it's a case of evangelizing Octave.. Do a search for "octave" on the
> matlab newsgroup and see how many users there describe Octave as slow...
Well, it's true Octave is slower in many aspects, especially anywhere
JIT has a word.
Some stuff is faster though.
> I've even worked with people in the past that wouldn't use Octave because
> their code was twice as slow in Octave, even though we only had 10 matlab
> licenses for the whole group and I was running multiple instances of Octave
> on many machines and getting results faster in any case.
An excellent demonstration of the advantages of free software, I'd say.
If your colleagues didn't understand, they probably didn't want to.
> With an attitude of
> "who cares", I'm sorry but mathworks has already won and we only continue to
> develop Octave for ourselves..
Isn't that what we do anyway? Quoting (imprecisely) JWE, Octave is not
a competing product to Matlab, it is a community project.
Developed by community, for the community.
> That being said the person who has most contributed to the speed of Octave
> recently is you... I agree that getting the code that "matters" faster is
> more important than the gimmicks..
Definitely. I am also very confident that high-level parallelism is
much more important. At least for my work, multithreaded mappers are
probably of little use, even multithreaded BLAS doesn't help me much,
but parcellfun has helped me incredibly. It's great that Octave
provides functions like fork() and pipe(). I even wondered whether
doing fork() inside Matlab could be technically illegal, because it
probably bypasses the license manager mechanism of limiting number of
I'm surely glad I don't need to worry about nonsense like this with Octave.
>>> If it's isolated and disabled by default I don't see why it shouldn't go
>>> 3.4.0 to allow people to start playing with this code.. There seems to be
>>> quite a few people regularly building and testing Octave now and letting
>>> them have this functionality in an experimental form can only be a good
>>> thing, as long as of course it doesn't put the 3.4.0 release at risk.
>> OK, I see you're determined :)
> Hey, given what I'm working on now in my real life, doing OpenMP code is
> about the best thing I can do for Octave.
>> So, let's merge the elemental operations patch first. Even though they
>> scale poorly, it appears you can still get them, say, 2x faster with 4
>> cores. If you have 4 free cores, why not... besides, certain cases
>> scale well. For instance, integer division is apparently
>> computationally intensive enough to scale as well as the mappers.
>> We do not need to do the ugly magic proposed above; the logical
>> operations on reals are actually the only loops that can throw, and I
>> never liked this solution anyway. I think this can be solved by other
>> means, by just checking for the NaNs prior to the operation. We shall
>> indicate in mx-inlines.cc that loops must not throw exceptions, and
>> declare them as such.
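A sketch of the "check for NaNs first" idea, assuming a logical AND on real arrays in which NaN operands are an error. The scan runs before the loop, so the loop itself can be declared noexcept. The function names are hypothetical and do not correspond to anything in mx-inlines.cc.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Serial pre-check; it never throws.
inline bool any_nan (std::size_t n, const double *x) noexcept
{
  for (std::size_t i = 0; i < n; i++)
    if (std::isnan (x[i]))
      return true;
  return false;
}

// The loop proper: declared noexcept, so nothing can escape from inside
// the parallel region.
inline void and_loop (std::size_t n, bool *r,
                      const double *x, const double *y) noexcept
{
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long> (n); i++)
    r[i] = (x[i] != 0.0) && (y[i] != 0.0);
}

// Hypothetical driver: all error checking happens before the loop starts.
void logical_and (std::size_t n, bool *r, const double *x, const double *y)
{
  if (any_nan (n, x) || any_nan (n, y))
    throw std::domain_error ("NaN cannot be converted to a logical value");
  and_loop (n, r, x, y);
}
```

The price is one extra pass over each operand, which is only paid by the operations that could throw in the first place.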
> Yes but a "Ctrl-C" and out-of-memory can cause an exception anywhere,
No, not anywhere. The mx-inline loops are not breakable. That would
slow them down and they generally are so fast that they complete in a
fraction of second even when applied to arrays that almost fill the
> so I'd
> expect we need to be careful in any case. If we declare certain functions as
> not throwing exceptions that would prevent the use of "Ctrl-C" in these
> functions. I suppose in most cases the memory is allocated before the loop
> and the "out-of-memory" exception can probably be avoided. So the only
> effect of your solution is disabling ctrl-C in OpenMP loops in the elementary
>> Then we can address the mappers. This is going to be more difficult as
>> I don't think we can currently universally rely on the mappers to
>> never throw. Further, unlike the fast loops, the mappers can take a
>> long time and so are supposed to be breakable.
> If we have a solution for the exceptions from mapper functions, if it's not
> too harsh on the performance why not use it for the elementary functions as
> well? If it's costly, sure, it's better to avoid it.
It's done differently precisely because of performance.
The mx-inline loops are supposed to be fast. They are used for certain
mappers as well, but only those that are fast and are normally done by
a few inline instructions, such as abs(), real(), imag(), conj() etc.
>> I think this is one of the biggest drawbacks of the OpenMP attempts.
>> Previously, making computations breakable required just putting
>> octave_quit() somewhere. Now, it should be done carefully and inside
>> parallel sections we simply need to resort to something more
>> Maybe there's a better, exception-safe method of C++ parallelism out
>> there? What about boost::threads? (I don't know, I'd have to check it
> it seems that we can't use "#pragma omp parallel for" if we want the
> exception to result in all the threads exiting immediately.
Yes, IIRC the omp parallel loop must not be terminated by break.
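Since the loop may not be left by break or by an exception, one common workaround is to catch inside the loop body, remember the first exception, let the remaining iterations fall through cheaply, and rethrow once the loop has finished. The sketch below shows that pattern; parallel_map is a hypothetical name, and the use of std::atomic inside an OpenMP region is an assumption that holds with common compilers rather than something the older OpenMP specifications promise.

```cpp
#include <atomic>
#include <cstddef>
#include <exception>
#include <stdexcept>

// Apply f to every element. An exception must not leave the parallel
// region, so the first one is stored and rethrown after the loop.
template <typename F>
void parallel_map (std::size_t n, double *r, const double *x, F f)
{
  std::exception_ptr first_error;
  std::atomic<bool> failed (false);

  #pragma omp parallel for
  for (long i = 0; i < static_cast<long> (n); i++)
    {
      if (failed.load ())
        continue;  // break is not allowed, so skip the remaining work
      try
        {
          r[i] = f (x[i]);
        }
      catch (...)
        {
          #pragma omp critical
          {
            if (! first_error)
              first_error = std::current_exception ();
          }
          failed.store (true);
        }
    }

  if (first_error)
    std::rethrow_exception (first_error);
}
```

Threads do not stop immediately: iterations already running finish normally, and only later ones are skipped.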
> The effect might
> be that the exception is handled in the loop that threw it, but the other
> maps have to finish before going any further. Though writing the parallel
> for loop manually in OpenMP is possible and we could check if an exception
> had been thrown and exit all of the threads early. This would change the
> structure of your code however and probably make it slower... Are these
> threads likely to exist long enough to make the overhead worthwhile?
> Otherwise we need to do something other than OpenMP, like what is described
> I'd suggest going with the simple solution using "#pragma omp parallel for"
> and try-catch first and then see if it causes any issues
Since most mappers do not throw exceptions, maybe it would be better
if we handled the possible Fortran exceptions directly in the mappers
that can throw them and simply return NaNs if they occur.
We would then only need to deal with the interrupt exception, which I
think would be simple enough, given the way they are signaled.
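A rough sketch of that plan, with two simplifications that should be read as assumptions: an ordinary C++ exception stands in for the Fortran error path (which in reality goes through setjmp/longjmp), and a plain flag named interrupt_requested stands in for whatever the signal handler sets. All names are hypothetical.

```cpp
#include <csignal>
#include <cstddef>
#include <limits>

// Stand-in for a flag set from a signal handler (hypothetical name). Real
// code would need to consider visibility of the flag across threads.
volatile std::sig_atomic_t interrupt_requested = 0;

struct interrupt_exception { };

// Wrap a mapper that may fail so that it yields NaN instead of throwing.
template <typename F>
double nan_on_failure (F f, double x) noexcept
{
  try
    {
      return f (x);
    }
  catch (...)
    {
      return std::numeric_limits<double>::quiet_NaN ();
    }
}

template <typename F>
void breakable_map (std::size_t n, double *r, const double *x, F f)
{
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long> (n); i++)
    {
      if (interrupt_requested)
        continue;  // only poll the flag in here; never throw
      r[i] = nan_on_failure (f, x[i]);
    }

  // Back in serial code: now it is safe to raise the interrupt.
  if (interrupt_requested)
    {
      interrupt_requested = 0;
      throw interrupt_exception ();
    }
}
```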
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic