Re: Octave 3.6.0 on Windows XP plot fails.

From: Michael Goffioul
Subject: Re: Octave 3.6.0 on Windows XP plot fails.
Date: Wed, 29 Feb 2012 15:45:19 +0000

On Wed, Feb 29, 2012 at 3:24 PM, Martin Helm <address@hidden> wrote:
> On Wednesday, 29 February 2012, 16:17:36, Michael Goffioul wrote:
>> On Wed, Feb 29, 2012 at 3:09 PM, Martin Helm <address@hidden> wrote:
>> > Running my previous trivial example now on my Atom netbook (dual-core
>> > Atom with hyperthreading; htop shows 4 threads in use), ATLAS gives me
>> > a speedup by a factor of 2.4 (so more than double!) when using multi- vs.
>> > single-threaded ATLAS.
>> You're actually confirming my explanation. Indeed 2.4 is better than
>> 2, but way lower than 4. In various cases, HT will improve
>> performance, because some threads are actually stuck waiting for
>> data. But that's not always the case.
>> Michael.
> I do not claim that a bunch of hyperthreaded threads scales as well as a bunch
> of threads running on different physical CPUs (if you read it that way, that
> was not what I wanted to say).
> Your comment just sounded to me a bit like you never get more than the
> physical core count as speedup, which simply contradicts my experience; sorry
> if I misinterpreted what you wrote.

No, that's not what I meant, otherwise HT would be useless (as in: one
core with HT would never give higher performance than one core; if that
were true, Intel would never have implemented it...). My point was to give
a bit of background on HT technology, and explain why in some
cases there's no gain and there can even be a penalty. I was merely
answering your question as to why matrix multiplication cannot benefit
from HT (you suggested an OS, compiler, or BLAS limitation, while the
limitation is HT itself).

I know, because I've gone down that road myself, being disappointed
to see my multi-threaded ATLAS giving lower performance than a
single-threaded one. I then looked a bit more into what HT really is
and realized that it's not the holy grail. Just because a task manager
reports 2 CPUs does not mean you actually *have* 2 CPUs.

> And of course you will also never get a speedup by a factor of n when using n
> physical cores, since overhead is involved.

Does your simple example refer to the "rand" statement? If not, could
you try a matrix multiplication like the following:

n = 2000; A = randn(n); B = randn(n); tic; C = A*B; t = toc, MFLOPS = 2*n^3/t*1e-6
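
The factor 2*n^3 is the approximate operation count of a dense n-by-n
product: each of the n^2 result entries needs n multiplications and n-1
additions. A single timing can be noisy, so the same measurement can be
repeated and the best time kept. A minimal sketch using only core Octave
functions; the repeat count and the names reps and best are arbitrary
choices for illustration, not part of the original example:

```octave
n = 2000;
A = randn(n);
B = randn(n);

reps = 3;      % arbitrary repeat count (assumption)
best = Inf;    % best (smallest) wall-clock time seen so far

for k = 1:reps
  tic;
  C = A*B;     % the operation being timed
  t = toc;
  best = min(best, t);
end

% about 2*n^3 floating-point operations per product, scaled to MFLOPS
printf("best time: %.3f s, about %.1f MFLOPS\n", best, 2*n^3/best*1e-6);
```

Running this once with single-threaded and once with multi-threaded ATLAS
gives two figures that can be compared directly.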

