From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #46830] Multiplication about 4x slower than Matlab
Date: Tue, 20 Sep 2022 02:39:59 -0400 (EDT)

Follow-up Comment #28, bug #46830 (project octave):

IIUC, hyperthreads on the same physical CPU core share at least some of the
low-level CPU caches. Those caches can be accessed much faster than "regular"
memory.
Optimized BLAS implementations gain some of their performance by using those
caches efficiently.

If multiple hyperthreads are running on the same physical core, that can lead
to more cache misses. In that case, the data has to be fetched over the slower
bus from "regular" memory. It might also be that the CPU caches are refreshed
in that case (i.e., their contents are replaced). That in turn slows down the
other hyperthread when it tries to access the data that was in the CPU cache
before the refresh. All of this is likely happening many times over. Hence,
the "rivaling" hyperthreads are slowing each other down.
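
To make the effect a bit more concrete, here is a toy model in Python. It is
only a sketch under strong assumptions: the shared cache is modeled as a pure
LRU cache with a made-up capacity of 8 entries, each "thread" loops over its
own 6-entry working set, and two hyperthreads are assumed to alternate their
accesses strictly. Real CPU caches and real BLAS access patterns are far more
complicated. All names (`count_misses`, `cyclic`, `interleave`) are
hypothetical.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count misses of an LRU cache with `capacity` entries."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


def cyclic(tag, working_set, rounds):
    """One thread looping `rounds` times over its own working set."""
    return [(tag, i) for _ in range(rounds) for i in range(working_set)]


def interleave(a, b):
    """Strictly alternate the accesses of two threads (an assumption)."""
    return [x for pair in zip(a, b) for x in pair]


CAPACITY = 8  # made-up size of the shared cache
a = cyclic("A", 6, 10)
b = cyclic("B", 6, 10)

alone = count_misses(a, CAPACITY)                   # 6 (warm-up only)
shared = count_misses(interleave(a, b), CAPACITY)   # 120 (every access)
print(alone, shared)
```

In this model, one thread alone only misses while warming up the cache. With
two threads, the combined working set (12 entries) no longer fits, and each
thread keeps evicting the data the other one is about to reuse.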

Afaiu, that is the main reason why using more threads than physical cores
doesn't lead to much performance improvement (or none at all). It can even be
slower than using only the physical cores for some operations (if fast access
to the cached data is more advantageous than pure "calculation power").
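
A minimal sketch of how one might act on this: limit the BLAS thread count to
the number of physical cores before starting the program. The helper names
are hypothetical, and the default of 2 hyperthreads per core is an assumption
that needs to be checked for the actual CPU. `OMP_NUM_THREADS` and
`OPENBLAS_NUM_THREADS` are honored by OpenMP-based BLAS builds and by
OpenBLAS, respectively. Whether they have any effect depends on which BLAS
library is actually in use.

```python
import os


def blas_thread_count(logical_cpus, threads_per_core=2):
    """Return one thread per physical core.

    `threads_per_core` is an assumption (2 is common with hyperthreading);
    it is not detected automatically here.
    """
    if logical_cpus < 1 or threads_per_core < 1:
        raise ValueError("counts must be positive")
    return max(1, logical_cpus // threads_per_core)


def blas_env(n_threads, base_env=None):
    """Copy an environment and set the thread limits for the BLAS library."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    env["OPENBLAS_NUM_THREADS"] = str(n_threads)
    return env


# os.cpu_count() reports logical CPUs (hyperthreads included).
n_threads = blas_thread_count(os.cpu_count() or 1)
env = blas_env(n_threads)
```

The resulting `env` could then be passed to, e.g., `subprocess.run(...,
env=env)` when launching the program to benchmark, so that runs with
different thread counts can be compared.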

