Re: [Bug-apl] More performance results

On 17 April 2014 01:56, Juergen Sauermann <address@hidden> wrote:

Hi,

I have created a benchmark program that measures the startup (fork) and finish (join)
times of OMP. It also compares them with a hand-crafted fork/join.

The manual implementation uses an O(log(P)) algorithm for forking and joining, compared to
what I assume is an O(P) algorithm in OMP. It would therefore be very interesting if
Elias could run it on his 80-core machine. For my dual-core the difference between both
types of algorithm should be minor.
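
For readers who want to see what an O(log(P)) fork/join can look like, here is a minimal
sketch. It is not the code of the benchmark program (which is not shown in this mail); it
assumes a binomial-tree scheme built on std::thread, and the names tree_worker and run_tree
are made up for illustration. Each worker starts the workers at id+step, id+step/2, ..., id+1
and later joins them, so the longest chain of forks grows with log(P) instead of P.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: `id` owns the range [id, id + 2*step) and starts one
// child per halving of `step`. Each child repeats the scheme on its own,
// smaller range, so no worker performs more than about log2(P) forks.
void tree_worker(int id, int step, int total, std::atomic<int>& ran)
{
    std::vector<std::thread> children;
    for (int s = step; s >= 1; s /= 2) {
        const int child = id + s;
        if (child < total)   // skip workers beyond the requested count
            children.emplace_back(tree_worker, child, s / 2, total,
                                  std::ref(ran));
    }

    ran.fetch_add(1);        // stand-in for the real per-worker job

    // Join phase: every worker waits only for the children it started.
    for (auto& c : children)
        c.join();
}

// Runs `total` workers (including the calling thread as worker 0) and
// returns how many of them actually ran.
int run_tree(int total)
{
    assert(total >= 1);

    // Smallest power of two `step` with 2*step >= total, so that the
    // root's range [0, 2*step) covers all workers.
    int step = 1;
    while (2 * step < total)
        step *= 2;

    std::atomic<int> ran{0};
    tree_worker(0, step, total, ran);
    return ran.load();
}
```

Calling run_tree(80) from main() would start 79 additional threads in 7 levels. A real
implementation would more likely keep the threads alive and wake them per job rather than
create them each time; thread creation is used here only to keep the sketch short.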

The first run of both algorithms seemed to suggest that the hand-crafted version is much faster
than OMP:

Pass 0: 2 cores/threads, 15330 cycles total (hand-crafted)

Pass 0: 2 cores/threads, 99197 cycles total (OMP)

But then came a surprise when I ran the benchmark loop several times in a row:

./Parallel 2 (hand-crafted)
Pass 0: 2 cores/threads, 17542 cycles total
Pass 1: 2 cores/threads, 21070 cycles total
Pass 2: 2 cores/threads, 19075 cycles total
Pass 3: 2 cores/threads, 18249 cycles total
Pass 4: 2 cores/threads, 16415 cycles total

./Parallel_OMP 2 (OMP)
Pass 0: 2 cores/threads, 1213632 cycles total
Pass 1: 2 cores/threads, 5831 cycles total
Pass 2: 2 cores/threads, 2434215 cycles total
Pass 3: 2 cores/threads, 5705 cycles total
Pass 4: 2 cores/threads, 5215 cycles total

The details in the OMP case reveal that most of the time is spent on fork
(which is different from Elias' earlier results, where join took the most time).
It looks a little as if code loading (shared lib?) might be the issue for OMP.
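
A rough way to look at fork and join separately, pass by pass, is sketched below. This is
an assumption-laden illustration, not the benchmark itself: it uses plain std::thread
instead of OMP, reports std::chrono::steady_clock nanoseconds instead of CPU cycles, and
the names PassTiming, time_one_pass and time_passes are hypothetical. "Fork" here means the
time to create all threads, "join" the time until all of them have been joined.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical result record for one benchmark pass.
struct PassTiming {
    std::int64_t fork_ns;   // time spent creating the worker threads
    std::int64_t join_ns;   // time spent waiting for them to finish
};

// Times one fork/join round with `threads` workers that do no work.
PassTiming time_one_pass(int threads)
{
    using clock = std::chrono::steady_clock;

    auto to_ns = [](clock::duration d) -> std::int64_t {
        return static_cast<std::int64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(d).count());
    };

    std::vector<std::thread> pool;
    pool.reserve(static_cast<std::size_t>(threads));

    const auto t0 = clock::now();
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([] {});            // empty worker body
    const auto t1 = clock::now();
    for (auto& t : pool)
        t.join();
    const auto t2 = clock::now();

    PassTiming result;
    result.fork_ns = to_ns(t1 - t0);
    result.join_ns = to_ns(t2 - t1);
    return result;
}

// Repeats the measurement so one-time costs in early passes can be told
// apart from the steady state.
std::vector<PassTiming> time_passes(int threads, int passes)
{
    std::vector<PassTiming> results;
    for (int p = 0; p < passes; ++p)
        results.push_back(time_one_pass(threads));
    return results;
}
```

Printing the fork_ns and join_ns of each pass next to each other shows whether a slow pass
is dominated by fork or by join, and whether the cost appears only in the first pass or
recurs later.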

/// Jürgen

From:	Elias Mårtenson
Subject:	Re: [Bug-apl] More performance results
Date:	Thu, 17 Apr 2014 10:41:03 +0800