Re: [Bug-apl] More performance results

From:

Juergen Sauermann

Subject:

Date:

Thu, 17 Apr 2014 14:39:20 +0200

User-agent:

Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130330 Thunderbird/17.0.5

Hi Elias,

just do something like this:

for ((i=1; $i<80; ++i)); do
./Parallel $i
./Parallel_OMP $i
done

That will create a number of files ending in .omp and .man.

In the previous version there was a fault, corrected in the
attached version.

/// Jürgen

On 04/17/2014 04:41 AM, Elias Mårtenson wrote:

I'll be happy to run the benchmark. Can you give me the details on how to actually run it?

Regards,

Elias

On 17 April 2014 01:56, Juergen Sauermann <address@hidden> wrote:

Hi,

I have created a benchmark program that measures the startup (fork) and finish (join)
times of OMP. It also compares them with a hand-crafted fork/join.

The manual implementation uses an O(log(P)) algorithm for forking and joining, whereas
OMP apparently uses an O(P) algorithm (that is my assumption). It would therefore be very
interesting if Elias could run it on his 80-core machine. On my dual-core the difference
between the two types of algorithm should be minor.
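To illustrate the O(log(P)) idea, here is a minimal sketch of a tree-shaped fork/join. It is not the code from the attached Parallel.cc: the name `tree_fork_join` is hypothetical, and it creates fresh `std::thread` objects where a real benchmark would more likely wake up threads that already exist. It only shows why the longest chain of sequential forks grows with log2(P) rather than with P.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs work(id) once for every id in [first, first + count).
// The calling thread keeps the lower part of its range and hands the upper
// half to a new thread, repeatedly. Each new thread does the same with its
// own range, so the longest chain of sequential thread starts is about
// log2(count) instead of count.
void tree_fork_join(unsigned first, unsigned count,
                    const std::function<void(unsigned)>& work)
{
    assert(count > 0);
    std::vector<std::thread> children;

    // Fork phase: split off the upper half until only `first` is left.
    while (count > 1) {
        const unsigned half = count / 2;
        children.emplace_back(tree_fork_join, first + count - half, half,
                              std::cref(work));
        count -= half;
    }

    work(first);   // this thread's own share of the job

    // Join phase: wait for the children, smallest subtree first.
    for (auto it = children.rbegin(); it != children.rend(); ++it)
        it->join();
}
```

With P = 80 the root thread in this sketch starts 7 children itself, and no worker is more than 7 forks away from the root. A flat O(P) loop would start 79 threads one after the other.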

The first run of both algorithms seemed to suggest that the hand-crafted version is much
faster than OMP:

Pass 0: 2 cores/threads, 15330 cycles total (hand-crafted)

Pass 0: 2 cores/threads, 99197 cycles total (OMP)

But then came a surprise when I ran the benchmark loop several times in a row:

./Parallel 2 (hand-crafted)
Pass 0: 2 cores/threads, 17542 cycles total
Pass 1: 2 cores/threads, 21070 cycles total
Pass 2: 2 cores/threads, 19075 cycles total
Pass 3: 2 cores/threads, 18249 cycles total
Pass 4: 2 cores/threads, 16415 cycles total

./Parallel_OMP 2 (OMP)
Pass 0: 2 cores/threads, 1213632 cycles total
Pass 1: 2 cores/threads, 5831 cycles total
Pass 2: 2 cores/threads, 2434215 cycles total
Pass 3: 2 cores/threads, 5705 cycles total
Pass 4: 2 cores/threads, 5215 cycles total

The details in the OMP case reveal that most of the time is spent on fork
(which differs from Elias' earlier results, where join took the most time).
It looks a little like code loading (a shared lib?) might be the issue for OMP.
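Since single passes vary this much, it helps to record every pass separately and to look at the median instead of one run or the mean. The following is a minimal sketch of that idea, not the code in Parallel.cc: `time_passes` and `median_ns` are hypothetical names, and it uses `std::chrono::steady_clock` (nanoseconds) as a portable stand-in for the CPU cycle counter.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: calls fn() `passes` times in a row and returns the
// duration of each pass in nanoseconds, so warm-up passes stay visible.
std::vector<long long> time_passes(unsigned passes,
                                   const std::function<void()>& fn)
{
    using steady = std::chrono::steady_clock;
    std::vector<long long> ns;
    ns.reserve(passes);
    for (unsigned p = 0; p < passes; ++p) {
        const auto start = steady::now();
        fn();
        const auto stop = steady::now();
        ns.push_back(std::chrono::duration_cast<std::chrono::nanoseconds>(
                         stop - start).count());
    }
    return ns;
}

// Median of the samples (upper median for even sizes, 0 if empty).
// A few very slow passes do not move it much.
long long median_ns(std::vector<long long> samples)
{
    if (samples.empty()) return 0;
    const auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Applied to the five OMP passes quoted above, the median is 5831 cycles, whereas any average is dominated by passes 0 and 2.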

/// Jürgen

Parallel.cc
Description: Text Data

Makefile
Description: Text document
