
[Bug-apl] More performance results


From: Juergen Sauermann
Subject: [Bug-apl] More performance results
Date: Wed, 16 Apr 2014 19:56:07 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130330 Thunderbird/17.0.5

Hi,

I have created a benchmark program that measures the startup (fork) and finish (join)
times of OMP. It also compares them with a hand-crafted fork/join.

The manual implementation uses an O(log(P)) algorithm for forking and joining,
compared to what is apparently (I assume) an O(P) algorithm in OMP. It would
therefore be very interesting if Elias could run it on his 80-core machine.
On my dual-core machine the difference between the two types of algorithm
should be minor.

The first run of both algorithms seemed to suggest that the hand-crafted version
is much faster than OMP:

Pass 0: 2 cores/threads, 15330 cycles total (hand-crafted)

Pass 0: 2 cores/threads, 99197 cycles total (OMP)


But then came a surprise when I ran the benchmark loop several times in a row:

./Parallel 2 (hand-crafted)
Pass 0: 2 cores/threads, 17542 cycles total
Pass 1: 2 cores/threads, 21070 cycles total
Pass 2: 2 cores/threads, 19075 cycles total
Pass 3: 2 cores/threads, 18249 cycles total
Pass 4: 2 cores/threads, 16415 cycles total

./Parallel_OMP 2 (OMP)
Pass 0: 2 cores/threads, 1213632 cycles total
Pass 1: 2 cores/threads, 5831 cycles total
Pass 2: 2 cores/threads, 2434215 cycles total
Pass 3: 2 cores/threads, 5705 cycles total
Pass 4: 2 cores/threads, 5215 cycles total

The details in the OMP case reveal that most of the time is spent on fork
(which is different from Elias' earlier results, where join took the most time).
It looks a little like code loading (shared lib?) might be the issue for OMP.
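
A per-pass measurement loop of the kind used above could be sketched as follows. This is an illustration only, not the code from Parallel.cc: the name time_passes is hypothetical, and it reports wall-clock nanoseconds from std::chrono::steady_clock rather than CPU cycles. Keeping one number per pass instead of a single total is what makes expensive outlier passes visible.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper (not taken from Parallel.cc): call body() `passes`
// times in a row and return the duration of each pass in nanoseconds.
std::vector<std::int64_t> time_passes(int passes,
                                      const std::function<void()>& body) {
    using clock = std::chrono::steady_clock;
    std::vector<std::int64_t> ns;
    for (int p = 0; p < passes; ++p) {
        const auto start = clock::now();
        body();  // e.g. one complete fork/join cycle
        const auto stop = clock::now();
        ns.push_back(std::chrono::duration_cast<std::chrono::nanoseconds>(
                         stop - start).count());
    }
    return ns;
}
```

Passing a function that performs one fork/join as body gives one entry per pass, comparable to the "Pass N" lines above apart from the unit.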

/// Jürgen



Attachment: Parallel.cc
Description: Text Data

Attachment: Makefile
Description: Text document

