Re: RFC: Non-fragile ivars

From: Fred Kiefer
Subject: Re: RFC: Non-fragile ivars
Date: Tue, 03 Jun 2008 10:11:42 +0200
User-agent: Thunderbird (X11/20080226)

I like this whole idea a lot, but we will need a bit more performance testing to make sure we understand the consequences.

What I did not quite get is where we are going to store the sizes of the different classes. Perhaps a more detailed outline of the algorithm would help here.


Saso Kiselkov wrote:
I've tried to measure the differences by using a simple C program which
simulates both behaviors. I'm no compiler writer and I didn't
disassemble the compiled code, so I'm not sure that this really proves
anything, but in case somebody is interested, here are my results:

- Both test cases iterated over numbers from 2 to 100000 trying to find
primes using a stupid, simple algorithm.

- Both test cases hold no temporary variables on the stack, but instead
use a malloc'd structure for that.

- Test A uses direct access to the struct's fields.

- Test B uses indirect access by adding to the struct's pointer a static
global variable, which holds the offset to the fields.

- Both tests were tried with no compiler optimization (-O0) and with
maximum compiler optimization (-O3 -fomit-frame-pointer).
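
The attached program is not reproduced in this message, so the following is only a minimal sketch of what the two access patterns described above might look like. The struct layout, the `FIELD` macro and all function names are assumptions made for illustration, not the contents of the tarball.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical scratch structure: all loop state lives on the heap,
 * not in stack temporaries, as in the description of both tests. */
struct scratch {
    long n;         /* candidate number */
    long d;         /* trial divisor */
    long is_prime;  /* 1 while no divisor has been found */
    long count;     /* primes found so far */
};

/* Test A style: direct field access with compile-time constant offsets. */
static long count_primes_direct(long limit)
{
    struct scratch *s = malloc(sizeof *s);
    if (s == NULL) return -1;
    s->count = 0;
    for (s->n = 2; s->n <= limit; s->n++) {
        s->is_prime = 1;
        for (s->d = 2; s->d < s->n; s->d++) {
            if (s->n % s->d == 0) { s->is_prime = 0; break; }
        }
        s->count += s->is_prime;
    }
    long result = s->count;
    free(s);
    return result;
}

/* Test B style: field offsets are read from static globals at run time.
 * Note: because these are static and never written, a compiler is
 * permitted to treat them as constants when optimizing. */
static size_t off_n        = offsetof(struct scratch, n);
static size_t off_d        = offsetof(struct scratch, d);
static size_t off_is_prime = offsetof(struct scratch, is_prime);
static size_t off_count    = offsetof(struct scratch, count);

/* Add a run-time offset to the base pointer and access a long there. */
#define FIELD(p, off) (*(long *)((char *)(p) + (off)))

static long count_primes_indirect(long limit)
{
    char *s = malloc(sizeof(struct scratch));
    if (s == NULL) return -1;
    FIELD(s, off_count) = 0;
    for (FIELD(s, off_n) = 2; FIELD(s, off_n) <= limit; FIELD(s, off_n)++) {
        FIELD(s, off_is_prime) = 1;
        for (FIELD(s, off_d) = 2; FIELD(s, off_d) < FIELD(s, off_n); FIELD(s, off_d)++) {
            if (FIELD(s, off_n) % FIELD(s, off_d) == 0) {
                FIELD(s, off_is_prime) = 0;
                break;
            }
        }
        FIELD(s, off_count) += FIELD(s, off_is_prime);
    }
    long result = FIELD(s, off_count);
    free(s);
    return result;
}

/* Sanity check: both variants must agree on the number of primes. */
static long count_primes_both(long limit)
{
    long a = count_primes_direct(limit);
    long b = count_primes_indirect(limit);
    assert(a == b);
    return a;
}
```

Timing each variant separately (for example with `gettimeofday` around a call with `limit` set to 100000) would give figures comparable in kind to the ones below. If the offsets were non-static globals that another translation unit could modify, the compiler would have to reload them, which may be closer to what a runtime-resolved ivar offset would cost.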

The results are:

Machine 1: Intel Celeron M @ 1.5GHz, x86-32, GCC 4.1.3 20070929 (prerelease)
Test A (unoptimized): 5466612us
Test B (unoptimized): 11096380us
Slowdown: 102.9800%
Test A (optimized): 5280693us
Test B (optimized): 5704486us
Slowdown:  8.0200%

Machine 2: Intel Pentium 4 @ 1.7GHz, x86-32, GCC 4.1.2 20061115 (prerelease)
Test A (unoptimized): 11228890us
Test B (unoptimized): 17972084us
Slowdown: 60.0500%
Test A (optimized): 15903029us
Test B (optimized): 16032732us
Slowdown:  0.8100%

Machine 3: Intel Pentium Dual-Core E2180 @ 2.0GHz, x86-64, GCC 4.2.3
Test A (unoptimized): 3646244us
Test B (unoptimized): 5469573us
Slowdown: 50.0000%
Test A (optimized): 3680493us
Test B (optimized): 3630129us
Slowdown:  -1.3700%

Machine 4: Intel Core 2 Duo E6600 @ 2.4GHz, x86-64, GCC 4.2.3
Test A (unoptimized): 3066969us
Test B (unoptimized): 4653554us
Slowdown: 51.7300%
Test A (optimized): 3132884us
Test B (optimized): 3070829us
Slowdown:  -1.9900%
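
The slowdown percentages above appear to be the relative increase of Test B's time over Test A's, that is (B - A) / A expressed in percent. That formula is inferred from the numbers, not stated in the message, and the function name below is hypothetical. A negative value means Test B ran faster than Test A.

```c
/* Relative slowdown of time_b over time_a, in percent.
 * Assumed formula: (B - A) / A * 100. Times are in microseconds. */
static double slowdown_percent(long time_a_us, long time_b_us)
{
    return ((double)time_b_us - (double)time_a_us) / (double)time_a_us * 100.0;
}
```

For Machine 1 unoptimized, `slowdown_percent(5466612, 11096380)` gives roughly 102.98, matching the reported figure.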

I've attached a tarball holding the program and test script. I hope it's
of some use.


Richard Frith-Macdonald wrote:
On 31 May 2008, at 16:21, David Chisnall wrote:
The advantages of this would be:

- No code using GNUstep or other frameworks compiled with clang/LLVM
(which we are almost in a position to do) would break if it inherited
from a class whose layout changed.

- No ABI breakage would be needed - code compiled with GCC would
still work on the modified runtime, although the existing constraints
on modification would still apply.

The disadvantages are:

- Currently ivar accesses on most platforms will be a single load /
store instruction in an indirect addressing mode with a constant
offset embedded in the instruction.  This would add another load and
addition to every ivar access.

- The extra work that the runtime would do would increase load times

So, my question is: is this worth doing?

IMO ... yes.  It's a good feature to have, and the overheads become less
significant as processor speeds increase.
