I tried to measure the difference with a simple C program that
simulates both behaviors. I'm no compiler writer and I didn't
disassemble the compiled code, so I'm not sure that this really proves
anything, but in case somebody is interested, here are my results:
- Both test cases iterated over numbers from 2 to 100000 trying to find
primes using a stupid, simple algorithm.
- Both test cases hold no temporary variables on the stack, but instead
use a malloc'd structure for that.
- Test A uses direct access to the struct's fields.
- Test B uses indirect access: it adds a static global variable, which
holds the field's offset, to the struct's pointer.
- Both tests were tried with no compiler optimization (-O0) and with
maximum compiler optimization (-O3 -fomit-frame-pointer).
The results are:
Machine 1: Intel Celeron M @ 1.5GHz, x86-32, GCC 4.1.3 20070929 (prerelease)
Test A (unoptimized): 5466612us
Test B (unoptimized): 11096380us
Slowdown: 102.9800%
Test A (optimized): 5280693us
Test B (optimized): 5704486us
Slowdown: 8.0200%
Machine 2: Intel Pentium 4 @ 1.7GHz, x86-32, GCC 4.1.2 20061115 (prerelease)
Test A (unoptimized): 11228890us
Test B (unoptimized): 17972084us
Slowdown: 60.0500%
Test A (optimized): 15903029us
Test B (optimized): 16032732us
Slowdown: 0.8100%
Machine 3: Intel Pentium Dual-Core E2180 @ 2.0GHz, x86-64, GCC 4.2.3
Test A (unoptimized): 3646244us
Test B (unoptimized): 5469573us
Slowdown: 50.0000%
Test A (optimized): 3680493us
Test B (optimized): 3630129us
Slowdown: -1.3700%
Machine 4: Intel Core 2 Duo E6600 @ 2.4GHz, x86-64, GCC 4.2.3
Test A (unoptimized): 3066969us
Test B (unoptimized): 4653554us
Slowdown: 51.7300%
Test A (optimized): 3132884us
Test B (optimized): 3070829us
Slowdown: -1.9900%
I've attached a tarball holding the program and test script. I hope
it's of some use.
--
Saso
Richard Frith-Macdonald wrote:
On 31 May 2008, at 16:21, David Chisnall wrote:
The advantages of this would be:
- No code using GNUstep or other frameworks compiled with clang/LLVM
(which we are almost in a position to do) would break if it inherited
from a class whose layout changed.
- No ABI breakage would be needed - code compiled with GCC would
still work on the modified runtime, although the existing constraints
on modification would still apply.
The disadvantages are:
- Currently ivar accesses on most platforms will be a single load /
store instruction in an indirect addressing mode with a constant
offset embedded in the instruction. This would add another load and
addition to every ivar access.
- The extra work that the runtime would do would increase load times
slightly.
So, my question is, is this worth doing?
IMO ... yes. It's a good feature to have, and the overheads become less
significant as processor speeds increase.
_______________________________________________
Discuss-gnustep mailing list
Discuss-gnustep@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnustep