Re: apple's objc runtime on linux?
From: Alexander Malmberg
Subject: Re: apple's objc runtime on linux?
Date: Sat, 08 Nov 2003 02:45:26 +0100
Benhur Stein wrote:
[snip]
> I took a rapid look at the message sending code of both runtimes and
> I really doubt that apple's can be faster than gnu's.
I believe so too, but I'm not familiar with the runtime part of
next-style message sending.
> Some simple measurements I've made on gnu's runtime (time to do 1e9
> calls to an empty method/function):
>
> 19.5 normal method call
> 6.8 indirect pointer (IMP)
> 5.1 direct call to 2-arg function
> 4.2 direct call to void function
> 0.5 empty loop
Interesting. I've done a fair amount of benchmarking of this at
different times, and these numbers are fairly consistent with mine. On
what architecture is this? How did you test it?
My tests (on a PII 400, hand-written loops with code copied from
generated code to avoid gcc optimization effects) give these results
(cycles/call, including loop overhead):
direct call, zero args                 8.12
direct call, one arg                   9.20
direct call, two args                 11.04
c++ virtual call, one arg (self)      10.52
c++ virtual call, two args (self+1)   12.08
optimized lookup                      21.68
original lookup                      ~45     (didn't test this properly)
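For anyone who wants to try a comparison like this themselves, a rough
timing harness in C could look like the one below. It is only a sketch
and not the hand-written assembly loops behind the numbers above:
time_calls, empty_fn, count_fn and calls_seen are made-up names, the
function pointer is volatile so the compiler can neither drop the call
nor turn it into a direct one, and clock() reports CPU seconds (loop
overhead included), not cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*fn_t)(void);

/* Hypothetical counter, only used to sanity-check the harness. */
unsigned long calls_seen = 0;

/* Two trivial call targets: one truly empty, one that counts calls. */
void empty_fn(void) { }
void count_fn(void) { calls_seen++; }

/*
 * Call fn n times through a volatile function pointer and return the
 * elapsed CPU time in seconds.  The pointer is re-read on every
 * iteration, so each call is a real indirect call.
 */
double time_calls(fn_t fn, unsigned long n)
{
    fn_t volatile target = fn;
    clock_t start, end;
    unsigned long i;

    assert(fn != NULL);

    start = clock();
    for (i = 0; i < n; i++)
        target();
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Dividing time_calls(empty_fn, n) by n gives seconds per call; turning
that into cycles per call needs the CPU clock rate, which this sketch
does not try to detect.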
I've experimented with gcc's message sending code generation before, and
while I'm sure speed could be improved, I think it's generally 'fast
enough' (at least with the optimized lookup :).
The 'optimized lookup' is a hand-optimized x86 assembly objc_msg_lookup
(and slightly tweaked runtime) I've been using for a long time now:
http://w1.423.telia.com/~u42308495/alex/objc_msg_lookup_opt_ix86.tar.gz
I once tested changing the code generation to use the next runtime style
of message sending (i.e. lookup and send in one call). This turned out to
be slightly slower than the gnu runtime style, but resulted in smaller
code.
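To make the difference between the two styles concrete, here is a toy
model in C. It is not the code of either runtime: every toy_ name is
made up for illustration, selectors are plain table indexes, and
methods take exactly one int argument. In the gnu style the runtime
only returns the implementation and the caller makes the call itself;
in the next style a single runtime entry point does both.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_object *toy_id;
typedef int toy_sel;                 /* toy selector: index into dtable */
typedef int (*toy_imp)(toy_id self, toy_sel sel, int arg);

enum { TOY_SEL_ADD = 0, TOY_SEL_SUB = 1, TOY_NSEL = 2 };

struct toy_class  { toy_imp dtable[TOY_NSEL]; };
struct toy_object { struct toy_class *isa; int value; };

/* Two example methods. */
int toy_add(toy_id self, toy_sel sel, int arg) { (void)sel; return self->value + arg; }
int toy_sub(toy_id self, toy_sel sel, int arg) { (void)sel; return self->value - arg; }

/* Implementation used for messages to nil: does nothing, returns 0. */
int toy_nil_method(toy_id self, toy_sel sel, int arg)
{
    (void)self; (void)sel; (void)arg;
    return 0;
}

struct toy_class toy_counter_class = { { toy_add, toy_sub } };

/* gnu style: find the implementation only; the call site calls it. */
toy_imp toy_msg_lookup(toy_id receiver, toy_sel sel)
{
    assert(sel >= 0 && sel < TOY_NSEL);
    if (receiver == NULL)
        return toy_nil_method;
    return receiver->isa->dtable[sel];
}

/* next style: lookup and call happen inside one runtime function. */
int toy_msg_send(toy_id receiver, toy_sel sel, int arg)
{
    assert(sel >= 0 && sel < TOY_NSEL);
    if (receiver == NULL)
        return 0;
    return receiver->isa->dtable[sel](receiver, sel, arg);
}
```

With the lookup style each call site holds two steps, e.g.
toy_msg_lookup(obj, TOY_SEL_ADD)(obj, TOY_SEL_ADD, 5), while the send
style needs only toy_msg_send(obj, TOY_SEL_ADD, 5), which is where the
smaller code comes from.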
> Inlining part of objc_msg_lookup (something that could be done by
> the compiler), time goes to 9.3 (but it uses 25 instructions instead
> of 6).
The critical path (local message send when the tables have been
initialized) in my optimized objc_msg_lookup is 17 instructions. That
includes two loads from the stack that an inlined version wouldn't need,
and a nil test that would be a good candidate for CSE (common
subexpression elimination).
I have tested with this inlined, but I don't think I recorded the
results of those runs. I'm not sure any inlining would be worth the code
expansion, though.
> Sure, if the method/function does something, these numbers become
> closer...
True. And when you really need the speed, IMP caching isn't hard to do.
:)
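For readers who haven't seen IMP caching, the idea can be sketched in
plain C. This is a toy model, not runtime code: the toy_ names and the
toy_lookups counter are made up for illustration. In Objective-C itself
the pointer would typically come from -methodForSelector:, and the
cached value is only valid while the receiver's class and method tables
stay the same.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_object toy_object;
typedef long (*toy_imp)(toy_object *self, long arg);

struct toy_object { toy_imp add_imp; long total; };

/* Hypothetical counter showing how often the lookup runs. */
unsigned long toy_lookups = 0;

/* Example method: add arg to the running total. */
long toy_add(toy_object *self, long arg)
{
    self->total += arg;
    return self->total;
}

/* Stand-in for the runtime's method lookup. */
toy_imp toy_lookup_add(toy_object *receiver)
{
    assert(receiver != NULL);
    toy_lookups++;
    return receiver->add_imp;
}

/* Normal sends: one lookup per call. */
void toy_sum_uncached(toy_object *obj, long n)
{
    long i;
    for (i = 0; i < n; i++)
        toy_lookup_add(obj)(obj, i);
}

/* IMP caching: look up once, then call through the saved pointer. */
void toy_sum_cached(toy_object *obj, long n)
{
    toy_imp add = toy_lookup_add(obj);
    long i;
    for (i = 0; i < n; i++)
        add(obj, i);
}
```

Both loops compute the same total; the cached one pays for the lookup
once instead of n times, which corresponds to the gap between the
"normal method call" and "indirect pointer (IMP)" rows quoted above.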
- Alexander Malmberg