Re: Inlining calls to primitives
Tue, 05 Sep 2006 18:20:23 +0200
Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)
Neil Jerram <address@hidden> writes:
> Interesting piece of work.
> It seems to me, though, that there are 3 things going on here.
> 1. Memoization of global variable references that yield one of a
> particular subset of common procedures. (I call this part
> memoization because it seems similar to the memoization that we
> already do for syntax like let, begin, and, etc.)
> 2. Inlining of the code for these procedures within CEVAL.
> 3. Changing IM_SYMs to be dynamic instead of fixed constants, plus the
> macrology and GCC jump table stuff.
> Do you know what the relative contributions of these 3 changes are?
Thanks Neil for clarifying this. The measurements you propose are
indeed a good idea and the results are not exactly as I was expecting
(which confirms that I'm not very good at predicting performance ;-)).
BTW, imsyms are not assigned dynamically: they are assigned statically
by the `extract-imsyms.sh' script.
I made a series of measurements with Guile compiled with `-pg -O0'.
Then I tried different configurations switching on and off each of these
3 features. The first table below summarizes the execution time
improvement, looking at the execution time of `every' itself as well as
the execution time of the whole program.
                                       |  `every'   whole program
  jump table vs. switch                |    0.8%      -1.4% (worse!)
  inlining in `CEVAL ()' vs. funcall   |   11.0%       4.7%
The second table shows the improvement compared to the non-memoizing +
jump table version (i.e., with `(eval-disable 'inline)'):
                                       |  `every'   whole program
  memoization + jt + inline            |   32.4%      22.1%
  memoization + switch + inline        |   31.9%      23.2%
  memoization + jt + funcall           |   24.0%      18.3%
(Beware: I only ran each test case about 3 times, so these figures
should not be taken as a definitive benchmark!  I'm attaching the full
results for the record.)
In short, the effect of using a jump table is negligible in this
context (it's really a micro-optimization compared to the two other
changes).
Function call overhead, however, _is_ important, though only the second
source of improvement. Repeatedly using function calls to execute a
handful of instructions is costly, and it probably increases cache
misses as well.
Now, if we generalized the memoization mechanism, as you suggested, so
that any procedure could be memoized (based on user annotations), then
things might be a bit different, because we would be using indirect
function calls (i.e., like `SCM (*func) () = xxx; return (func (arg));')
whereas in my measurements I was using immediate function calls (as in
`scm_car (op)').  I should compare indirect and immediate function
calls, but I presume there is a slight performance difference.
Finally, memoization does indeed play an important role. I suspect that
it's mostly because, for instance, argument count is only checked at
memoization time, and not when the "inlinable" is actually executed.
Plus, the memoized code path is quite local (unlike when `CEVAL ()' has
to go through `evap0', then `evap1', etc.).
I'm afraid this is kind of a dirty report, but I hope it sheds some
light on the issue.
Also, Rob mentioned on IRC that he was concerned about the global
switch. I believe this can be fixed using fluids or something like that
so that inlining can be enabled/disabled on a per-module basis (as we
did with `current-reader').  But that is perhaps a topic for another
thread.  ;-)