[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gcl-devel] Re: sequences / in-lining
Re: [Gcl-devel] Re: sequences / in-lining
06 Oct 2005 11:17:15 -0400
Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
"Paul F. Dietz" <address@hidden> writes:
> Camm Maguire wrote:
> > Also inlined are: length and reverse, endp and identity are expanded,
> > the former to throw an error only when safety >=1.
> > There is an unused heap sort in the code too, but I can't see any gain
> > here, as the memory requirements are virtually the same. The
> > quick-sort is non-recursive and uses a tiny stack ~ log_2(n). I have
> > to read up on what stable-sort is supposed to be.
> > More to follow. Feedback appreciated. Notably, sort is likely the
> > biggest inline we would ever consider, and its impact on code size
> > might be arguable. But it certainly pays off in performance.
> Build time for gcl on my machine seems to have gone up by roughly
> 50%. The size of the .o file for the random tester appears
> to have doubled. Is the inlining being a bit too aggressive?
This is a good discussion to have.
I believe the build time is not so affected by the degree of inlining
as it is by our strategy of autodeclaring let bindings, aka type
inferencing. I mean to get back to you soon on your earlier most
helpful reply to my request for a good algorithm. There is likely a
better way to do this in the medium term.
Initially, we recursively macroexpanded each let, looked for setq
forms, and then declared bindings determined to be constant to the
type of the initializing form. We had to be very careful to step over
macrolets, flets, 'the forms, .... essentially reproducing all the logic
that was already embedded in pass1 of the compiler. So a little while
ago, I replaced this strategy with a separate trial pass of pass1 and
simply reading the changed slot of the temporary variable pushed to
the stack. This has the advantage of allowing us to capture setq
forms where the set type is consistent with the initial type --
floating point, list pushing, adjust-array, etc. I have yet to commit
this latest feature, but I am hoping it will result in the elimination
of many inlined-sequence list/vector branches.
So now the let processing has a bit more overhead -- indeed it is
theoretically exponential on nested lets. To be practical, I had to
limit the nesting depth at which this inferencing takes place with the
special variable compiler::*dlbsrl* (to be renamed). It defaults to
2. Negative values disable all type inferencing, 0 autodeclares
constant bindings only, and greater than one allows picking up binding
changes that are made so many levels inside the one under
consideration. It was my intention to alter this with the compiler
speed/debug/et.al. proclamation/declaration settings.
The overhead of subtypep is now also substantially greater, but this
has been somewhat ameliorated with the type-and and type-or hashing
reset on each compile-file. The new implementation does appear to be
better (i.e. correct) and should allow us to clean things up
Finally, I'd like to note that the large bulk of the build time (for
me at least) is compiling the gcl lisp core using the interpreted
image. There are quite a few ways we could accelerate this -- e.g. do
like pcl and compile the critical code first, load, then proceed to
compiling the rest faster. Mention has been made to me in the past of
the importance of byte-code translation/interpretation/evaluation for
running interpreted code -- I had always taken the position that all
the user really cares about is how fast his/her own compiled stuff
runs and that therefore the interpreter speed was not something to
worry about, but this may indicate otherwise.
As to the other point, generated code size, I'm cc'ing this to a few
other users to solicit their comment on the right balance.
Heretofore, I have never received complaints on code size, but rather
the opposite -- GCL, including its own heap is such a trivially small
piece of what the end user is trying to do, and memory is so
ridiculously abundant, that the concern almost always centers on how
to get GCL to address more memory, not to make itself smaller, and how
GCL can deliver performance that competes with that delivered by other
languages to forestall pressures to convert projects to more widely
CVS head is definitely in a transitional state. We may not set
inlining on by default everywhere that we can in the final release --
it is still nice IMHO to have the ability so as to be able to
experiment and see where the important optimizations are. GCL has
traditionally always inlined member and mapcar for example. This has
been critical as far as I can see for a number of real world users,
not so much for the initial function call, but for the killer issue of
passing functional arguments, the vast majority of which can be
inlined to a very few instructions. Now one can argue, and accurately
IMHO, that GCL could significantly improve the performance of passed
functional arguments. The overhead of consing closure environments,
no fast-linking, several nested levels of funcall, several different
types of functions (vfun, sfun, cfun, etc.) that have to be separately
handled, etc. might all be addressed at some point. Perhaps the
priority lies more in this direction.
To me, however, the real issue is typical lisp usage -- the language
cries out for simple terse syntax, elegance, and a more or less
functional style, but then one discovers that performance is miserable
and writes horrible nested (the fixnum ...) statements, memq or :test
'eq instead of a simple member, declarations, setq's, gotos, explicit
loops, etc. Typical users often bypass most if not all of the
sequence functions, which appear to be provided as useful, readable
and elegant loop abstractions, for performance reasons. It would be
nice to provide an option, perhaps triggered by some compiler setting,
allowing one to use these functions with full performance.
Just my thoughts, Feeback most welcome.
Camm Maguire address@hidden
"The earth is but one country, and mankind its citizens." -- Baha'u'llah