Re: a plan for native compilation
Wed, 21 Apr 2010 13:02:37 -0400
On Apr 18, 2010, at 07:41, Andy Wingo wrote:
> Specifically, we should make it so that there is nothing you would want
> to go to a core file for. Compiling Scheme code to native code should
> never produce code that segfaults at runtime. All errors would still be
> handled by the catch/throw mechanism.
Including a segfault in compiled Scheme code, caused by an application-supplied
C procedure returning something that looks like one of the pointer-using SCM
objects but is in reality just garbage? There *will* be core files.
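Here is a minimal C sketch of the problem. It assumes a hypothetical tagging scheme, not libguile's actual SCM layout, in which any nonzero word with its low three bits clear counts as a heap pointer. The names `scm_word`, `looks_like_heap_object` and `buggy_c_procedure` are made up for illustration. The cheap tag test that compiled code can afford accepts a garbage value as long as it is suitably aligned, and the only way to find out it's bogus is to dereference it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagging scheme (not libguile's real one): a nonzero word
   whose low three bits are zero is treated as a pointer to a heap object. */
typedef uintptr_t scm_word;

#define TAG_MASK ((scm_word) 0x7)

/* The only check compiled code can cheaply make: inspect the tag bits. */
static bool looks_like_heap_object(scm_word v)
{
    return v != 0 && (v & TAG_MASK) == 0;
}

/* A buggy application-supplied C procedure returning garbage.  The value is
   8-byte aligned, so it passes the tag check even though nothing valid lives
   there; dereferencing it would most likely segfault. */
static scm_word buggy_c_procedure(void)
{
    return (scm_word) 0x1000;
}
```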
>> * Debug info in native representations, handled by GDB and other
>> debuggers. Okay, this is hard if we don't go via C code as an
>> intermediate language, and probably even if we do. But we can probably
>> at least map PC address ranges to function names and line numbers,
>> stuff like that. Maybe we could do the more advanced stuff one format
>> at a time, starting with DWARF.
> We should be able to do this already; given that we map bytecode address
> ranges to line numbers, and the function is still on the stack, you
> can query it for whatever you like. Adding a map when generating native
> code should be easy.
I think for best results with GDB and other debuggers, it should be converted
into whatever the native format is, DWARF or otherwise.
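A minimal sketch of the simplest form of such a map follows. It assumes a hypothetical flat table of [start, end) PC ranges sorted by address, and uses a binary search to turn a PC (say, from a backtrace) into a function name and line number. Real DWARF `.debug_line` data is a compact state-machine encoding rather than a flat array, so this shows only the lookup, not the format GDB would actually read.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per contiguous run of native code emitted for a source line.
   Entries are assumed sorted by start address and non-overlapping. */
struct pc_line_entry {
    uintptr_t start;        /* first PC of the range (inclusive) */
    uintptr_t end;          /* one past the last PC (exclusive) */
    const char *function;   /* name of the enclosing procedure */
    int line;               /* source line number */
};

/* Binary search for the entry whose [start, end) range contains pc.
   Returns NULL if pc falls outside every recorded range. */
static const struct pc_line_entry *
lookup_pc(const struct pc_line_entry *table, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].start)
            hi = mid;
        else if (pc >= table[mid].end)
            lo = mid + 1;
        else
            return &table[mid];
    }
    return NULL;
}
```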
> I would actually like to switch our compiled-code on-disk format to be a
> subset of ELF, so we can have e.g. a bytecode section, a native code
> section, sections for RO and RW data, etc. But that would take a fair
> amount of thinking.
And if it's actually compatible with ELF, it would make special handling of
compiled Scheme + compiled C possible on ELF platforms but not others, leading
to two different ways of potentially building stuff (or, people supporting only
ELF platforms in their packages, whether intentionally or not; or, people not
bothering to use the non-portable special handling). Which is why I was
suggesting native formats rather than ELF specifically -- more work up front,
but more uniform treatment of platforms in the build process.
>> * With some special compile-time hooks, perhaps FFI symbol references
>> could turn into (weak?) direct symbol references, processed with
>> native relocation handling, etc.
> This might improve startup times (marginally?), but it wouldn't affect
> runtimes, would it?
Depending on how it's done, it might improve the first reference to a symbol very
slightly. You could (again, depending on how it's done) perhaps trigger link-time
errors if a developer forgets to supply libraries defining symbols the Scheme
code knows will be required, instead of a delayed run-time error.
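With GCC or Clang on ELF platforms, a weak reference looks roughly like the sketch below. `hypothetical_optional_fn` stands in for some C function the Scheme code would call through the FFI, and the helper names are also made up. If no linked library defines the symbol, the reference resolves to a null address that the code can test. With an ordinary (strong) declaration, the same missing library becomes a link-time "undefined reference" error rather than a delayed run-time one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library function wanted by the Scheme code.  Declared weak:
   if no linked object defines it, the linker resolves the reference to a
   null address instead of failing (GCC/Clang on ELF targets). */
extern int hypothetical_optional_fn(int) __attribute__((weak));

/* Ask at run time whether the weak reference was resolved. */
static bool optional_fn_available(void)
{
    return hypothetical_optional_fn != NULL;
}

/* Call through the weak reference only if it exists; otherwise report
   failure, much as a delayed FFI lookup error would. */
static int call_optional_fn(int arg, int *result)
{
    if (!optional_fn_available())
        return -1;
    *result = hypothetical_optional_fn(arg);
    return 0;
}
```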
>> * Even for JIT compilation, but especially for AOT compilation,
>> optimizations should only be enabled with careful consideration of
>> concurrent execution. E.g., if "(while (not done) ....)" is supposed
>> to work with a second thread altering "done", you may not be able to
>> combine multiple cases of reading the value of any variable even when
>> you can prove that the current thread doesn't alter the value in
> Fortunately, Scheme programming style discourages global variables ;)
> Reminds me of "spooky action at a distance". And when they are read, it
> is always through an indirection, so we should be good.
Who said global? It could be two procedures accessing a value in a shared
outer scope, with one of them launched in a second thread, perhaps indirectly
via a third procedure which the compiler couldn't examine at the time to know
that it would create a thread.
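In C11 terms, the issue looks like the sketch below. The names are hypothetical, it uses POSIX threads, and older glibc may need `-pthread` at link time. Declaring the flag `atomic_bool` forces every iteration of the loop to re-read it. With a plain `bool` and no synchronization, the program has a data race and the compiler may read the flag once and spin forever. A Scheme compiler would have to treat captured mutable variables that might be shared across threads the same way, whether or not it can see where the thread gets created.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared state, analogous to a variable in an outer scope captured by two
   procedures, one of which ends up running in another thread. */
struct shared_state {
    atomic_bool done;
    long iterations;    /* touched only by the waiting thread */
};

/* The "second thread altering done". */
static void *setter_thread(void *arg)
{
    struct shared_state *s = arg;
    atomic_store(&s->done, true);
    return NULL;
}

/* Equivalent of (while (not done) ...): each test re-reads `done`, so the
   loop terminates once the other thread stores true. */
static long wait_until_done(void)
{
    struct shared_state s;
    atomic_init(&s.done, false);
    s.iterations = 0;

    pthread_t t;
    if (pthread_create(&t, NULL, setter_thread, &s) != 0)
        return -1;
    while (!atomic_load(&s.done))
        s.iterations++;
    pthread_join(t, NULL);
    return s.iterations;
}
```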
I'm not sure indirection helps -- unless you mean it disables that sort of
> Of course. Sandboxed code of course should not have access to mutexes or
> the FFI or many other things. Though it is an interesting point, that
> resources that you provide to sandboxed code should be threadsafe, if
> the sandbox itself has threads.
Actually, I'm not sure that mutexes should be forbidden, especially if you let
the sandbox create threads. But they should be well-protected, bullet-proof
mutexes; none of this "undefined behavior" stuff. :-)
>> * Link compiled C and Scheme parts of a package together into a single
>> shared library object, [....]
> This is all very hard stuff!
Maybe somewhat. The "big char array" transformation wouldn't be that hard, I
think, though we'd clearly be going outside the bounds of what a C99 compiler
is *required* to support in terms of array size. Slap a C struct wrapper on it
(or C++, which would give you an encoding system for multiple names in a
hierarchy, though with different character set limitations), and you've
basically got an object file ready to be created. Then you just have to teach
libguile how not to read files for some modules.
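Here is a rough sketch of what such a generated file might contain. It assumes a hypothetical build step that emits each compiled module's bytes as a `const unsigned char` array, plus a table the loader consults before looking on disk. All names and the placeholder bytes are invented. (C99's translation limits only guarantee 65535 bytes in a single object in a hosted environment, which is the limit real module images would exceed.)

```c
#include <stddef.h>
#include <string.h>

/* What a hypothetical build step might emit for one compiled module: its
   compiled bytes as a C array.  The four bytes here are placeholders; a real
   image would be the full contents of the module's compiled file. */
static const unsigned char module_example_image[] = {
    0x00, 0x01, 0x02, 0x03
};

/* Descriptor tying a module name to its embedded image. */
struct embedded_module {
    const char *name;            /* module name, e.g. "(example)" */
    const unsigned char *data;   /* compiled code image */
    size_t size;                 /* length of data in bytes */
};

static const struct embedded_module embedded_modules[] = {
    { "(example)", module_example_image, sizeof module_example_image },
};

/* Consulted by the (hypothetical) loader before searching the file system;
   returns NULL when the module has to be read from disk as usual. */
static const struct embedded_module *find_embedded_module(const char *name)
{
    size_t count = sizeof embedded_modules / sizeof embedded_modules[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(embedded_modules[i].name, name) == 0)
            return &embedded_modules[i];
    return NULL;
}
```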
>> * Can anything remotely reasonable happen when C++ code calls Scheme
>> code which calls C++ code ... with stack-unwinding cleanup code
>> specified in both languages, and an exception is raised? [....]
> I have no earthly idea :)
It only just occurred to me. It may be worth looking at the C++ plus Java
case, and seeing whether something reasonable happens there, especially with
the GNU tools.
My hunch is that we might be able to do it, but would need to compile at least
a little C++ code into the library to do it portably. That wouldn't be hard,
as I doubt there are many platforms where you get a C but not C++ compiler
these days, but I don't know if the C++ ABI work has progressed far enough
along that it wouldn't tie us to a specific C++ implementation (on platforms
with more than one), or how much of an issue that would be. Worst case, we
might have to run all the libguile code through the C++ compiler, to get
stack-unwinding data recorded for EH processing; while there's a fairly large
common subset of C and C++, it would still be annoying.
It might have to be crude, too -- for example, maybe on the C++ side we'd
define a "Scheme exception" type that normally would not be caught specially by
application code (though it could be), and perhaps Scheme wouldn't be able to
catch C++ exceptions at all, just do the unwinding.
Just an idea to keep in mind....
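On the "just do the unwinding" idea: whatever the mechanism, cleanups registered by each frame have to run innermost-first as a non-local exit passes through. Below is a plain-C analogy using `setjmp`/`longjmp` and an explicit cleanup stack. Every name is hypothetical, and it isn't meant to describe libguile's own implementation. It also shows why C++ frames can't simply be jumped over. In C++, a `longjmp` that skips objects with non-trivial destructors is undefined behavior, so real interoperation would need the C++ unwinder's cooperation.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define MAX_CLEANUPS 16

typedef void (*cleanup_fn)(void *);

struct cleanup {
    cleanup_fn fn;
    void *data;
};

static struct cleanup cleanup_stack[MAX_CLEANUPS];
static size_t cleanup_depth;

/* Register a handler to run if control leaves the current frame abnormally.
   (A frame returning normally would pop its own entry; omitted here.) */
static void push_cleanup(cleanup_fn fn, void *data)
{
    assert(cleanup_depth < MAX_CLEANUPS);
    cleanup_stack[cleanup_depth].fn = fn;
    cleanup_stack[cleanup_depth].data = data;
    cleanup_depth++;
}

/* Run every handler registered above `mark`, newest first, then transfer
   control to the frame that called setjmp on `target`. */
static void unwind_to(jmp_buf target, size_t mark)
{
    while (cleanup_depth > mark) {
        cleanup_depth--;
        cleanup_stack[cleanup_depth].fn(cleanup_stack[cleanup_depth].data);
    }
    longjmp(target, 1);
}

/* Demo: two nested frames each register a cleanup; the inner one then
   performs a non-local exit standing in for a raised exception. */
static char trace[8];
static size_t trace_len;
static jmp_buf handler;

static void record(void *data)
{
    trace[trace_len++] = *(const char *) data;
}

static void inner_frame(void)
{
    static char b = 'b';
    push_cleanup(record, &b);
    unwind_to(handler, 0);
}

static void outer_frame(void)
{
    static char a = 'a';
    push_cleanup(record, &a);
    inner_frame();
}

/* Returns the order in which the cleanups ran ("ba": innermost first). */
static const char *run_demo(void)
{
    trace_len = 0;
    if (setjmp(handler) == 0)
        outer_frame();
    trace[trace_len] = '\0';
    return trace;
}
```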
>> Looking forward to Emacs work:
>> Tom Tromey recently pointed out some JIT compilation work done on
>> Emacs byte code back in 2004, with the conclusion that while some
>> improvement is possible, the time spent in existing primitives
>> dominates the execution time. Playing devil's advocate for a minute:
>> Why do you think we can do better? Or was this modest improvement --
>> maybe a bit more for AOT compilation -- all you were expecting when
>> you said we could run elisp faster than Emacs?
> Better for emacs? Well I don't think we should over-sell speed, if
> that's what you're getting at.
Hey, you're the one who said, "Guile can implement Emacs Lisp better than Emacs
can." :-) And specifically said that Emacs using Guile would be faster.
> Bytecode-wise, the performance will
> probably be the same. I suspect the same code in Scheme will run faster
> than Elisp, due to lexical scoping, and a richer set of bytecode
> primitives. But I think the goal for phase 1 should be "no one will
> notice" ;-)
The initial work, at least, wouldn't involve a rewrite of Lisp into Scheme. So
we still need to support dynamic scoping of, well, just about anything.
> Native-code compilation will make both Scheme and Elisp significantly
> faster -- I think 4x would be a typical improvement, though one would
> find 2x and 20x as well.
For raw Scheme data processing, perhaps. Like I said, I'm concerned about how
much of the performance of Emacs is tied to that of the Emacs C code
(redisplay, buffer manipulation, etc.) and that part probably wouldn't improve
much if at all. So a 4x speedup of actual Emacs Lisp code becomes ... well, a
much smaller speedup of Emacs overall.
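Amdahl's law makes this concrete. Suppose a fraction p of Emacs's run time is spent executing Lisp, and only that part gets s times faster. The overall speedup is then 1 / ((1 - p) + p/s). With made-up numbers, if half the time is in Lisp, a 4x faster Lisp gives 1 / (0.5 + 0.125) = 1.6x overall. If only 20% is, it gives 1 / (0.8 + 0.05), about 1.18x. As a C helper:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the run time is sped
   up by a factor s and the remaining (1 - p) is unchanged. */
static double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```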
> More broadly, though, I don't really believe in the long-term health of
> a system that relies on primitives for speed, because such a system
> necessarily restricts the expressive power of the extension language.
> There are many things you just can't do in Emacs these days -- and
> sometimes it's things as basic as "display all of the messages in my
> archived folder". Making the extension language more capable allows for
> more programs to be written inside Emacs. Eventually we will even
> migrate many of the primitives out of C, and back into Elisp or Scheme.
I'd like to see that, and I think many Emacs developers would as well.
>> On my reasonably fast Mac desktop, Emacs takes about 3s to launch and
>> load my .emacs file.
> How long does emacs -Q take?
Maybe about 1s less?
>> During the build, pre-loading the Lisp code takes it about another 3s,
>> that would get added to the startup time without unexec. If loading
>> native compiled files (or .go files on platforms where we don't have
>> native compilation yet) isn't so amazingly fast as to cut that down to
>> 2-3s, do you have any ideas how we might be able to load and save an
>> initialized Lisp environment?
> I think we'll just have to see, unfortunately. Currently our mess of .go
> files everywhere means that you get significantly different numbers
> depending on whether you're in the disk cache or not... Perhaps we can
> make a quick estimate just based on KSLOC? How many KSLOC get loaded in
> a base emacs -Q ?
Not sure, I'll take a look.
>> I'm also pondering loading different Lisp files in two or three
>> threads in parallel, when dependencies allow, but any manipulation of
>> global variables has to be handled carefully, as do any load-time
>> errors. (One thread blocks reading, while another executes
>> already-loaded code... maybe more, to keep multiple cores busy at
> This is a little crazy ;-)
Only a little?
Modern machines are going more and more in the multicore direction. Even
without that, often a thread blocks waiting to read stuff off disk, while
another could continue doing work. Why should my Emacs startup stall waiting
on the disk any more than it absolutely needs to? (Also, POSIX has async file
I/O routines now, so "prefetching" the file contents is also an option;
conceptually in the same thread, though it could be implemented with extra
threads under the covers.)
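For the prefetching variant, a minimal sketch using the POSIX asynchronous I/O calls (`aio_read`, `aio_error`, `aio_suspend`, `aio_return`) might look like this. The helper names are hypothetical, and older glibc may need `-lrt` at link time. The loader would call `start_prefetch` for the next file it expects to need, keep evaluating already-loaded code, and only call `finish_prefetch` when it actually needs the contents.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Queue a read of up to `len` bytes from offset 0 of `fd` and return
   immediately.  `cb` and `buf` must stay valid until the read completes. */
static int start_prefetch(struct aiocb *cb, int fd, void *buf, size_t len)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_offset = 0;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;  /* poll; no signal */
    if (aio_read(cb) != 0) {
        perror("aio_read");
        return -1;
    }
    return 0;
}

/* Block until the queued read finishes; returns bytes read, or -1. */
static ssize_t finish_prefetch(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    while (aio_error(cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);   /* retried if interrupted */
    return aio_return(cb);
}
```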