Re: a plan for native compilation
Sun, 18 Apr 2010 13:41:27 +0200
Gnus/5.13 (Gnus v5.13) Emacs/23.0.92 (gnu/linux)
You bring up lots of interesting points. Here are my initial reactions
to some of them.
On Sun 18 Apr 2010 04:19, Ken Raeburn <address@hidden> writes:
> On Apr 16, 2010, at 07:09, Andy Wingo wrote:
>> Currently, Guile has a compiler to a custom virtual machine, and the
>> associated toolchain: assemblers and disassemblers, stack walkers, the
>> debugger, etc. One can get the source location of a particular
>> instruction pointer, for example.
> These are great... but if they're run-time features of Guile, they're
> useless when examining a core file.
I don't think we should be thinking in terms of core files; core files
are corpses, whereas a Guile process is alive ;-)
Specifically, we should make it so that there is nothing you would want
to go to a core file for. Compiling Scheme code to native code should
never produce code that segfaults at runtime. All errors would still be
handled by the catch/throw mechanism.
So in a sense, native-compiled procedures are black boxes. All you need
to be able to do is to walk the stack -- which should have the same
representation for native-compiled and bytecode procedures -- and get
some representation of where you are in that procedure. I am hoping to
be able to treat the instruction pointer simply as an index into the
live variable set and source location array, so that we don't require
disassembly of native code for proper backtraces.
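To make the IP-as-index idea concrete, here's a minimal sketch in C (all names and layouts invented for illustration, not actual Guile internals):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: the compiler emits, per procedure, a table of
   (start_ip, line) entries sorted by start_ip.  A backtrace then
   binary-searches the table with the saved instruction pointer;
   no disassembly of native code is required. */
struct source_entry { size_t start_ip; int line; };

/* Return the line of the last entry with start_ip <= ip, or -1. */
int lookup_line(const struct source_entry *tab, size_t n, size_t ip)
{
  int line = -1;
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (tab[mid].start_ip <= ip) {
      line = tab[mid].line;
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return line;
}
```

The same table shape would work for the live-variable set: the IP is just an index, whether the procedure body is bytecode or native code.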
Of course, native disassembly is interesting sometimes, so we should
support it when we can, but it shouldn't be necessary to disassemble
instructions for all of the architectures supported by libjit or
whatever JIT library we end up using, just to get a backtrace.
That said, it's true that if, on the off chance, you were to get a core
file from Guile, you would like to be able to understand it ;) So I see
your point here. I think ideally it would be a GDB plugin that linked to
libguile somehow, at least to get the typecodes and such. Lots of
interesting stuff to do here!
> Would you generate code based on the debug or non-debug versions of
> the instructions? What would the choice depend on? Can both bytecode
> evaluators be used in one process and with the same bytecode object?
Options to the compiler would be one way. Then there are the typical
(declare ...) blocks, both at the top-level and within procedures.
As an aside, currently it's tough (almost impossible) to switch from the
debug-instrumented VM to the non-debugging one. You basically have to
select it at compile time; in the future we can allow selecting it when
you start Guile, but switching at runtime will have to wait for a
delimited call/cc, methinks.
> What about when profiling for performance?
Profiling with statprof should still work, because the stack
representation will be the same.
> Does the native code figure out if it's jumping to byte code or
> machine code, or does it use some transfer stub?
It has to do some checking anyway, to see if the procedure being called
is actually a procedure, so it will just fold in the check that the
procedure has native code. If the procedure has no native code or is not
a procedure, the native code jumps back into the body of e.g.
vm_debug_engine, after restoring registers. (I think?)
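In C-ish pseudocode, the dispatch might look something like this (the tag and the fields are invented; this is just the shape of the check, not Guile's actual procedure representation):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the call-site dispatch described above.  A call has to
   check that the operand is a procedure anyway, so the "has native
   code?" test folds into the same branch. */
enum { TAG_OTHER, TAG_PROCEDURE };

typedef int (*native_fn)(int);

struct proc {
  int tag;           /* type tag, checked on every call          */
  native_fn native;  /* NULL when only bytecode is available     */
};

/* Stand-in for jumping back into the VM, e.g. vm_debug_engine. */
static int bytecode_engine(struct proc *p, int arg)
{
  (void) p;
  return arg + 1;
}

static int call_proc(struct proc *p, int arg)
{
  if (p->tag != TAG_PROCEDURE)
    return -1;                    /* would raise wrong-type-arg   */
  if (p->native != NULL)
    return p->native(arg);        /* fast path: native entry      */
  return bytecode_engine(p, arg); /* slow path: back into the VM  */
}

static int twice(int x) { return 2 * x; }  /* demo "native code" */
```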
> Several possible options for AOT compilation (e.g., generating C or
> assembly and using native tools)
I would really like to avoid generating C. It seems to me to be a
needless abstraction layer, given that we will use our own calling
convention and stack. It's actually too high an abstraction layer for
what we need.
Hmm, another option would be to write a GCC plugin and feed it our
intermediate representation directly.
> * Debug info in native representations, handled by GDB and other
> debuggers. Okay, this is hard if we don't go via C code as an
> intermediate language, and probably even if we do. But we can probably
> at least map PC address ranges to function names and line numbers,
> stuff like that. Maybe we could do the more advanced stuff one format
> at a time, starting with DWARF.
We should be able to do this already: given that we map bytecode address
ranges to line numbers, and that the function is still on the stack, you
can query it for whatever you like. Adding a similar map when generating
native code should be easy.
> * Code and read-only data sections shared across processes; read-write
> data mapped in copy-on-write.
Read-only sharing does happen already with .go files. Copy-on-write does
not happen yet.
I would actually like to switch our compiled-code on-disk format to be a
subset of ELF, so we can have e.g. a bytecode section, a native code
section, sections for RO and RW data, etc. But that would take a fair
amount of thinking.
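As a sketch of what the container could look like (section names invented; real ELF has much more machinery, this is just the idea):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: a .go file as a table of named sections.  A loader could
   mmap read-only sections shared between processes, and map the
   writable ones copy-on-write. */
struct go_section {
  const char *name;  /* e.g. ".bytecode", ".native", ".rodata", ".data" */
  size_t offset;     /* file offset of the section contents             */
  size_t size;       /* length in bytes                                 */
  int writable;      /* 0: share read-only; 1: map copy-on-write        */
};

const struct go_section *
find_section(const struct go_section *secs, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(secs[i].name, name) == 0)
      return &secs[i];
  return NULL;
}
```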
> * With some special compile-time hooks, perhaps FFI symbol references
> could turn into (weak?) direct symbol references, processed with
> native relocation handling, etc.
This might improve startup times (marginally?), but it wouldn't affect
runtimes, would it?
> * Linking multiple object files together into a single "library"
> object that can be loaded at once; possibly with cross-file
> optimization?
Dunno; for me, cross-file optimization should happen at macroexpansion
time via define-integrable, or via similar approaches. But linking
together a number of modules into one file could be advantageous in e.g.
the emacs case, to avoid unexec.
> * Even for JIT compilation, but especially for AOT compilation,
> optimizations should only be enabled with careful consideration of
> concurrent execution. E.g., if "(while (not done) ....)" is supposed
> to work with a second thread altering "done", you may not be able to
> combine multiple cases of reading the value of any variable even when
> you can prove that the current thread doesn't alter the value in
> between.
Fortunately, Scheme programming style discourages global variables ;)
Reminds me of "spooky action at a distance". And when they are read, it
is always through an indirection, so we should be good.
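The indirection point, sketched in C (the box struct is a stand-in for a top-level variable cell; volatile plays the role of "the compiler may not fold these reads"):

```c
#include <assert.h>

/* Sketch: a top-level variable is a heap-allocated cell ("box"), so
   each read in (while (not done) ...) is a load through the box.
   The volatile qualifier stands in for the compiler not being
   allowed to hoist or combine the reads. */
struct box { volatile int value; };

/* Spin until another thread sets the box; returns the trip count. */
int spin_until_done(struct box *done)
{
  int trips = 0;
  while (!done->value)  /* re-reads the cell on every iteration */
    trips++;
  return trips;
}
```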
> ** Be especially careful if you want to be able to have Guile create a
> limited sandbox in which to run untrusted code. Assume that the
> provider of the code will attempt to avoid mutexes and use race
> conditions and FFI pointer handling and opportunities for data
> corruption and such, in order to break out of the sandbox.
Of course. Sandboxed code should not have access to mutexes or the FFI
or many other things. Though it is an interesting point that the
resources you provide to sandboxed code should be threadsafe, if the
sandbox itself has threads.
> * Link compiled C and Scheme parts of a package together into a single
> shared library object, instead of the code in one language needing to
> know where the object for the other language is (awkward if you're
> trying to relocate the whole bundle via LD_LIBRARY_PATH) and
> explicitly load it. (Perhaps a library initialization function could
> call a Guile library function to say, "if you need module (foo bar
> baz), it's mapped in at this address and is this big, and this much is
> read-only", or "here's a pointer to the struct Foo describing it,
> including pointers to various components". Or we could generate C
> symbols reflecting module names and make the library explicitly make
> them known to the Guile library.) If nothing else, the current .go
> file could be turned into a large character array....
This is all very hard stuff!
> * Can anything remotely reasonable happen when C++ code calls Scheme
> code which calls C++ code ... with stack-unwinding cleanup code
> specified in both languages, and an exception is raised? Can the
> cleanup code be invoked in both languages? (This applies to the
> bytecode interpreter as well, but the mechanism for compiled code
> would have to be different, as I believe C++/Ada/etc EH support
> typically maps PC address to handler info; I don't know how Java is
> handled under JIT compilation.)
I have no earthly idea :)
> Looking forward to Emacs work:
> Tom Tromey recently pointed out some JIT compilation work done on
> Emacs byte code back in 2004, with the conclusion that while some
> improvement is possible, the time spent in existing primitives
> dominates the execution time. Playing devil's advocate for a minute:
> Why do you think we can do better? Or was this modest improvement --
> maybe a bit more for AOT compilation -- all you were expecting when
> you said we could run elisp faster than Emacs?
Better for emacs? Well, I don't think we should over-sell speed, if
that's what you're getting at. Bytecode-wise, the performance will
probably be the same. I suspect the same code in Scheme will run faster
than Elisp, due to lexical scoping, and a richer set of bytecode
primitives. But I think the goal for phase 1 should be "no one will
notice the difference".
Native-code compilation will make both Scheme and Elisp significantly
faster -- I think 4x would be a typical improvement, though one would
find 2x and 20x as well.
More broadly, though, I don't really believe in the long-term health of
a system that relies on primitives for speed, because such a system
necessarily restricts the expressive power of the extension language.
There are many things you just can't do in Emacs these days -- and
sometimes it's things as basic as "display all of the messages in my
archived folder". Making the extension language more capable allows for
more programs to be written inside Emacs. Eventually we will even
migrate many of the primitives out of C, and back into Elisp or Scheme.
> But I'm still concerned about [loading init code] at startup time
> rather than using the "unexec" mechanism Emacs currently uses to
> pre-initialize all the C and Lisp stuff and dump out an image that can
> be launched more quickly.
Yeah, understood. We need to figure out what the status is with this.
> On my reasonably fast Mac desktop, Emacs takes about 3s to launch and
> load my .emacs file.
How long does emacs -Q take?
> During the build, pre-loading the Lisp code takes it about another 3s,
> that would get added to the startup time without unexec. If loading
> native compiled files (or .go files on platforms where we don't have
> native compilation yet) isn't so amazingly fast as to cut that down to
> 2-3s, do you have any ideas how we might be able to load and save an
> initialized Lisp environment?
I think we'll just have to see, unfortunately. Currently our mess of .go
files everywhere means that you get significantly different numbers
depending on whether you're in the disk cache or not... Perhaps we can
make a quick estimate just based on KSLOC? How many KSLOC get loaded in
a base emacs -Q?
> One thing that might speed up the loading of .go files is making them
> more compact; there seems to be a lot of string duplication in the
> current format. (Try running "strings module/ice-9/boot-9.go | sort |
> uniq -c | sort -n" to get a list of strings and the numbers of times
> they appear, sorted by count.)
Interesting. You're probably right here. I think our bytecode files need
separate RO and RW data sections.
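The usual fix for the duplication is to intern strings into one shared read-only table and store offsets instead of repeated copies. A toy version of the idea in C (nothing here reflects the actual .go writer):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy string-interning table: each distinct string is stored once,
   NUL-terminated, and every use refers to its offset.  An object
   file writer would do this when emitting the RO data section. */
#define TABLE_MAX 4096

static char table[TABLE_MAX];
static size_t table_len = 0;

/* Return the offset of s in the table, appending it only if new. */
size_t intern(const char *s)
{
  size_t n = strlen(s) + 1;
  for (size_t off = 0; off < table_len; off += strlen(table + off) + 1)
    if (strcmp(table + off, s) == 0)
      return off;
  memcpy(table + table_len, s, n);
  table_len += n;
  return table_len - n;
}
```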
> I'm also pondering loading different Lisp files in two or three
> threads in parallel, when dependencies allow, but any manipulation of
> global variables has to be handled carefully, as do any load-time
> errors. (One thread blocks reading, while another executes
> already-loaded code... maybe more, to keep multiple cores busy at
> once.)
This is a little crazy ;-)
> ... Sorry, that's a lot of tangents to be going off onto. :-)
Heh, no prob. If we figure out how things should work, in public on the
list, it will be easier for other people to help us make it there.