
vm status update, a.k.a. yowsers batman it's february already

From: Andy Wingo
Subject: vm status update, a.k.a. yowsers batman it's february already
Date: Mon, 02 Feb 2009 21:28:46 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

Greets greets,

An update from the wilds of the vm branch is overdue. So here we go:

 * Opcodes are now numbered statically in the source. This should make
   it easier to maintain bytecode compatibility in the future.

 * The VM now has two new languages:

   - Bytecode, which is just object code as u8vectors; and

   - Assembly, which sits between bytecode and GLIL.

   The differences may be seen thusly (hand-pretty-printed):

     scheme@(guile-user)> (compile '(car '(a . b)) #:to 'glil)
     #<glil (program 0 0 0 0 ()
                 (const (a . b))
                 (call car 1)
                 (call return 0))>
     scheme@(guile-user)> (compile '(car '(a . b)) #:to 'assembly)
     (load-program 0 0 0 0 () 13 #f
        (load-symbol "a")
        (load-symbol "b")
        (cons)
        (car)
        (return))
     scheme@(guile-user)> (compile '(car '(a . b)) #:to 'bytecode)
     #u8(0 0 0 0 13 0 0 0 0 0 0 0 63 0 0 1 97 63 0 0 1 98 90 91 48)
     scheme@(guile-user)> (compile '(car '(a . b)) #:to 'objcode)
     #<objcode b728e450>

 * As you can see, the bytecode header is quite long -- it's 12 bytes
   before we get to the meat of the program. (That's 4 for arity, 4 for
   length, and 4 for meta-length -- more on that in a minute). But this
   is OK, because normally this is read-only code mmapped directly from
   disk.

 * Originally, when loading programs with metadata (such as source
   information), we had to load all of that metadata along with the
   program -- symbols, vectors, conses, etc. So we hid that loading
   behind a thunk, which meant we only had to cons up the thunk itself --
   but that still cost 8 words (4 for the program and 4 for the object
   code).

   So instead, we now stick the meta-thunk after the main program text,
   and load it only when objcode-meta (or program-meta) is called.
   Behold: source information at no load-time cost! I stole this trick
   from the Self compiler.

 * Just as we have a tower of compilers (and thus languages), we now have a
   tower of /decompilers/. Currently I've only implemented
   value->objcode (only valid for values of type program or objcode),
   objcode->bytecode, and bytecode->assembly, but it's possible to
   implement passes decompiling all the way back to Scheme.

   Or JavaScript! That's the crazy thing: since multiple languages exist
   on top of one substrate, decompilers allow us to do language
   translation -- what Guile originally wanted to do, but as an artifact
   rather than as a mechanism.

 * Because we put the 4-byte lengths in the objcode directly, and mmap
   that data, bytecode is now endian-specific. Specifically, it's all
   little-endian right now. I know, I know. Worse, it's not aligned. But
   provisions are there to make it aligned and native endian.


So, what's up?

Well, things are good. Load time is slightly faster, though there's
still room to make it significantly faster. We cons less than the
evaluator. All in all, things are looking good, and improvable.

I have to fix the endianness/alignment bits.

There are two main regressions. One is a simple bug: backtraces aren't
working right unless you have VM code. I think it's a problem with
stack cutting; I have to poke at it a bit.

Secondly, GOOPS loads *really slowly*, because of the dynamic
recompilation things that I thought were so clever. I don't know exactly
what to do yet -- profile and see, I guess. I think this is my first
priority right now.

As far as improvements go, there's a laundry list:

  * I'm going to try making bytecodes uint32s instead of uint8s. We'll
    see what the performance impact is.

  * I'm going to see about coalescing object tables into one vector per
    compilation unit (e.g. per file). This should result in faster
    startup.

  * GOOPS needs some love. I think polymorphic inline caches are the way
    to go, and might be a first way to test out a native-code generator
    (for the cache stubs).

  * It would be nice to have syncase in by default, though perhaps we
    should leave this for R6RS.

  * Decompilers to GLIL, GHIL, and Scheme would be *sweet*.

  * I think there's something publishable in all of this language tower
    business, but I'd need a convincing second high-level language. I
    think JavaScript is the right one. We need to write a compiler to
    GHIL, and probably extend the VM slightly.

    With luck, I'd like to present this at SFP this fall -- any takers
    to help? We could present together :)

Well, that's what's on my mind for now. I'll work on updating the docs
soon; it's funny to see them bitrot so quickly.


