Guile 3 update, August edition

From: Andy Wingo
Subject: Guile 3 update, August edition
Date: Mon, 20 Aug 2018 16:27:26 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)


Last dispatch was here:

To recap, I merged in GNU lightning and added an extra machine-code
return address to frames, but hadn't actually written the JIT yet.

Since July, I made it so that all Guile bytecode function entry points
start with an "instrument-entry" bytecode that holds a counter.  The
intention is that when the counter increments beyond a certain value,
the function should be automatically JIT-compiled.  Associated with the
counter is a native-code pointer corresponding to the function.  I also
added "instrument-loop" bytecodes to all loops, to be able to tier up
from within hot loops.
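
As a rough illustration of the counter scheme described above, here is a
minimal C sketch.  All names in it (fn_data, JIT_THRESHOLD, call_fn,
jit_compile) and the threshold value are hypothetical, chosen only for
the example; this is not Guile's actual data layout or code, and the
"JIT" is a stand-in that just returns a precompiled function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; the real value is not given in this message. */
#define JIT_THRESHOLD 1000u

typedef int (*native_fn) (int);

/* Per-function data of the kind an "instrument-entry" op could refer to:
   a call counter plus a native-code pointer (NULL until compiled). */
struct fn_data
{
  uint32_t counter;
  native_fn mcode;
};

/* Stand-ins for the bytecode interpreter and the JIT-compiled code. */
static int interpreted_body (int x) { return x + 1; }
static int native_body (int x) { return x + 1; }

/* Stand-in for the JIT compiler: pretend to compile, return native code. */
static native_fn
jit_compile (struct fn_data *d)
{
  (void) d;
  return native_body;
}

/* What function entry could do: use native code if it exists, otherwise
   bump the counter and tier up once it goes beyond the threshold. */
static int
call_fn (struct fn_data *d, int arg)
{
  assert (d != NULL);
  if (d->mcode != NULL)
    return d->mcode (arg);
  if (++d->counter > JIT_THRESHOLD)
    {
      d->mcode = jit_compile (d);
      return d->mcode (arg);
    }
  return interpreted_body (arg);
}
```

An "instrument-loop" check could bump the same kind of counter on each
iteration, so that a long-running loop tiers up without waiting for the
next call.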

With all of this done and some other bytecode tweaks, I was able to move
on to the JIT compiler itself.  I'm happy to say that I now have a first
version.  It's about 3500 lines of C, so a bit gnarly.  It's
architecture-independent, as it uses lightning, and there are lightning
backends for about every architecture.  Lightning seems OK.  Not
optimal, but OK, and an OK thing to use for now anyway.  I did have to
write special cases for 32-bit machines, as Guile's VM supports 64-bit
arithmetic, and some endian-specific code.  I probably got some of that
wrong; review is very welcome:

If you have fixes and are a committer, please feel free to just commit
them directly.  If you aren't a committer yet and you spot some fixes,
mail the list; you should definitely be a committer if you can do that.
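
To make the 32-bit special case mentioned above more concrete, here is a
minimal C sketch of a 64-bit addition done with 32-bit words and an
explicit carry, which is the general shape of what a 32-bit target needs.
The names (u64_pair, add64, split64, join64) are hypothetical and this is
not the code in the JIT itself.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as on a 32-bit target.
   Which half comes first in memory depends on the machine's endianness;
   here the halves are named rather than laid out by address. */
struct u64_pair
{
  uint32_t lo;
  uint32_t hi;
};

/* Split a 64-bit value into its halves. */
static struct u64_pair
split64 (uint64_t v)
{
  struct u64_pair p;
  p.lo = (uint32_t) (v & 0xffffffffu);
  p.hi = (uint32_t) (v >> 32);
  return p;
}

/* Join two halves back into a 64-bit value. */
static uint64_t
join64 (struct u64_pair p)
{
  return ((uint64_t) p.hi << 32) | p.lo;
}

/* Add the low words first; if the result wrapped around, it is smaller
   than either operand, and a carry of 1 goes into the high words. */
static struct u64_pair
add64 (struct u64_pair a, struct u64_pair b)
{
  struct u64_pair r;
  r.lo = a.lo + b.lo;
  r.hi = a.hi + b.hi + (r.lo < a.lo ? 1u : 0u);
  return r;
}
```

On hardware with an add-with-carry instruction the carry would come from
the flags register instead of a comparison, but the two-step structure is
the same.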

I just got the JIT working today.  For the time being, the interface is
a public function, %jit-compile.  Eventually I will remove this when I
have more confidence, relying only on the automatic compilation
triggered by function entry and loop iterations.

As an example:

  $ cat foo.scm
  (use-modules (rnrs bytevectors))
  (define (f32v-sum bv)
    (let lp ((n 0) (sum 0.0))
      (if (< n (bytevector-length bv))
          (lp (+ n 4)
              (+ sum (bytevector-ieee-single-native-ref bv n)))
          sum)))
  (define ones (make-f32vector #e1e7 1.0))

  # The JIT currently doesn't emit hook code, hence --no-debug.
  $ meta/guile --no-debug
  scheme@(guile-user)> (load "foo.scm")
  scheme@(guile-user)> ,time (f32v-sum ones)
  $2 = 1.0e7
  ;; 0.143017s real time, 0.142986s run time.  0.000000s spent in GC.
  scheme@(guile-user)> (%jit-compile f32v-sum)
  scheme@(guile-user)> ,time (f32v-sum ones)
  $3 = 1.0e7
  ;; 0.048514s real time, 0.048499s run time.  0.000000s spent in GC.

In this particular example, the JITted code runs about 3x faster than
the interpreted code.  The JIT doesn't do register allocation yet; I'm
not sure precisely how to do that, so it's a future topic.  For the
moment I want to consolidate what we have, and once it's all just
magically working and everybody's programs are faster, we release Guile 3.
