[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
thoughts on targetting the web
From: |
Andy Wingo |
Subject: |
thoughts on targetting the web |
Date: |
Sat, 19 Jun 2021 22:20:20 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) |
Hi :)
A brain-dump tonight. I was thinking about what it would mean for Guile
to target the web. The goal would be for Guile to be a useful language
for programming client-side interaction on web pages. Now, there are a
number of takes for Schemes of various stripes to target JS. I haven't
made a full survey but my starting point is that in a Guile context, we
basically want the Guile Scheme language, which is to say:
1. Tail calls. The solution must include proper tail calls.
2. Multiple values. The solution should have the same semantics that
Guile has for multiple-value returns, including implicit
truncation.
3. Varargs. The solution should allow for rest arguments, keyword
arguments, and the like.
4. Delimited continuations. The solution should allow non-local exit
from any function, and should allow for capturing stack slices to
partial continuations, and should allow those continuations to be
instantiated multiple times if desired. We should be able to target
fibers to the web. Otherwise, why bother?
5. Garbage collection. *We should re-use the host GC*. Although it
would be possible to manage a heap in linear memory, that has
retention problems due to cycles between the Guile heap and the JS
heap.
6. Numbers. We should support Scheme's unbounded integers. Requires
bignums in the limit case, but thankfully JS has bignums now. We
will still be able to unbox in many cases though.
7. Target existing web. We shouldn't depend on some future WebAssembly
or JS proposal -- the solution should work in the here and now and
only get better as features are added to the web platform.
>From a UX perspective, I would expect we would generally want
whole-program compilation with aggressive DCE / tree-shaking, producing
a single JS or WebAssembly artifact at build-time. But that's a later
issue.
I have thought about this off and on over the years but in the end was
flummoxed about how to meet all requirements. However recently I think
I have come up with a solution for most of these:
(1) In JS, tail calls are part of ES2015, but not implemented in all
browsers. In WebAssembly, they are a future plan, but hard for
various reasons. So in practice the solution for today's web is to
use a universal trampoline and make all calls tail calls --
i.e. call all functions via:
while true:
call(pop())
Whenever we can target proper tail calls, this will only get
faster.
(2) Neither JS nor WebAssembly have the truncation semantics we want.
Therefore we will return values via an explicit array, and then the
continuation will be responsible for consuming those values and
binding any needed variables.
(3) Although JS does have varargs support, WebAssembly does not. But
we can actually use the same solution here as we use for (1) and
(2) -- pass arguments on an explicit array + nvalues, and relying
on the function to parse them appropriately. In this way we can
get Guile keyword arguments. This also makes the type of Scheme
functions uniform, which is important in WebAssembly contexts.
(4) This is the interesting bit! As hinted in (1), we will transform
the program such that all calls are tail calls. This is a form of
minimal CPS conversion -- minimal in the sense that functions are
broken up into the minimum number of pieces to ensure the
all-tail-calls property. Non-tail calls transform to tail calls by
saving their continuation's state on a stack, which is the same as
stack allocation for these continuations. The continuation of the
call pops the saved state from the stack. Because the stack is
explicit, we can manipulate it as data: slice it to capture a
delimited continuation, drop values to make a non-local exit, splat
a saved slice back on to compose a captured delimited continuation
with the current continuation, and so on. Therefore we have the
necessary primitives to implement delimited continuations as a
library.
(5) Scheme needs a representation that can store any value. In JS this
is easy because JS is untyped. For WebAssembly, I think I would
lean on externref for this purpose, which effectively boxes all
values. There are no fixnums in the current WebAssembly spec, so
this is suboptimal, and we have to call out to JS to evaluate type
predicates and access fields. But, support for structured
GC-managed types is coming to WebAssembly, which will speed up
object access.
(6) The easy solution here is to make JS numbers, which are doubles at
heart, represent flonums, and use JS bignums for Scheme integers.
Fine.
(7) This principle has been taken into account in (1)-(6).
Now, a note on the transformation described in (4), which I call
"tailification".
The first step of tailification computes the set of "tails" in a
function. The function entry starts a tail, as does each return point
from non-tail calls. Join points between different tails also start
tails.
In the residual program, there are four ways that a continuation exits:
- Tail calls in the source program are tail calls in the residual
program; no change.
- For non-tail calls in the source program, the caller saves the state
of the continuation (the live variables flowing into the
continuation) on an explicit stack, and saves the label of the
continuation. The return continuation will be converted into a
arity-checking function entry, to handle multi-value returns; when
it is invoked, it will pop its incoming live variables from the
continuation stack.
- Terms that continue to a join continuation are converted to label
calls in tail position, passing the state of the continuation as
arguments.
- Returning values from a continuation pops the return label from the
stack and does an indirect tail label call on that label, with the
given return values.
I expect that a tailified program will probably be slower than a
non-tailified program. However a tailified program has a few
interesting properties: the stack is packed and only contains live data;
the stack can be traversed in a portable way, allowing for
implementation of prompts on systems that don't support them natively;
and as all calls are tail calls, the whole system can be implemented
naturally with a driver trampoline on targets that don't support tail
calls (e.g. JavaScript and WebAssembly).
This algorithm is implemented in the wip-tailify branch in Guile's git.
I have this little driver program:
(use-modules (system base compile)
(system base language)
(language cps tailify)
(language cps verify)
(language cps renumber)
(language cps dce)
(language cps simplify)
(language cps dump))
(define (tailify* form)
(let ((from 'scheme)
(make-lower (language-lowerer (lookup-language 'cps)))
(optimization-level 2)
(warning-level 2)
(opts '()))
(define cps
((make-lower optimization-level opts)
(compile form #:from 'scheme #:to 'cps #:opts opts
#:optimization-level optimization-level
#:warning-level warning-level
#:env (current-module))
(current-module)))
(format #t "Original:\n")
(dump cps)
(format #t "\n\nTailified:\n")
(let* ((cps (tailify cps))
(cps (verify cps))
(cps (eliminate-dead-code cps))
(cps (simplify cps))
(cps (renumber cps)))
(dump cps))))
If I run this on the following lambda:
(lambda (x)
(pk x
(if x
(pk 12)
(pk 42))))
Then first it prints the dump of the optimized CPS:
Original:
L0:
v0 := self
L1(...)
L1:
receive()
v1 := current-module() ; module
cache-set![0](v1)
v2 := const-fun L7 ; _
return v2
L7:
v3 := self
L8(...)
L8:
v4 := receive(x) ; x
v5 := cache-ref[(0 . pk)]() ; cached
heap-object?(v5) ? L16() : L11()
L16():
L17(v5)
L11():
v6 := cache-ref[0]() ; mod
v7 := const pk ; name
v8 := lookup-bound(v6, v7) ; var
cache-set![(0 . pk)](v8)
L17(v8)
L17(v9): ; box
v10 := scm-ref/immediate[(box . 1)](v9) ; arg
false?(v4) ? L22() : L19()
L22():
v12 := const 42 ; arg
call v10(v12)
L25(receive(arg, rest...))
L19():
v11 := const 12 ; arg
call v10(v11)
L25(receive(arg, rest...))
L25(v13, v14): ; tmp, tmp
tail call v10(v4, v13)
Then it prints the tailified version:
L0:
v0 := self
L1(...)
L1:
receive()
v1 := current-module() ; module
cache-set![0](v1)
v2 := const-fun L8 ; _
v3 := restore1[ptr]() ; ret
tail calli v3(v2)
L8:
v4 := self
L9(...)
L9:
v5 := receive(x) ; x
v6 := cache-ref[(0 . pk)]() ; cached
heap-object?(v6) ? L17() : L12()
L17():
L18(v6)
L12():
v7 := cache-ref[0]() ; mod
v8 := const pk ; name
v9 := lookup-bound(v7, v8) ; var
cache-set![(0 . pk)](v9)
L18(v9)
L18(v10): ; box
v11 := scm-ref/immediate[(box . 1)](v10) ; arg
false?(v5) ? L24() : L20()
L24():
v14 := const 42 ; arg
v15 := code L37 ; cont
save[(scm scm ptr)](v5, v11, v15)
tail call v11(v14)
L20():
v12 := const 12 ; arg
v13 := code L29 ; cont
save[(scm scm ptr)](v5, v11, v13)
tail call v11(v12)
L29:
L30(...)
L30:
v16, v17 := receive(arg, rest...) ; tmp, tmp
v18, v19 := restore[(scm scm)]() ; restored, restored
tail callk L34(_, v18, v19, v16)
L34:
meta: arg-representations: (scm scm scm)
L35(...)
L35(v20, v21, v22): ; _, _, _
tail call v21(v20, v22)
L37:
L38(...)
L38:
v23, v24 := receive(arg, rest...) ; tmp, tmp
v25, v26 := restore[(scm scm)]() ; restored, restored
tail callk L34(_, v25, v26, v23)
I think I would like to implement the "save" and "restore" primcalls on
stock Guile, to be able to adequately test tailification. But then I'd
look at targetting WebAssembly (just to make it interesting; run-time
performance would be faster no doubt if I targetted JS, due to GC
concerns).
Anyway, thoughts welcome. Happy hacking :)
Andy
- thoughts on targetting the web,
Andy Wingo <=