thoughts on targetting the web

guile-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
thoughts on targetting the web

From:	Andy Wingo
Subject:	thoughts on targetting the web
Date:	Sat, 19 Jun 2021 22:20:20 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Hi :)

A brain-dump tonight.  I was thinking about what it would mean for Guile
to target the web.  The goal would be for Guile to be a useful language
for programming client-side interaction on web pages.  Now, there are a
number of takes for Schemes of various stripes to target JS.  I haven't
made a full survey but my starting point is that in a Guile context, we
basically want the Guile Scheme language, which is to say:

 1. Tail calls.  The solution must include proper tail calls.

 2. Multiple values.  The solution should have the same semantics that
    Guile has for multiple-value returns, including implicit
    truncation.

 3. Varargs.  The solution should allow for rest arguments, keyword
    arguments, and the like.

 4. Delimited continuations.  The solution should allow non-local exit
    from any function, and should allow for capturing stack slices to
    partial continuations, and should allow those continuations to be
    instantiated multiple times if desired.  We should be able to target
    fibers to the web.  Otherwise, why bother?

 5. Garbage collection.  *We should re-use the host GC*.  Although it
    would be possible to manage a heap in linear memory, that has
    retention problems due to cycles between the Guile heap and the JS
    heap.

 6. Numbers.  We should support Scheme's unbounded integers.  Requires
    bignums in the limit case, but thankfully JS has bignums now.  We
    will still be able to unbox in many cases though.

 7. Target existing web.  We shouldn't depend on some future WebAssembly
    or JS proposal -- the solution should work in the here and now and
    only get better as features are added to the web platform.

>From a UX perspective, I would expect we would generally want
whole-program compilation with aggressive DCE / tree-shaking, producing
a single JS or WebAssembly artifact at build-time.  But that's a later
issue.

I have thought about this off and on over the years but in the end was
flummoxed about how to meet all requirements.  However recently I think
I have come up with a solution for most of these:

 (1) In JS, tail calls are part of ES2015, but not implemented in all
     browsers.  In WebAssembly, they are a future plan, but hard for
     various reasons.  So in practice the solution for today's web is to
     use a universal trampoline and make all calls tail calls --
     i.e. call all functions via:

       while true:
         call(pop())
         
     Whenever we can target proper tail calls, this will only get
     faster.

 (2) Neither JS nor WebAssembly have the truncation semantics we want.
     Therefore we will return values via an explicit array, and then the
     continuation will be responsible for consuming those values and
     binding any needed variables.

 (3) Although JS does have varargs support, WebAssembly does not.  But
     we can actually use the same solution here as we use for (1) and
     (2) -- pass arguments on an explicit array + nvalues, and relying
     on the function to parse them appropriately.  In this way we can
     get Guile keyword arguments.  This also makes the type of Scheme
     functions uniform, which is important in WebAssembly contexts.

 (4) This is the interesting bit!  As hinted in (1), we will transform
     the program such that all calls are tail calls.  This is a form of
     minimal CPS conversion -- minimal in the sense that functions are
     broken up into the minimum number of pieces to ensure the
     all-tail-calls property.  Non-tail calls transform to tail calls by
     saving their continuation's state on a stack, which is the same as
     stack allocation for these continuations.  The continuation of the
     call pops the saved state from the stack.  Because the stack is
     explicit, we can manipulate it as data: slice it to capture a
     delimited continuation, drop values to make a non-local exit, splat
     a saved slice back on to compose a captured delimited continuation
     with the current continuation, and so on.  Therefore we have the
     necessary primitives to implement delimited continuations as a
     library.

 (5) Scheme needs a representation that can store any value.  In JS this
     is easy because JS is untyped.  For WebAssembly, I think I would
     lean on externref for this purpose, which effectively boxes all
     values.  There are no fixnums in the current WebAssembly spec, so
     this is suboptimal, and we have to call out to JS to evaluate type
     predicates and access fields.  But, support for structured
     GC-managed types is coming to WebAssembly, which will speed up
     object access.

 (6) The easy solution here is to make JS numbers, which are doubles at
     heart, represent flonums, and use JS bignums for Scheme integers.
     Fine.

 (7) This principle has been taken into account in (1)-(6).

Now, a note on the transformation described in (4), which I call
"tailification".

The first step of tailification computes the set of "tails" in a
function.  The function entry starts a tail, as does each return point
from non-tail calls.  Join points between different tails also start
tails.

In the residual program, there are four ways that a continuation exits:

  - Tail calls in the source program are tail calls in the residual
    program; no change.

  - For non-tail calls in the source program, the caller saves the state
    of the continuation (the live variables flowing into the
    continuation) on an explicit stack, and saves the label of the
    continuation.  The return continuation will be converted into a
    arity-checking function entry, to handle multi-value returns; when
    it is invoked, it will pop its incoming live variables from the
    continuation stack.

  - Terms that continue to a join continuation are converted to label
    calls in tail position, passing the state of the continuation as
    arguments.

  - Returning values from a continuation pops the return label from the
    stack and does an indirect tail label call on that label, with the
    given return values.

I expect that a tailified program will probably be slower than a
non-tailified program.  However a tailified program has a few
interesting properties: the stack is packed and only contains live data;
the stack can be traversed in a portable way, allowing for
implementation of prompts on systems that don't support them natively;
and as all calls are tail calls, the whole system can be implemented
naturally with a driver trampoline on targets that don't support tail
calls (e.g. JavaScript and WebAssembly).

This algorithm is implemented in the wip-tailify branch in Guile's git.

I have this little driver program:

    (use-modules (system base compile)
                 (system base language)
                 (language cps tailify)
                 (language cps verify)
                 (language cps renumber)
                 (language cps dce)
                 (language cps simplify)
                 (language cps dump))

    (define (tailify* form)
      (let ((from 'scheme)
            (make-lower (language-lowerer (lookup-language 'cps)))
            (optimization-level 2)
            (warning-level 2)
            (opts '()))
        (define cps
          ((make-lower optimization-level opts)
           (compile form #:from 'scheme #:to 'cps #:opts opts
                    #:optimization-level optimization-level
                    #:warning-level warning-level
                    #:env (current-module))
           (current-module)))
        (format #t "Original:\n")
        (dump cps)
        (format #t "\n\nTailified:\n")
        (let* ((cps (tailify cps))
               (cps (verify cps))
               (cps (eliminate-dead-code cps))
               (cps (simplify cps))
               (cps (renumber cps)))
          (dump cps))))

If I run this on the following lambda:

  (lambda (x)
    (pk x
        (if x
            (pk 12)
            (pk 42))))

Then first it prints the dump of the optimized CPS:

  Original:
  L0:
    v0 := self
    L1(...)
  L1:
    receive()
    v1 := current-module()                      ; module 
    cache-set![0](v1)
    v2 := const-fun L7                          ; _ 
    return v2

  L7:
    v3 := self
    L8(...)
  L8:
    v4 := receive(x)                            ; x 
    v5 := cache-ref[(0 . pk)]()                 ; cached 
    heap-object?(v5) ? L16() : L11()
  L16():
    L17(v5)
  L11():
    v6 := cache-ref[0]()                        ; mod 
    v7 := const pk                              ; name 
    v8 := lookup-bound(v6, v7)                  ; var 
    cache-set![(0 . pk)](v8)
    L17(v8)
  L17(v9):                                      ; box 
    v10 := scm-ref/immediate[(box . 1)](v9)     ; arg 
    false?(v4) ? L22() : L19()
  L22():
    v12 := const 42                             ; arg 
    call v10(v12)
    L25(receive(arg, rest...))
  L19():
    v11 := const 12                             ; arg 
    call v10(v11)
    L25(receive(arg, rest...))
  L25(v13, v14):                                ; tmp, tmp 
    tail call v10(v4, v13)

Then it prints the tailified version:

  L0:
    v0 := self
    L1(...)
  L1:
    receive()
    v1 := current-module()                      ; module 
    cache-set![0](v1)
    v2 := const-fun L8                          ; _ 
    v3 := restore1[ptr]()                       ; ret 
    tail calli v3(v2)

  L8:
    v4 := self
    L9(...)
  L9:
    v5 := receive(x)                            ; x 
    v6 := cache-ref[(0 . pk)]()                 ; cached 
    heap-object?(v6) ? L17() : L12()
  L17():
    L18(v6)
  L12():
    v7 := cache-ref[0]()                        ; mod 
    v8 := const pk                              ; name 
    v9 := lookup-bound(v7, v8)                  ; var 
    cache-set![(0 . pk)](v9)
    L18(v9)
  L18(v10):                                     ; box 
    v11 := scm-ref/immediate[(box . 1)](v10)    ; arg 
    false?(v5) ? L24() : L20()
  L24():
    v14 := const 42                             ; arg 
    v15 := code L37                             ; cont 
    save[(scm scm ptr)](v5, v11, v15)
    tail call v11(v14)
  L20():
    v12 := const 12                             ; arg 
    v13 := code L29                             ; cont 
    save[(scm scm ptr)](v5, v11, v13)
    tail call v11(v12)

  L29:
    L30(...)
  L30:
    v16, v17 := receive(arg, rest...)           ; tmp, tmp 
    v18, v19 := restore[(scm scm)]()            ; restored, restored 
    tail callk L34(_, v18, v19, v16)

  L34:
    meta: arg-representations: (scm scm scm)
    L35(...)
  L35(v20, v21, v22):                           ; _, _, _ 
    tail call v21(v20, v22)

  L37:
    L38(...)
  L38:
    v23, v24 := receive(arg, rest...)           ; tmp, tmp 
    v25, v26 := restore[(scm scm)]()            ; restored, restored 
    tail callk L34(_, v25, v26, v23)

I think I would like to implement the "save" and "restore" primcalls on
stock Guile, to be able to adequately test tailification.  But then I'd
look at targetting WebAssembly (just to make it interesting; run-time
performance would be faster no doubt if I targetted JS, due to GC
concerns).

Anyway, thoughts welcome.  Happy hacking :)

Andy
[Prev in Thread]
Current Thread
[Next in Thread]
thoughts on targetting the web, Andy Wingo <=
- Re: thoughts on targetting the web, Chris Lemmer-Webber, 2021/06/19
- Re: thoughts on targetting the web, Maxime Devos, 2021/06/20
  - Re: thoughts on targetting the web, Chris Lemmer-Webber, 2021/06/20
    - Re: thoughts on targetting the web, Maxime Devos, 2021/06/20
Prev by Date: Fwd: guile style
Next by Date: Re: thoughts on targetting the web
Previous by thread: Fwd: guile style
Next by thread: Re: thoughts on targetting the web
Index(es):
- Date
- Thread