[Guile-commits] GNU Guile branch, master, updated. release_1-9-1-66-g988


From: Andy Wingo
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-1-66-g98850fd
Date: Wed, 12 Aug 2009 21:37:38 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=98850fd727ad6b31ce2c4fe710935bbe9da9d966

The branch, master has been updated
       via  98850fd727ad6b31ce2c4fe710935bbe9da9d966 (commit)
      from  aaae0d5ab3d0a867b7005d1a6bf38dc345195a93 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 98850fd727ad6b31ce2c4fe710935bbe9da9d966
Author: Andy Wingo <address@hidden>
Date:   Wed Aug 12 23:38:05 2009 +0200

    update docs for recent vm/compiler work
    
    * doc/ref/compiler.texi:
    * doc/ref/vm.texi: Update for recent changes.
    * module/language/assembly/disassemble.scm (disassemble-load-program):
      Don't print nops, they are distracting.

-----------------------------------------------------------------------

Summary of changes:
 doc/ref/compiler.texi                    |  165 +++++++++-------
 doc/ref/vm.texi                          |  319 ++++++++++++++++++++---------
 module/language/assembly/disassemble.scm |    2 +
 3 files changed, 317 insertions(+), 169 deletions(-)

diff --git a/doc/ref/compiler.texi b/doc/ref/compiler.texi
index f8d0895..0aea4e7 100644
--- a/doc/ref/compiler.texi
+++ b/doc/ref/compiler.texi
@@ -17,7 +17,7 @@ This section aims to pay attention to the small man behind the
 curtain.
 
 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
-know how to compile your .scm file.
+know how to compile your @code{.scm} file.
 
 @menu
 * Compiler Tower::                   
@@ -67,8 +67,7 @@ for Scheme:
   #:title       "Guile Scheme"
   #:version     "0.5"
   #:reader      read
-  #:compilers   `((tree-il . ,compile-tree-il)
-                  (ghil . ,compile-ghil))
+  #:compilers   `((tree-il . ,compile-tree-il))
   #:decompilers `((tree-il . ,decompile-tree-il))
   #:evaluator   (lambda (x module) (primitive-eval x))
   #:printer     write)
@@ -220,13 +219,13 @@ Note however that @code{sc-expand} does not have the same signature as
 around @code{sc-expand}, to make it conform to the general form of
 compiler procedures in Guile's language tower.
 
-Compiler procedures take two arguments, an expression and an
-environment. They return three values: the compiled expression, the
-corresponding environment for the target language, and a
-``continuation environment''. The compiled expression and environment
-will serve as input to the next language's compiler. The
-``continuation environment'' can be used to compile another expression
-from the same source language within the same module.
+Compiler procedures take three arguments: an expression, an
+environment, and a keyword list of options. They return three values:
+the compiled expression, the corresponding environment for the target
+language, and a ``continuation environment''. The compiled expression
+and environment will serve as input to the next language's compiler.
+The ``continuation environment'' can be used to compile another
+expression from the same source language within the same module.
 
 For example, you might compile the expression, @code{(define-module
 (foo))}. This will result in a Tree-IL expression and environment. But
@@ -292,6 +291,14 @@ tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
 
 The @code{src} fields are left out of the external representation.
 
+One may create Tree-IL objects from their external representations by
+calling @code{parse-tree-il}, the reader for Tree-IL. If any source
+information is attached to the input S-expression, it will be
+propagated to the resulting Tree-IL expressions. This is probably the
+easiest way to compile to Tree-IL: just make the appropriate external
+representations in S-expression format, and let @code{parse-tree-il}
+take care of the rest.
+
 @deftp {Scheme Variable} <void> src
 @deftpx {External Representation} (void)
 An empty expression. In practice, equivalent to Scheme's @code{(if #f
@@ -384,12 +391,29 @@ A version of @code{<let>} that creates recursive bindings, like
 Scheme's @code{letrec}.
 @end deftp
 
-@c FIXME -- need to revive this one
-@c @deftp {Scheme Variable} <ghil-mv-bind> src vars rest producer . body
-@c Like Scheme's @code{receive} -- binds the values returned by
-@c applying @code{producer}, which should be a thunk, to the
-@c @code{lambda}-like bindings described by @var{vars} and @var{rest}.
-@c @end deftp
+There are two Tree-IL constructs that are not normally produced by
+higher-level compilers, but instead are generated during the
+source-to-source optimization and analysis passes that the Tree-IL
+compiler does. Users should not generate these expressions directly,
+unless they feel very clever, as the default analysis pass will
+generate them as necessary.
+
+@deftp {Scheme Variable} <let-values> src names vars exp body
+@deftpx {External Representation} (let-values @var{names} @var{vars} @var{exp} @var{body})
+Like Scheme's @code{receive} -- binds the values returned by
+evaluating @code{exp} to the @code{lambda}-like bindings described by
+@var{vars}. That is to say, @var{vars} may be an improper list.
+
+@code{<let-values>} is an optimization of @code{<application>} of the
+primitive, @code{call-with-values}.
+@end deftp
+@deftp {Scheme Variable} <fix> src names vars vals body
+@deftpx {External Representation} (fix @var{names} @var{vars} @var{vals} @var{body})
+Like @code{<letrec>}, but only for @var{vals} that are unset
+@code{lambda} expressions.
+
+@code{fix} is an optimization of @code{letrec} (and @code{let}).
+@end deftp
 
 Tree-IL implements a compiler to GLIL that recursively traverses
 Tree-IL expressions, writing out GLIL expressions into a linear list.
@@ -399,9 +423,9 @@ future computations. This state allows the compiler not to emit code
 for constant expressions that will not be used (e.g. docstrings), and
 to perform tail calls when in tail position.
 
-In the future, there will be a pass at the beginning of the
-Tree-IL->GLIL compilation step to perform inlining, copy propagation,
-dead code elimination, and constant folding.
+Most optimization, such as it currently is, is performed on Tree-IL
+expressions as source-to-source transformations. There will be more
+optimizations added in the future.
 
 Interested readers are encouraged to read the implementation in
 @code{(language tree-il compile-glil)} for more details.
@@ -411,18 +435,16 @@ Interested readers are encouraged to read the implementation in
 
 Guile Low Intermediate Language (GLIL) is a structured intermediate
 language whose expressions more closely approximate Guile's VM
-instruction set.
+instruction set. Its expression types are defined in @code{(language
+glil)}.
 
-Its expression types are defined in @code{(language glil)}, and as
-with GHIL, some of its fields parse as rest arguments.
-
-@deftp {Scheme Variable} <glil-program> nargs nrest nlocs nexts meta . body
+@deftp {Scheme Variable} <glil-program> nargs nrest nlocs meta . body
 A unit of code that at run-time will correspond to a compiled
-procedure. @var{nargs} @var{nrest} @var{nlocs}, and @var{nexts}
-collectively define the program's arity; see @ref{Compiled
-Procedures}, for more information. @var{meta} should be an alist of
-properties, as in Tree IL's @code{<lambda>}. @var{body} is a list of
-GLIL expressions.
+procedure. @var{nargs} @var{nrest} and @var{nlocs} collectively define
+the program's arity; see @ref{Compiled Procedures}, for more
+information. @var{meta} should be an alist of properties, as in
+Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
+expressions.
 @end deftp
 @deftp {Scheme Variable} <glil-bind> . vars
 An advisory expression that notes a liveness extent for a set of
@@ -461,23 +483,21 @@ and @code{filename} keys, e.g. as returned by
 @code{source-properties}.
 @end deftp
 @deftp {Scheme Variable} <glil-void>
-Pushes the unspecified value on the stack.
+Pushes ``the unspecified value'' on the stack.
 @end deftp
 @deftp {Scheme Variable} <glil-const> obj
 Pushes a constant value onto the stack. @var{obj} must be a number,
-string, symbol, keyword, boolean, character, the empty list, or a pair
-or vector of constants.
-@end deftp
-@deftp {Scheme Variable} <glil-local> op index
-Accesses a lexically bound variable from the stack. If @var{op} is
-@code{ref}, the value is pushed onto the stack; if it is @code{set},
-the variable is set from the top value on the stack, which is popped
-off. @xref{Stack Layout}, for more information.
+string, symbol, keyword, boolean, character, uniform array, the empty
+list, or a pair or vector of constants.
 @end deftp
-@deftp {Scheme Variable} <glil-external> op depth index
-Accesses a heap-allocated variable, addressed by @var{depth}, the nth
-enclosing environment, and @var{index}, the variable's position within
-the environment. @var{op} is @code{ref} or @code{set}.
+@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
+Accesses a lexically bound variable. If the variable is not
+@var{local?}, it is free. All variables may have @code{ref} and
+@code{set} as their @var{op}. Boxed variables may also have the
+@var{op}s @code{box}, @code{empty-box}, and @code{fix}, which
+correspond in semantics to the VM instructions @code{box},
+@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
+more information.
 @end deftp
 @deftp {Scheme Variable} <glil-toplevel> op name
 Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
@@ -520,7 +540,7 @@ Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 on Guile 1.9.0
 Copyright (C) 2001-2008 Free Software Foundation, Inc.
 
 Enter `,help' for help.
-glil@@(guile-user)> (program 0 0 0 0 () (const 3) (call return 0))
+glil@@(guile-user)> (program 0 0 0 () (const 3) (call return 1))
 @result{} 3
 @end example
 
@@ -542,12 +562,12 @@ differs from GLIL in four main ways:
 @itemize
 @item Labels have been resolved to byte offsets in the program.
 @item Constants inside procedures have either been expressed as inline
-instructions, and possibly cached in object arrays.
+instructions or cached in object arrays.
 @item Procedures with metadata (source location information, liveness
 extents, procedure names, generic properties, etc) have had their
 metadata serialized out to thunks.
 @item All expressions correspond directly to VM instructions -- i.e.,
-there is no @code{<glil-local>} which can be a ref or a set.
+there is no @code{<glil-lexical>} which can be a ref or a set.
 @end itemize
 
 Assembly is isomorphic to the bytecode that it compiles to. You can
@@ -567,10 +587,11 @@ example:
 
 @example
 scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
-(load-program 0 0 0 0
+(load-program 0 0 0
   () ; Labels
-  60 ; Length
+  70 ; Length
   #f ; Metadata
+  (make-false)
   (make-false) ; object table for the returned lambda
   (nop)
   (nop) ; Alignment. Since assembly has already resolved its labels
@@ -578,11 +599,12 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
   (nop) ; object code is mmap'd directly to structures, assembly
   (nop) ; has to have the alignment embedded in it.
   (nop) 
-  (load-program 1 0 0 0 
+  (load-program
+    1
+    0
     ()
-    6
-    ; This is the metadata thunk for the returned procedure.
-    (load-program 0 0 0 0 () 21 #f
+    8
+    (load-program 0 0 0 () 21 #f
       (load-symbol "x")  ; Name and liveness extent for @code{x}.
       (make-false)
       (make-int8:0) ; Some instruction+arg combinations
@@ -597,7 +619,9 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
     (local-ref 0)
     (local-ref 0)
     (add)
-    (return))
+    (return)
+    (nop)
+    (nop))
   ; Return our new procedure.
   (return))
 @end example
@@ -618,10 +642,10 @@ the next step down from assembly:
 
 @example
 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
-@result{} (load-program 0 0 0 0 () 6 #f
+@result{} (load-program 0 0 0 () 6 #f
        (make-int8 32) (make-int8 10) (add) (return))
 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
-@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 10 32 10 10 100 48)
+@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)
 @end example
 
 ``Objcode'' is bytecode, but mapped directly to a C structure,
@@ -631,8 +655,7 @@ scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
 struct scm_objcode @{
   scm_t_uint8 nargs;
   scm_t_uint8 nrest;
-  scm_t_uint8 nlocs;
-  scm_t_uint8 nexts;
+  scm_t_uint16 nlocs;
   scm_t_uint32 len;
   scm_t_uint32 metalen;
   scm_t_uint8 base[0];
@@ -642,7 +665,7 @@ struct scm_objcode @{
 As one might imagine, objcode imposes a minimum length on the
 bytecode. Also, the multibyte fields are in native endianness, which
 makes objcode (and bytecode) system-dependent. Indeed, in the short
-example above, all but the last 5 bytes were the program's header.
+example above, all but the last 6 bytes were the program's header.
 
 Objcode also has a couple of important efficiency hacks. First,
 objcode may be mapped directly from disk, allowing compiled code to be
@@ -672,7 +695,7 @@ Makes a bytecode object from @var{bytecode}, which should be a
 Load object code from a file named @var{file}. The file will be mapped
 into memory via @code{mmap}, so this is a very fast operation.
 
-On disk, object code has an eight-byte cookie prepended to it, to
+On disk, object code has a sixteen-byte cookie prepended to it, to
 prevent accidental loading of arbitrary garbage.
 @end deffn
 
@@ -689,11 +712,11 @@ Copy object code out to a @code{u8vector} for analysis by Scheme.
 The following procedure is actually in @code{(system vm program)}, but
 we'll mention it here:
 
-@deffn {Scheme Variable} make-program objcode objtable [external='()]
-@deffnx {C Function} scm_make_program (objcode, objtable, external)
+@deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
+@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
 Load up object code into a Scheme program. The resulting program will
 have @var{objtable} as its object table, which should be a vector or
-@code{#f}, and will capture the closure variables from @var{external}.
+@code{#f}, and will capture the free variables from @var{free-vars}.
 @end deffn
 
 Object code from a file may be disassembled at the REPL via the
@@ -707,9 +730,9 @@ respect to the compilation environment. Normally the environment
 propagates through the compiler transparently, but users may specify
 the compilation environment manually as well:
 
-@deffn {Scheme Procedure} make-objcode-env module externals
+@deffn {Scheme Procedure} make-objcode-env module free-vars
 Make an object code environment. @var{module} should be a Scheme
-module, and @var{externals} should be a list of external variables.
+module, and @var{free-vars} should be a vector of free variables.
 @code{#f} is also a valid object code environment.
 @end deffn
 
@@ -748,12 +771,14 @@ procedure is called a certain number of times.
 The name of the game is a profiling-based harvest of the low-hanging
 fruit, running programs of interest under a system-level profiler and
 determining which improvements would give the most bang for the buck.
-There are many well-known efficiency hacks in the literature: Dybvig's
-letrec optimization, individual boxing of heap-allocated values (and
-then store the boxes on the stack directly), optimized case-lambda
-expressions, stack underflow and overflow handlers, etc. Highly
-recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
+It's really getting to the point though that native compilation is the
+next step.
 
 The compiler also needs help at the top end, enhancing the Scheme that
-it knows to also understand R6RS, and adding new high-level compilers:
-Emacs Lisp, Lua, JavaScript...
+it knows to also understand R6RS, and adding new high-level compilers.
+We have JavaScript and Emacs Lisp mostly complete, but they could use
+some love; Lua would be nice as well, but whatever language it is
+that strikes your fancy would be welcome too.
+
+Compilers are for hacking, not for admiring or for complaining about.
+Get to it!
diff --git a/doc/ref/vm.texi b/doc/ref/vm.texi
index fa65523..59798d8 100644
--- a/doc/ref/vm.texi
+++ b/doc/ref/vm.texi
@@ -13,8 +13,8 @@ procedures can call each other as they please.
 
 The difference is that the compiler creates and interprets bytecode
 for a custom virtual machine, instead of interpreting the
-S-expressions directly. Running compiled code is faster than running
-interpreted code.
+S-expressions directly. Loading and running compiled code is faster
+than loading and running source code.
 
 The virtual machine that does the bytecode interpretation is a part of
 Guile itself. This section describes the nature of Guile's virtual
@@ -134,7 +134,7 @@ compiled to object code, one might never leave the virtual machine.
 @subsection Stack Layout
 
 While not strictly necessary to understand how to work with the VM, it
-is instructive and sometimes entertaining to consider the struture of
+is instructive and sometimes entertaining to consider the structure of
 the VM stack.
 
 Logically speaking, a VM stack is composed of ``frames''. Each frame
@@ -159,12 +159,11 @@ The structure of the fixed part of an application frame is as follows:
 
 @example
              Stack
-   |                  | <- fp + bp->nargs + bp->nlocs + 4
+   |                  | <- fp + bp->nargs + bp->nlocs + 3
    +------------------+    = SCM_FRAME_UPPER_ADDRESS (fp)
    | Return address   |
    | MV return address|
-   | Dynamic link     |
-   | External link    | <- fp + bp->nargs + bp->nlocs
+   | Dynamic link     | <- fp + bp->nargs + bp->nlocs
    | Local variable 1 |    = SCM_FRAME_DATA_ADDRESS (fp)
    | Local variable 0 | <- fp + bp->nargs
    | Argument 1       |
@@ -201,25 +200,17 @@ values being returned.
 @item Dynamic link
 This is the @code{fp} in effect before this program was applied. In
 effect, this and the return address are the registers that are always
-``saved''.
-
-@item External link
-This field is a reference to the list of heap-allocated variables
-associated with this frame. For a discussion of heap versus stack
-allocation, @xref{Variables and the VM}.
+``saved''. The dynamic link links the current frame to the previous
+frame; computing a stack trace involves traversing these frames.
 
 @item Local variable @var{n}
-Lambda-local variables that are allocated on the stack are all
-allocated as part of the frame. This makes access to non-captured,
-non-mutated variables very cheap.
+Lambda-local variables are all allocated as part of the frame.
+This makes access to variables very cheap.
 
 @item Argument @var{n}
 The calling convention of the VM requires arguments of a function
-application to be pushed on the stack, and here they are. Normally
-references to arguments dispatch to these locations on the stack.
-However if an argument has to be stored on the heap, it will be copied
-from its initial value here onto a location in the heap, and
-thereafter only referenced on the heap.
+application to be pushed on the stack, and here they are. References
+to arguments dispatch to these locations on the stack.
 
 @item Program
 This is the program being applied. For more information on how
@@ -236,26 +227,44 @@ Consider the following Scheme code as an example:
     (lambda (b) (list foo a b)))
 @end example
 
-Within the lambda expression, "foo" is a top-level variable, "a" is a
-lexically captured variable, and "b" is a local variable.
-
-@code{b} may safely be allocated on the stack, as there is no enclosed
-procedure that references it, nor is it ever mutated.
-
-@code{a}, on the other hand, is referenced by an enclosed procedure,
-that of the lambda. Thus it must be allocated on the heap, as it may
-(and will) outlive the dynamic extent of the invocation of @code{foo}.
-
-@code{foo} is a top-level variable, because it names the procedure
-@code{foo}, which is here defined at the top-level.
-
-Note that variables that are mutated (via @code{set!}) must be
-allocated on the heap, even if they are local variables. This is
-because any called subprocedure might capture the continuation, which
-would need to capture locations instead of values. Thus perhaps
-counterintuitively, what would seem ``closer to the metal'', viz
-@code{set!}, actually forces heap allocation instead of stack
-allocation.
+Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a
+lexically captured variable, and @code{b} is a local variable.
+
+Another way to refer to @code{a} and @code{b} is to say that @code{a}
+is a ``free'' variable, since it is not defined within the lambda, and
+@code{b} is a ``bound'' variable. These are the terms used in the
+@dfn{lambda calculus}, a mathematical notation for describing
+functions. The lambda calculus is useful because it allows one to
+prove statements about functions. It is especially good at describing
+scope relations, and it is for that reason that we mention it here.
+
+Guile allocates all variables on the stack. When a lexically enclosed
+procedure with free variables---a @dfn{closure}---is created, it
+copies those variables into its free variable vector. References to free
+variables are then redirected through the free variable vector.
+
+If a variable is ever @code{set!}, however, it will need to be
+heap-allocated instead of stack-allocated, so that different closures
+that capture the same variable can see the same value. Also, this
+allows continuations to capture a reference to the variable, instead
+of to its value at one point in time. For these reasons, @code{set!}
+variables are allocated in ``boxes''---actually, in variable cells.
address@hidden, for more information. References to @code{set!}
+variables are indirected through the boxes.
+
+Thus perhaps counterintuitively, what would seem ``closer to the
+metal'', viz @code{set!}, actually forces an extra memory allocation
+and indirection.
+
+Going back to our example, @code{b} may be allocated on the stack, as
+it is never mutated.
+
+@code{a} may also be allocated on the stack, as it too is never
+mutated. Within the enclosed lambda, its value will be copied into
+(and referenced from) the free variables vector.
+
+@code{foo} is a top-level variable, because @code{foo} is not
+lexically bound in this example.
 
 @node VM Programs
 @subsection Compiled Procedures are VM Programs
@@ -297,27 +306,26 @@ scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
 scheme@@(guile-user)> ,x foo
 Disassembly of #<program foo (a)>:
 
-   0    (local-ref 0)                   ;; `a' (arg)
-   2    (external-set 0)                ;; `a' (arg)
-   4    (object-ref 1)                  ;; #<program b70d2910 at <unknown port>:0:16 (b)>
-   6    (make-closure)                  
-   7    (return)                        
+   0    (object-ref 1)                  ;; #<program b7e478b0 at <unknown port>:0:16 (b)>
+   2    (local-ref 0)                   ;; `a' (arg)
+   4    (vector 0 1)                    ;; 1 element
+   7    (make-closure)                  
+   8    (return)                        
 
 ----------------------------------------
-Disassembly of #<program b70d2910 at <unknown port>:0:16 (b)>:
+Disassembly of #<program b7e478b0 at <unknown port>:0:16 (b)>:
 
    0    (toplevel-ref 1)                ;; `foo'
-   2    (external-ref 0)                ;; (closure variable)
+   2    (free-ref 0)                    ;; (closure variable)
    4    (local-ref 0)                   ;; `b' (arg)
    6    (list 0 3)                      ;; 3 elements         at (unknown file):0:28
    9    (return)                        
 @end smallexample
 
-At @code{ip} 0 and 2, we do the copy from argument to heap for
-@code{a}. @code{Ip} 4 loads up the compiled lambda, and then at
-@code{ip} 6 we make a closure---binding code (from the compiled
-lambda) with data (the heap-allocated variables). Finally we return
-the closure.
+At @code{ip} 0, we load up the compiled lambda. @code{Ip} 2 and 4
+create the free variables vector, and @code{ip} 7 makes the
+closure---binding code (from the compiled lambda) with data (the
+free-variable vector). Finally we return the closure.
 
 The second stanza disassembles the compiled lambda. Toplevel variables
 are resolved relative to the module that was current when the
@@ -336,7 +344,7 @@ routine.
 @node Instruction Set
 @subsection Instruction Set
 
-There are about 100 instructions in Guile's virtual machine. These
+There are about 150 instructions in Guile's virtual machine. These
 instructions represent atomic units of a program's execution. Ideally,
 they perform one task without conditional branches, then dispatch to
 the next instruction in the stream.
@@ -376,16 +384,22 @@ instructions. More instructions may be added over time.
 * Miscellaneous Instructions::  
 * Inlined Scheme Instructions::  
 * Inlined Mathematical Instructions::  
+* Inlined Bytevector Instructions::  
 @end menu
 
 @node Environment Control Instructions
 @subsubsection Environment Control Instructions
 
 These instructions access and mutate the environment of a compiled
-procedure---the local bindings, the ``external'' bindings, and the
+procedure---the local bindings, the free (captured) bindings, and the
 toplevel bindings.
 
+Some of these instructions have @code{long-} variants, the difference
+being that they take 16-bit arguments, encoded in big-endian order,
+instead of the normal 8-bit range.
+
 @deffn Instruction local-ref index
+@deffnx Instruction long-local-ref index
 Push onto the stack the value of the local variable located at
 @var{index} within the current stack frame.
 
@@ -395,26 +409,62 @@ arguments.
 @end deffn
 
 @deffn Instruction local-set index
+@deffnx Instruction long-local-set index
 Pop the Scheme object located on top of the stack and make it the new
 value of the local variable located at @var{index} within the current
 stack frame.
 @end deffn
 
-@deffn Instruction external-ref index
-Push the value of the closure variable located at position
-@var{index} within the program's list of external variables.
+@deffn Instruction free-ref index
+Push the value of the captured variable located at position
+@var{index} within the program's vector of captured variables.
 @end deffn
 
-@deffn Instruction external-set index
-Pop the Scheme object located on top of the stack and make it the new
-value of the closure variable located at @var{index} within the
-program's list of external variables.
+@deffn Instruction free-boxed-ref index
+@deffnx Instruction free-boxed-set index
+Get or set a boxed free variable. Note that there is no free-set
+instruction, as variables that are @code{set!} must be boxed.
+
+These instructions assume that the value at position @var{index} in
+the free variables vector is a variable.
 @end deffn
 
-The external variable lookup algorithm should probably be made more
-efficient in the future via addressing by frame and index. Currently,
-external variables are all consed onto a list, which results in O(N)
-lookup time.
+@deffn Instruction make-closure
+Pop a vector and a program object off the stack, in that order, and
+push a new program object with the given free variables vector. The
+new program object shares state with the original program.
+
+At the time of this writing, the space overhead of closures is 4 words
+per closure.
+@end deffn
+
+@deffn Instruction fix-closure index
+Pop a vector off the stack, and set it as the @var{index}th local
+variable's free variable vector. The @var{index}th local variable is
+assumed to be a procedure.
+
+This instruction is part of a hack for allocating mutually recursive
+procedures. The hack is to first perform a @code{local-set} for all of
+the recursive procedures, then fix up the procedures' free variable
+bindings in place. This allows most @code{letrec}-bound procedures to
+be allocated unboxed on the stack.
+
+One could of course do a @code{local-ref}, then @code{make-closure},
+then @code{local-set}, but this macroinstruction helps to speed up the
+common case.
+@end deffn
+
+@deffn Instruction box index
+Pop a value off the stack, and set the @var{index}th local variable
+to a box containing that value. A shortcut for @code{make-variable}
+then @code{local-set}, used when binding boxed variables.
+@end deffn
+
+@deffn Instruction empty-box index
+Set the @var{index}th local variable to a box containing a variable
+whose value is unbound. Used when compiling some @code{letrec}
+expressions.
+@end deffn
 
 @deffn Instruction toplevel-ref index
 @deffnx Instruction long-toplevel-ref index
@@ -442,9 +492,6 @@ in-place mutation of the object table. This mechanism provides for
 lazy variable resolution, and an important cached fast-path once the
 variable has been successfully resolved.
 
-The ``long'' variant has a 16-bit index instead of an 8-bit index,
-with the most significant byte first.
-
 This instruction pushes the value of the variable onto the stack.
 @end deffn
 
@@ -453,8 +500,13 @@ This instruction pushes the value of the variable onto the stack.
 Pop a value off the stack, and set it as the value of the toplevel
 variable stored at @var{index} in the object table. If the variable
 has not yet been looked up, we do the lookup as in
-@code{toplevel-ref}. The ``long'' variant has a 16-bit index instead
-of an 8-bit index.
+@code{toplevel-ref}.
+@end deffn
+
+@deffn Instruction define
+Pop a symbol and a value from the stack, in that order. Look up its
+binding in the current toplevel environment, creating the binding if
+necessary. Set the variable to the value.
 @end deffn
 
 @deffn Instruction link-now
@@ -476,6 +528,11 @@ Pop off two objects from the stack, a variable and a value, and set
 the variable to the value.
 @end deffn
 
+@deffn Instruction make-variable
+Replace the top object on the stack with a variable containing it.
+Used in some circumstances when compiling @code{letrec} expressions.
+@end deffn
+
 @deffn Instruction object-ref n
 @deffnx Instruction long-object-ref n
 Push @var{n}th value from the current program's object vector. The
@@ -499,7 +556,10 @@ the one to which the instruction pointer points).
 @end itemize
 
 Note that the offset passed to the instruction is encoded on two 8-bit
-integers which are then combined by the VM as one 16-bit integer.
+integers which are then combined by the VM as one 16-bit integer. Note
+also that jump targets in Guile are aligned on 8-byte boundaries, and
+that the offset refers to the @var{n}th 8-byte boundary, effectively
+giving Guile a 19-bit relative address space.
 
 @deffn Instruction br offset
 Jump to @var{offset}.
@@ -550,19 +610,21 @@ Load an arbitrary number from the instruction stream. The number is
 embedded in the stream as a string.
 @end deffn
 @deffn Instruction load-string length
-Load a string from the instruction stream.
+Load a string from the instruction stream. The string is assumed to be
+encoded in the ``latin1'' locale.
 @end deffn
-@deffn Instruction load-symbol length
-Load a symbol from the instruction stream.
+@deffn Instruction load-wide-string length
+Load a UTF-32 string from the instruction stream. @var{length} is the
+length in bytes, not in codepoints.
 @end deffn
-@deffn Instruction load-keyword length
-Load a keyword from the instruction stream.
+@deffn Instruction load-symbol length
+Load a symbol from the instruction stream. The symbol is assumed to be
+encoded in the ``latin1'' locale. Symbols backed by wide strings may
+be loaded via @code{load-wide-string} then @code{make-symbol}.
 @end deffn
-
-@deffn Instruction define length
-Load a symbol from the instruction stream, and look up its binding in
-the current toplevel environment, creating the binding if necessary.
-Push the variable corresponding to the binding.
+@deffn Instruction load-array length
+Load a uniform array from the instruction stream. The shape and type
+of the array are popped off the stack, in that order.
 @end deffn
 
 @deffn Instruction load-program
@@ -579,23 +641,9 @@ because instead of parsing its data, it directly maps the instruction
 stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
 and Objcode}, for more information.
 
-The resulting compiled procedure will not have any ``external''
-variables captured, so it may be loaded only once but used many times
-to create closures.
-@end deffn
-
-Finally, while this instruction is not strictly a ``loading''
-instruction, it's useful to wind up the @code{load-program} discussion
-here:
-
-@deffn Instruction make-closure
-Pop the program object from the stack, capture the current set of
-``external'' variables, and assign those external variables to a copy
-of the program. Push the new program object, which shares state with
-the original program.
-
-At the time of this writing, the space overhead of closures is 4 words
-per closure.
+The resulting compiled procedure will not have any free variables
+captured, so it may be loaded only once but used many times to create
+closures.
 @end deffn
 
 @node Procedural Instructions
@@ -764,6 +812,19 @@ Push @code{'()} onto the stack.
 Push @var{value}, an 8-bit character, onto the stack.
 @end deffn
 
+@deffn Instruction make-char32 value
+Push @var{value}, a 32-bit character, onto the stack. The value is
+encoded in big-endian order.
+@end deffn
+
+@deffn Instruction make-symbol
+Pops a string off the stack, and pushes a symbol.
+@end deffn
+
+@deffn Instruction make-keyword value
+Pops a symbol off the stack, and pushes a keyword.
+@end deffn
+
 @deffn Instruction list n
 Pops off the top @var{n} values off of the stack, consing them up into
 a list, then pushes that list on the stack. What was the topmost value
@@ -807,7 +868,8 @@ pushes its elements on the stack.
 @subsubsection Miscellaneous Instructions
 
 @deffn Instruction nop
-Does nothing!
+Does nothing! Used for padding other instructions to certain
+alignments.
 @end deffn
 
 @deffn Instruction halt
@@ -873,6 +935,8 @@ stream.
 @deffnx Instruction cons x y
 @deffnx Instruction car x
 @deffnx Instruction cdr x
+@deffnx Instruction vector-ref x y
+@deffnx Instruction vector-set x n y
 Inlined implementations of their Scheme equivalents.
 @end deffn
 
@@ -893,7 +957,9 @@ As in the previous section, the definitions below show stack
 parameters instead of instruction stream parameters.
 
 @deffn Instruction add x y
+@deffnx Instruction add1 x
 @deffnx Instruction sub x y
+@deffnx Instruction sub1 x
 @deffnx Instruction mul x y
 @deffnx Instruction div x y
 @deffnx Instruction quo x y
@@ -906,3 +972,58 @@ parameters instead of instruction stream parameters.
 @deffnx Instruction ge? x y
 Inlined implementations of the corresponding mathematical operations.
 @end deffn
+
+@node Inlined Bytevector Instructions
+@subsubsection Inlined Bytevector Instructions
+
+Bytevector operations correspond closely to what the current hardware
+can do, so it makes sense to inline them to VM instructions, providing
+a clear path for eventual native compilation. Without this, Scheme
+programs would need other primitives for accessing raw bytes -- but
+these primitives are as good as any.
+
+As in the previous section, the definitions below show stack
+parameters instead of instruction stream parameters.
+
+The multibyte formats (@code{u16}, @code{f64}, etc.) take an extra
+endianness argument. Only aligned native accesses are currently
+fast-pathed in Guile's VM.
+
+@deffn Instruction bv-u8-ref bv n
+@deffnx Instruction bv-s8-ref bv n
+@deffnx Instruction bv-u16-native-ref bv n
+@deffnx Instruction bv-s16-native-ref bv n
+@deffnx Instruction bv-u32-native-ref bv n
+@deffnx Instruction bv-s32-native-ref bv n
+@deffnx Instruction bv-u64-native-ref bv n
+@deffnx Instruction bv-s64-native-ref bv n
+@deffnx Instruction bv-f32-native-ref bv n
+@deffnx Instruction bv-f64-native-ref bv n
+@deffnx Instruction bv-u16-ref bv n endianness
+@deffnx Instruction bv-s16-ref bv n endianness
+@deffnx Instruction bv-u32-ref bv n endianness
+@deffnx Instruction bv-s32-ref bv n endianness
+@deffnx Instruction bv-u64-ref bv n endianness
+@deffnx Instruction bv-s64-ref bv n endianness
+@deffnx Instruction bv-f32-ref bv n endianness
+@deffnx Instruction bv-f64-ref bv n endianness
+@deffnx Instruction bv-u8-set bv n val
+@deffnx Instruction bv-s8-set bv n val
+@deffnx Instruction bv-u16-native-set bv n val
+@deffnx Instruction bv-s16-native-set bv n val
+@deffnx Instruction bv-u32-native-set bv n val
+@deffnx Instruction bv-s32-native-set bv n val
+@deffnx Instruction bv-u64-native-set bv n val
+@deffnx Instruction bv-s64-native-set bv n val
+@deffnx Instruction bv-f32-native-set bv n val
+@deffnx Instruction bv-f64-native-set bv n val
+@deffnx Instruction bv-u16-set bv n val endianness
+@deffnx Instruction bv-s16-set bv n val endianness
+@deffnx Instruction bv-u32-set bv n val endianness
+@deffnx Instruction bv-s32-set bv n val endianness
+@deffnx Instruction bv-u64-set bv n val endianness
+@deffnx Instruction bv-s64-set bv n val endianness
+@deffnx Instruction bv-f32-set bv n val endianness
+@deffnx Instruction bv-f64-set bv n val endianness
+Inlined implementations of the corresponding bytevector operations.
+@end deffn
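
Editorial note, not part of the commit: the bytevector instructions documented above are described as inlined versions of "the corresponding bytevector operations". A minimal sketch of what those operations look like at the Scheme level follows, assuming they are the R6RS procedures exported by (rnrs bytevectors). The mapping from each bv-* instruction to a bytevector-* procedure is an assumption drawn from the naming, and the variable names are illustrative only. The last two definitions show why a UTF-32 length counted in bytes differs from one counted in codepoints, as the load-wide-string entry points out.

```scheme
(use-modules (rnrs bytevectors))

;; Four bytes: 01 02 03 04.
(define bv (u8-list->bytevector '(1 2 3 4)))

;; Single-byte accessors take no endianness argument.
(define first-byte (bytevector-u8-ref bv 0))                    ; 1

;; Multibyte accessors take an explicit endianness argument.
(define be-value (bytevector-u16-ref bv 0 (endianness big)))    ; #x0102 = 258
(define le-value (bytevector-u16-ref bv 0 (endianness little))) ; #x0201 = 513

;; The -native- variants use the host byte order, so this is either
;; #x0304 or #x0403 depending on the machine.
(define native-value (bytevector-u16-native-ref bv 2))

;; Setters mirror the getters.
(bytevector-u16-set! bv 0 #xBEEF (endianness big))
(define high-byte (bytevector-u8-ref bv 0))                     ; #xBE = 190

;; UTF-32 uses four bytes per codepoint, so the byte length of a
;; two-character string is 8, not 2.
(define wide (string->utf32 "ab"))
(define wide-bytes (bytevector-length wide))                    ; 8
```

Only the aligned native accessors are said to be fast-pathed in the VM, so code that can use the -native- variants on aligned offsets would presumably benefit most from the new instructions.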
diff --git a/module/language/assembly/disassemble.scm b/module/language/assembly/disassemble.scm
index d41c816..492acb7 100644
--- a/module/language/assembly/disassemble.scm
+++ b/module/language/assembly/disassemble.scm
@@ -60,6 +60,8 @@
                   (print-info pos `(load-program ,sym) #f #f)
                   (lp (+ pos (byte-length asm)) (cdr code)
                       (acons sym asm programs))))
+               ((nop)
+                (lp (+ pos (byte-length asm)) (cdr code) programs))
                (else
                 (print-info pos asm
                             (code-annotation end asm objs nargs blocs
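
Editorial note, not part of the commit: the new (nop) clause above skips printing but still advances pos by the instruction's byte length, so the positions printed for later instructions stay correct. A minimal standalone sketch of that idea follows. The procedure visible-instructions and its input format, a list of (opcode . byte-length) pairs, are hypothetical stand-ins for the disassembler's real data structures, and the byte lengths in the usage example are made up.

```scheme
;; Hypothetical stand-in for the disassembler loop: walk a list of
;; (opcode . byte-length) pairs and collect (position . opcode) for
;; every instruction except nop.
(define (visible-instructions code)
  (let lp ((pos 0) (code code) (out '()))
    (cond
     ((null? code) (reverse out))
     ((eq? (caar code) 'nop)
      ;; Do not report the nop, but still count the bytes it occupies.
      (lp (+ pos (cdar code)) (cdr code) out))
     (else
      (lp (+ pos (cdar code)) (cdr code)
          (cons (cons pos (caar code)) out))))))

;; Usage: two one-byte nops pad the stream between the instructions.
(visible-instructions '((make-int8 . 2) (nop . 1) (nop . 1) (return . 1)))
;; => ((0 . make-int8) (4 . return))
```

The reported position of return is 4 rather than 2, which is the point of advancing pos inside the nop clause instead of filtering nops out beforehand.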


hooks/post-receive
-- 
GNU Guile



