guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, string_abstraction2, updated. release_


From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, string_abstraction2, updated. release_1-9-1-149-g830e63f
Date: Thu, 13 Aug 2009 04:10:49 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=830e63fab1e54d94d5d5a5a4a1c7babe0a2530b9

The branch, string_abstraction2 has been updated
       via  830e63fab1e54d94d5d5a5a4a1c7babe0a2530b9 (commit)
       via  63f9ba3ac9a438b6e0d52ef2fb74dd4a07fa9cde (commit)
       via  c5c142414f577ac6e7b6921c0e614f7ef9b16e44 (commit)
       via  8fbc29b2311422e1230b665233a49d898393347a (commit)
       via  0e7e26117aa3aec918cc7333945dafbbea801e64 (commit)
      from  ca2858c6ba827675af1e348b740b868abc1fdc8d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 830e63fab1e54d94d5d5a5a4a1c7babe0a2530b9
Author: Andy Wingo <address@hidden>
Date:   Wed Aug 12 23:38:05 2009 +0200

    update docs for recent vm/compiler work
    
    * doc/ref/compiler.texi:
    * doc/ref/vm.texi: Update for recent changes.
    * module/language/assembly/disassemble.scm (disassemble-load-program):
      Don't print nops, they are distracting.

commit 63f9ba3ac9a438b6e0d52ef2fb74dd4a07fa9cde
Author: Andy Wingo <address@hidden>
Date:   Wed Aug 12 20:44:30 2009 +0200

    "fix" <let>-bound lambda expressions too
    
    * module/language/tree-il/compile-glil.scm (compile-glil): Compute
      warnings before optimizing, as unreferenced variables will be
      optimized out.
    
    * libguile/_scm.h: Fix C99 comment.
    
    * module/language/tree-il/fix-letrec.scm (partition-vars): Also analyze
      let-bound vars.
      (fix-letrec!): Fix a bug whereby a set! to an unreffed var would be
      called for value, not effect. Also "fix" <let>-bound lambda
      expressions -- really speeds up pmatch.
    
    * test-suite/tests/tree-il.test ("lexical sets", "the or hack"): Update
      to take into account the new optimizations.

commit c5c142414f577ac6e7b6921c0e614f7ef9b16e44
Author: Michael Gran <address@hidden>
Date:   Wed Aug 12 09:21:37 2009 -0700

    Regression, scm_string fails to test for circular lists
    
    * libguile/string.c (scm_string): Restores the functionality
      where scm_string tests for circular lists
    
    * test-suite/tests/strings.test: add test for circular lists

commit 8fbc29b2311422e1230b665233a49d898393347a
Author: Michael Gran <address@hidden>
Date:   Wed Aug 12 08:50:12 2009 -0700

    Some signed/unsigned comparison and conversions
    
    * libguile/ports.c (scm_lfwrite_str, scm_lfwrite_substr): signed/unsigned
      conversion and comparison
    
    * libguile/strings.c (scm_string_append): signed/unsigned comparison

commit 0e7e26117aa3aec918cc7333945dafbbea801e64
Author: Michael Gran <address@hidden>
Date:   Wed Aug 12 20:54:12 2009 -0700

    rework the vm support for wide strings
    
    * libguile/_scm.h (SCM_OBJCODE_MINOR_VERSION): Bump.
    
    * libguile/vm-engine.c (vm_error_bad_wide_string_length): New error
      case.
    
    * libguile/vm-i-loader.c (load-unsigned-integer, load-integer)
      (load-keyword): Remove these instructions. The former two are
      obsoleted by make-int64/make-uint64, the latter via make-keyword.
      (load-string): Only handle narrow strings.
      (load-symbol): Only handle narrow symbols. The wide case is handled
      via make-symbol.
      (load-wide-string): New instruction, for wide strings.
    
    * libguile/vm-i-system.c (define): Move here from loaders.c, as now it
      just takes a sym on the stack.
      (make-keyword, make-symbol): New instructions.
    
    * module/language/assembly.scm: Remove removed instructions. No more
      width byte in load-string etc.
    
    * module/language/assembly/compile-bytecode.scm (write-bytecode): Adapt
      to change in instruction set.
    
    * module/language/glil/compile-assembly.scm (glil->assembly): Compile
      define by pushing the sym then emitting (define).
      (dump-object): Dump narrow and wide strings differently. Use
      make-keyword and make-symbol as appropriate.
    
    * module/language/tree-il/compile-glil.scm (flatten): When compiling a
      ref to a primitive (not a call), first see if the primitive is
      actually bound in the root module. (That's not the case with e.g.
      bytevector-u8-ref).
    
    * module/system/xref.scm (program-callee-rev-vars): Don't parse out
      "nexts".
    
    * test-suite/tests/asm-to-bytecode.test ("compiler"): Adapt to bytecode
      format change.
    
    Conflicts:
    
        libguile/_scm.h

-----------------------------------------------------------------------

Summary of changes:
 doc/ref/compiler.texi                           |  165 +++++++-----
 doc/ref/vm.texi                                 |  319 ++++++++++++++++-------
 libguile/_scm.h                                 |   30 +++
 libguile/ports.c                                |    4 +-
 libguile/strings.c                              |    5 +-
 libguile/vm-engine.c                            |    4 +
 libguile/vm-i-loader.c                          |  153 ++---------
 libguile/vm-i-system.c                          |   28 ++
 module/language/assembly.scm                    |   18 +-
 module/language/assembly/compile-bytecode.scm   |   19 +-
 module/language/assembly/decompile-bytecode.scm |   27 +--
 module/language/assembly/disassemble.scm        |    2 +
 module/language/glil/compile-assembly.scm       |   19 +-
 module/language/tree-il/compile-glil.scm        |   23 +-
 module/language/tree-il/fix-letrec.scm          |   62 +++++-
 module/system/xref.scm                          |    2 +-
 test-suite/tests/asm-to-bytecode.test           |   17 +-
 test-suite/tests/strings.test                   |   12 +-
 test-suite/tests/tree-il.test                   |   25 ++-
 19 files changed, 545 insertions(+), 389 deletions(-)

diff --git a/doc/ref/compiler.texi b/doc/ref/compiler.texi
index f8d0895..0aea4e7 100644
--- a/doc/ref/compiler.texi
+++ b/doc/ref/compiler.texi
@@ -17,7 +17,7 @@ This section aims to pay attention to the small man behind the
 curtain.
 
 @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
-know how to compile your .scm file.
+know how to compile your @code{.scm} file.
 
 @menu
 * Compiler Tower::                   
@@ -67,8 +67,7 @@ for Scheme:
   #:title       "Guile Scheme"
   #:version     "0.5"
   #:reader      read
-  #:compilers   `((tree-il . ,compile-tree-il)
-                  (ghil . ,compile-ghil))
+  #:compilers   `((tree-il . ,compile-tree-il))
   #:decompilers `((tree-il . ,decompile-tree-il))
   #:evaluator   (lambda (x module) (primitive-eval x))
   #:printer     write)
@@ -220,13 +219,13 @@ Note however that @code{sc-expand} does not have the same 
signature as
 around @code{sc-expand}, to make it conform to the general form of
 compiler procedures in Guile's language tower.
 
-Compiler procedures take two arguments, an expression and an
-environment. They return three values: the compiled expression, the
-corresponding environment for the target language, and a
-``continuation environment''. The compiled expression and environment
-will serve as input to the next language's compiler. The
-``continuation environment'' can be used to compile another expression
-from the same source language within the same module.
+Compiler procedures take three arguments: an expression, an
+environment, and a keyword list of options. They return three values:
+the compiled expression, the corresponding environment for the target
+language, and a ``continuation environment''. The compiled expression
+and environment will serve as input to the next language's compiler.
+The ``continuation environment'' can be used to compile another
+expression from the same source language within the same module.
 
 For example, you might compile the expression, @code{(define-module
 (foo))}. This will result in a Tree-IL expression and environment. But
@@ -292,6 +291,14 @@ tree-il@@(guile-user)> (apply (primitive +) (const 32) 
(const 10))
 
 The @code{src} fields are left out of the external representation.
 
+One may create Tree-IL objects from their external representations via
+calling @code{parse-tree-il}, the reader for Tree-IL. If any source
+information is attached to the input S-expression, it will be
+propagated to the resulting Tree-IL expressions. This is probably the
+easiest way to compile to Tree-IL: just make the appropriate external
+representations in S-expression format, and let @code{parse-tree-il}
+take care of the rest.
+
 @deftp {Scheme Variable} <void> src
 @deftpx {External Representation} (void)
 An empty expression. In practice, equivalent to Scheme's @code{(if #f
@@ -384,12 +391,29 @@ A version of @code{<let>} that creates recursive 
bindings, like
 Scheme's @code{letrec}.
 @end deftp
 
address@hidden FIXME -- need to revive this one
address@hidden @deftp {Scheme Variable} <ghil-mv-bind> src vars rest producer . 
body
address@hidden Like Scheme's @code{receive} -- binds the values returned by
address@hidden applying @code{producer}, which should be a thunk, to the
address@hidden @code{lambda}-like bindings described by @var{vars} and 
@var{rest}.
address@hidden @end deftp
+There are two Tree-IL constructs that are not normally produced by
+higher-level compilers, but instead are generated during the
+source-to-source optimization and analysis passes that the Tree-IL
+compiler does. Users should not generate these expressions directly,
+unless they feel very clever, as the default analysis pass will
+generate them as necessary.
+
address@hidden {Scheme Variable} <let-values> src names vars exp body
address@hidden {External Representation} (let-values @var{names} @var{vars} 
@var{exp} @var{body})
+Like Scheme's @code{receive} -- binds the values returned by
+evaluating @code{exp} to the @code{lambda}-like bindings described by
address@hidden That is to say, @var{vars} may be an improper list.
+
address@hidden<let-values>} is an optimization of @code{<application>} of the
+primitive, @code{call-with-values}.
address@hidden deftp
address@hidden {Scheme Variable} <fix> src names vars vals body
address@hidden {External Representation} (fix @var{names} @var{vars} @var{vals} 
@var{body})
+Like @code{<letrec>}, but only for @var{vals} that are unset
address@hidden expressions.
+
address@hidden is an optimization of @code{letrec} (and @code{let}).
address@hidden deftp
 
 Tree-IL implements a compiler to GLIL that recursively traverses
 Tree-IL expressions, writing out GLIL expressions into a linear list.
@@ -399,9 +423,9 @@ future computations. This state allows the compiler not to 
emit code
 for constant expressions that will not be used (e.g. docstrings), and
 to perform tail calls when in tail position.
 
-In the future, there will be a pass at the beginning of the
-Tree-IL->GLIL compilation step to perform inlining, copy propagation,
-dead code elimination, and constant folding.
+Most optimization, such as it currently is, is performed on Tree-IL
+expressions as source-to-source transformations. There will be more
+optimizations added in the future.
 
 Interested readers are encouraged to read the implementation in
 @code{(language tree-il compile-glil)} for more details.
@@ -411,18 +435,16 @@ Interested readers are encouraged to read the 
implementation in
 
 Guile Low Intermediate Language (GLIL) is a structured intermediate
 language whose expressions more closely approximate Guile's VM
-instruction set.
+instruction set. Its expression types are defined in @code{(language
+glil)}.
 
-Its expression types are defined in @code{(language glil)}, and as
-with GHIL, some of its fields parse as rest arguments.
-
address@hidden {Scheme Variable} <glil-program> nargs nrest nlocs nexts meta . 
body
address@hidden {Scheme Variable} <glil-program> nargs nrest nlocs meta . body
 A unit of code that at run-time will correspond to a compiled
-procedure. @var{nargs} @var{nrest} @var{nlocs}, and @var{nexts}
-collectively define the program's arity; see @ref{Compiled
-Procedures}, for more information. @var{meta} should be an alist of
-properties, as in Tree IL's @code{<lambda>}. @var{body} is a list of
-GLIL expressions.
+procedure. @var{nargs} @var{nrest} and @var{nlocs} collectively define
+the program's arity; see @ref{Compiled Procedures}, for more
+information. @var{meta} should be an alist of properties, as in
+Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
+expressions.
 @end deftp
 @deftp {Scheme Variable} <glil-bind> . vars
 An advisory expression that notes a liveness extent for a set of
@@ -461,23 +483,21 @@ and @code{filename} keys, e.g. as returned by
 @code{source-properties}.
 @end deftp
 @deftp {Scheme Variable} <glil-void>
-Pushes the unspecified value on the stack.
+Pushes ``the unspecified value'' on the stack.
 @end deftp
 @deftp {Scheme Variable} <glil-const> obj
 Pushes a constant value onto the stack. @var{obj} must be a number,
-string, symbol, keyword, boolean, character, the empty list, or a pair
-or vector of constants.
address@hidden deftp
address@hidden {Scheme Variable} <glil-local> op index
-Accesses a lexically bound variable from the stack. If @var{op} is
address@hidden, the value is pushed onto the stack; if it is @code{set},
-the variable is set from the top value on the stack, which is popped
-off. @xref{Stack Layout}, for more information.
+string, symbol, keyword, boolean, character, uniform array, the empty
+list, or a pair or vector of constants.
 @end deftp
address@hidden {Scheme Variable} <glil-external> op depth index
-Accesses a heap-allocated variable, addressed by @var{depth}, the nth
-enclosing environment, and @var{index}, the variable's position within
-the environment. @var{op} is @code{ref} or @code{set}.
address@hidden {Scheme Variable} <glil-lexical> local? boxed? op index
+Accesses a lexically bound variable. If the variable is not
address@hidden it is free. All variables may have @code{ref} and
address@hidden as their @var{op}. Boxed variables may also have the
address@hidden @code{box}, @code{empty-box}, and @code{fix}, which
+correspond in semantics to the VM instructions @code{box},
address@hidden, and @code{fix-closure}. @xref{Stack Layout}, for
+more information.
 @end deftp
 @deftp {Scheme Variable} <glil-toplevel> op name
 Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
@@ -520,7 +540,7 @@ Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 
on Guile 1.9.0
 Copyright (C) 2001-2008 Free Software Foundation, Inc.
 
 Enter `,help' for help.
-glil@@(guile-user)> (program 0 0 0 0 () (const 3) (call return 0))
+glil@@(guile-user)> (program 0 0 0 () (const 3) (call return 1))
 @result{} 3
 @end example
 
@@ -542,12 +562,12 @@ differs from GLIL in four main ways:
 @itemize
 @item Labels have been resolved to byte offsets in the program.
 @item Constants inside procedures have either been expressed as inline
-instructions, and possibly cached in object arrays.
+instructions or cached in object arrays.
 @item Procedures with metadata (source location information, liveness
 extents, procedure names, generic properties, etc) have had their
 metadata serialized out to thunks.
 @item All expressions correspond directly to VM instructions -- i.e.,
-there is no @code{<glil-local>} which can be a ref or a set.
+there is no @code{<glil-lexical>} which can be a ref or a set.
 @end itemize
 
 Assembly is isomorphic to the bytecode that it compiles to. You can
@@ -567,10 +587,11 @@ example:
 
 @example
 scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
-(load-program 0 0 0 0
+(load-program 0 0 0
   () ; Labels
-  60 ; Length
+  70 ; Length
   #f ; Metadata
+  (make-false)
   (make-false) ; object table for the returned lambda
   (nop)
   (nop) ; Alignment. Since assembly has already resolved its labels
@@ -578,11 +599,12 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 
'assembly)
   (nop) ; object code is mmap'd directly to structures, assembly
   (nop) ; has to have the alignment embedded in it.
   (nop) 
-  (load-program 1 0 0 0 
+  (load-program
+    1
+    0
     ()
-    6
-    ; This is the metadata thunk for the returned procedure.
-    (load-program 0 0 0 0 () 21 #f
+    8
+    (load-program 0 0 0 () 21 #f
       (load-symbol "x")  ; Name and liveness extent for @code{x}.
       (make-false)
       (make-int8:0) ; Some instruction+arg combinations
@@ -597,7 +619,9 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 
'assembly)
     (local-ref 0)
     (local-ref 0)
     (add)
-    (return))
+    (return)
+    (nop)
+    (nop))
   ; Return our new procedure.
   (return))
 @end example
@@ -618,10 +642,10 @@ the next step down from assembly:
 
 @example
 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
address@hidden (load-program 0 0 0 0 () 6 #f
address@hidden (load-program 0 0 0 () 6 #f
        (make-int8 32) (make-int8 10) (add) (return))
 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
address@hidden #u8(0 0 0 0 6 0 0 0 0 0 0 0 10 32 10 10 100 48)
address@hidden #u8(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)
 @end example
 
 ``Objcode'' is bytecode, but mapped directly to a C structure,
@@ -631,8 +655,7 @@ scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
 struct scm_objcode @{
   scm_t_uint8 nargs;
   scm_t_uint8 nrest;
-  scm_t_uint8 nlocs;
-  scm_t_uint8 nexts;
+  scm_t_uint16 nlocs;
   scm_t_uint32 len;
   scm_t_uint32 metalen;
   scm_t_uint8 base[0];
@@ -642,7 +665,7 @@ struct scm_objcode @{
 As one might imagine, objcode imposes a minimum length on the
 bytecode. Also, the multibyte fields are in native endianness, which
 makes objcode (and bytecode) system-dependent. Indeed, in the short
-example above, all but the last 5 bytes were the program's header.
+example above, all but the last 6 bytes were the program's header.
 
 Objcode also has a couple of important efficiency hacks. First,
 objcode may be mapped directly from disk, allowing compiled code to be
@@ -672,7 +695,7 @@ Makes a bytecode object from @var{bytecode}, which should 
be a
 Load object code from a file named @var{file}. The file will be mapped
 into memory via @code{mmap}, so this is a very fast operation.
 
-On disk, object code has an eight-byte cookie prepended to it, to
+On disk, object code has an sixteen-byte cookie prepended to it, to
 prevent accidental loading of arbitrary garbage.
 @end deffn
 
@@ -689,11 +712,11 @@ Copy object code out to a @code{u8vector} for analysis by 
Scheme.
 The following procedure is actually in @code{(system vm program)}, but
 we'll mention it here:
 
address@hidden {Scheme Variable} make-program objcode objtable [external='()]
address@hidden {C Function} scm_make_program (objcode, objtable, external)
address@hidden {Scheme Variable} make-program objcode objtable [free-vars=#f]
address@hidden {C Function} scm_make_program (objcode, objtable, free_vars)
 Load up object code into a Scheme program. The resulting program will
 have @var{objtable} as its object table, which should be a vector or
address@hidden, and will capture the closure variables from @var{external}.
address@hidden, and will capture the free variables from @var{free-vars}.
 @end deffn
 
 Object code from a file may be disassembled at the REPL via the
@@ -707,9 +730,9 @@ respect to the compilation environment. Normally the 
environment
 propagates through the compiler transparently, but users may specify
 the compilation environment manually as well:
 
address@hidden {Scheme Procedure} make-objcode-env module externals
address@hidden {Scheme Procedure} make-objcode-env module free-vars
 Make an object code environment. @var{module} should be a Scheme
-module, and @var{externals} should be a list of external variables.
+module, and @var{free-vars} should be a vector of free variables.
 @code{#f} is also a valid object code environment.
 @end deffn
 
@@ -748,12 +771,14 @@ procedure is called a certain number of times.
 The name of the game is a profiling-based harvest of the low-hanging
 fruit, running programs of interest under a system-level profiler and
 determining which improvements would give the most bang for the buck.
-There are many well-known efficiency hacks in the literature: Dybvig's
-letrec optimization, individual boxing of heap-allocated values (and
-then store the boxes on the stack directly), optimized case-lambda
-expressions, stack underflow and overflow handlers, etc. Highly
-recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
+It's really getting to the point though that native compilation is the
+next step.
 
 The compiler also needs help at the top end, enhancing the Scheme that
-it knows to also understand R6RS, and adding new high-level compilers:
-Emacs Lisp, Lua, JavaScript...
+it knows to also understand R6RS, and adding new high-level compilers.
+We have JavaScript and Emacs Lisp mostly complete, but they could use
+some love; Lua would be nice as well, butq whatever language it is
+that strikes your fancy would be welcome too.
+
+Compilers are for hacking, not for admiring or for complaining about.
+Get to it!
diff --git a/doc/ref/vm.texi b/doc/ref/vm.texi
index fa65523..59798d8 100644
--- a/doc/ref/vm.texi
+++ b/doc/ref/vm.texi
@@ -13,8 +13,8 @@ procedures can call each other as they please.
 
 The difference is that the compiler creates and interprets bytecode
 for a custom virtual machine, instead of interpreting the
-S-expressions directly. Running compiled code is faster than running
-interpreted code.
+S-expressions directly. Loading and running compiled code is faster
+than loading and running source code.
 
 The virtual machine that does the bytecode interpretation is a part of
 Guile itself. This section describes the nature of Guile's virtual
@@ -134,7 +134,7 @@ compiled to object code, one might never leave the virtual 
machine.
 @subsection Stack Layout
 
 While not strictly necessary to understand how to work with the VM, it
-is instructive and sometimes entertaining to consider the struture of
+is instructive and sometimes entertaining to consider the structure of
 the VM stack.
 
 Logically speaking, a VM stack is composed of ``frames''. Each frame
@@ -159,12 +159,11 @@ The structure of the fixed part of an application frame 
is as follows:
 
 @example
              Stack
-   |                  | <- fp + bp->nargs + bp->nlocs + 4
+   |                  | <- fp + bp->nargs + bp->nlocs + 3
    +------------------+    = SCM_FRAME_UPPER_ADDRESS (fp)
    | Return address   |
    | MV return address|
-   | Dynamic link     |
-   | External link    | <- fp + bp->nargs + bp->nlocs
+   | Dynamic link     | <- fp + bp->nargs + bp->nlocs
    | Local variable 1 |    = SCM_FRAME_DATA_ADDRESS (fp)
    | Local variable 0 | <- fp + bp->nargs
    | Argument 1       |
@@ -201,25 +200,17 @@ values being returned.
 @item Dynamic link
 This is the @code{fp} in effect before this program was applied. In
 effect, this and the return address are the registers that are always
-``saved''.
-
address@hidden External link
-This field is a reference to the list of heap-allocated variables
-associated with this frame. For a discussion of heap versus stack
-allocation, @xref{Variables and the VM}.
+``saved''. The dynamic link links the current frame to the previous
+frame; computing a stack trace involves traversing these frames.
 
 @item Local variable @var{n}
-Lambda-local variables that are allocated on the stack are all
-allocated as part of the frame. This makes access to non-captured,
-non-mutated variables very cheap.
+Lambda-local variables that are all allocated as part of the frame.
+This makes access to variables very cheap.
 
 @item Argument @var{n}
 The calling convention of the VM requires arguments of a function
-application to be pushed on the stack, and here they are. Normally
-references to arguments dispatch to these locations on the stack.
-However if an argument has to be stored on the heap, it will be copied
-from its initial value here onto a location in the heap, and
-thereafter only referenced on the heap.
+application to be pushed on the stack, and here they are. References
+to arguments dispatch to these locations on the stack.
 
 @item Program
 This is the program being applied. For more information on how
@@ -236,26 +227,44 @@ Consider the following Scheme code as an example:
     (lambda (b) (list foo a b)))
 @end example
 
-Within the lambda expression, "foo" is a top-level variable, "a" is a
-lexically captured variable, and "b" is a local variable.
-
address@hidden may safely be allocated on the stack, as there is no enclosed
-procedure that references it, nor is it ever mutated.
-
address@hidden, on the other hand, is referenced by an enclosed procedure,
-that of the lambda. Thus it must be allocated on the heap, as it may
-(and will) outlive the dynamic extent of the invocation of @code{foo}.
-
address@hidden is a top-level variable, because it names the procedure
address@hidden, which is here defined at the top-level.
-
-Note that variables that are mutated (via @code{set!}) must be
-allocated on the heap, even if they are local variables. This is
-because any called subprocedure might capture the continuation, which
-would need to capture locations instead of values. Thus perhaps
-counterintuitively, what would seem ``closer to the metal'', viz
address@hidden, actually forces heap allocation instead of stack
-allocation.
+Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a
+lexically captured variable, and @code{b} is a local variable.
+
+Another way to refer to @code{a} and @code{b} is to say that @code{a}
+is a ``free'' variable, since it is not defined within the lambda, and
address@hidden is a ``bound'' variable. These are the terms used in the
address@hidden calculus}, a mathematical notation for describing
+functions. The lambda calculus is useful because it allows one to
+prove statements about functions. It is especially good at describing
+scope relations, and it is for that reason that we mention it here.
+
+Guile allocates all variables on the stack. When a lexically enclosed
+procedure with free variables---a @dfn{closure}---is created, it
+copies those variables its free variable vector. References to free
+variables are then redirected through the free variable vector.
+
+If a variable is ever @code{set!}, however, it will need to be
+heap-allocated instead of stack-allocated, so that different closures
+that capture the same variable can see the same value. Also, this
+allows continuations to capture a reference to the variable, instead
+of to its value at one point in time. For these reasons, @code{set!}
+variables are allocated in ``boxes''---actually, in variable cells.
address@hidden, for more information. References to @code{set!}
+variables are indirected through the boxes.
+
+Thus perhaps counterintuitively, what would seem ``closer to the
+metal'', viz @code{set!}, actually forces an extra memory allocation
+and indirection.
+
+Going back to our example, @code{b} may be allocated on the stack, as
+it is never mutated.
+
address@hidden may also be allocated on the stack, as it too is never
+mutated. Within the enclosed lambda, its value will be copied into
+(and referenced from) the free variables vector.
+
address@hidden is a top-level variable, because @code{foo} is not
+lexically bound in this example.
 
 @node VM Programs
 @subsection Compiled Procedures are VM Programs
@@ -297,27 +306,26 @@ scheme@@(guile-user)> (define (foo a) (lambda (b) (list 
foo a b)))
 scheme@@(guile-user)> ,x foo
 Disassembly of #<program foo (a)>:
 
-   0    (local-ref 0)                   ;; `a' (arg)
-   2    (external-set 0)                ;; `a' (arg)
-   4    (object-ref 1)                  ;; #<program b70d2910 at <unknown 
port>:0:16 (b)>
-   6    (make-closure)                  
-   7    (return)                        
+   0    (object-ref 1)                  ;; #<program b7e478b0 at <unknown 
port>:0:16 (b)>
+   2    (local-ref 0)                   ;; `a' (arg)
+   4    (vector 0 1)                    ;; 1 element
+   7    (make-closure)                  
+   8    (return)                        
 
 ----------------------------------------
-Disassembly of #<program b70d2910 at <unknown port>:0:16 (b)>:
+Disassembly of #<program b7e478b0 at <unknown port>:0:16 (b)>:
 
    0    (toplevel-ref 1)                ;; `foo'
-   2    (external-ref 0)                ;; (closure variable)
+   2    (free-ref 0)                    ;; (closure variable)
    4    (local-ref 0)                   ;; `b' (arg)
    6    (list 0 3)                      ;; 3 elements         at (unknown 
file):0:28
    9    (return)                        
 @end smallexample
 
-At @code{ip} 0 and 2, we do the copy from argument to heap for
address@hidden @code{Ip} 4 loads up the compiled lambda, and then at
address@hidden 6 we make a closure---binding code (from the compiled
-lambda) with data (the heap-allocated variables). Finally we return
-the closure.
+At @code{ip} 0, we load up the compiled lambda. @code{Ip} 2 and 4
+create the free variables vector, and @code{ip} 7 makes the
+closure---binding code (from the compiled lambda) with data (the
+free-variable vector). Finally we return the closure.
 
 The second stanza disassembles the compiled lambda. Toplevel variables
 are resolved relative to the module that was current when the
@@ -336,7 +344,7 @@ routine.
 @node Instruction Set
 @subsection Instruction Set
 
-There are about 100 instructions in Guile's virtual machine. These
+There are about 150 instructions in Guile's virtual machine. These
 instructions represent atomic units of a program's execution. Ideally,
 they perform one task without conditional branches, then dispatch to
 the next instruction in the stream.
@@ -376,16 +384,22 @@ instructions. More instructions may be added over time.
 * Miscellaneous Instructions::  
 * Inlined Scheme Instructions::  
 * Inlined Mathematical Instructions::  
+* Inlined Bytevector Instructions::  
 @end menu
 
 @node Environment Control Instructions
 @subsubsection Environment Control Instructions
 
 These instructions access and mutate the environment of a compiled
-procedure---the local bindings, the ``external'' bindings, and the
+procedure---the local bindings, the free (captured) bindings, and the
 toplevel bindings.
 
+Some of these instructions have @code{long-} variants, the difference
+being that they take 16-bit arguments, encoded in big-endianness,
+instead of the normal 8-bit range.
+
 @deffn Instruction local-ref index
address@hidden Instruction long-local-ref index
 Push onto the stack the value of the local variable located at
 @var{index} within the current stack frame.
 
@@ -395,26 +409,62 @@ arguments.
 @end deffn
 
 @deffn Instruction local-set index
address@hidden Instruction long-local-ref index
 Pop the Scheme object located on top of the stack and make it the new
 value of the local variable located at @var{index} within the current
 stack frame.
 @end deffn
 
address@hidden Instruction external-ref index
-Push the value of the closure variable located at position
address@hidden within the program's list of external variables.
address@hidden Instruction free-ref index
+Push the value of the captured variable located at position
address@hidden within the program's vector of captured variables.
 @end deffn
 
address@hidden Instruction external-set index
-Pop the Scheme object located on top of the stack and make it the new
-value of the closure variable located at @var{index} within the
-program's list of external variables.
address@hidden Instruction free-boxed-ref index
address@hidden Instruction free-boxed-set index
+Get or set a boxed free variable. Note that there is no free-set
+instruction, as variables that are @code{set!} must be boxed.
+
+These instructions assume that the value at position @var{index} in
+the free variables vector is a variable.
 @end deffn
 
-The external variable lookup algorithm should probably be made more
-efficient in the future via addressing by frame and index. Currently,
-external variables are all consed onto a list, which results in O(N)
-lookup time.
address@hidden Instruction make-closure
+Pop a vector and a program object off the stack, in that order, and
+push a new program object with the given free variables vector. The
+new program object shares state with the original program.
+
+At the time of this writing, the space overhead of closures is 4 words
+per closure.
address@hidden deffn
+
address@hidden Instruction fix-closure index
+Pop a vector off the stack, and set it as the @var{index}th local
+variable's free variable vector. The @var{index}th local variable is
+assumed to be a procedure.
+
+This instruction is part of a hack for allocating mutually recursive
+procedures. The hack is to first perform a @code{local-set} for all of
+the recursive procedures, then fix up the procedures' free variable
+bindings in place. This allows most @code{letrec}-bound procedures to
+be allocated unboxed on the stack.
+
+One could of course do a @code{local-ref}, then @code{make-closure},
+then @code{local-set}, but this macroinstruction helps to speed up the
+common case.
address@hidden deffn
+
address@hidden Instruction box index
+Pop a value off the stack, and set the @var{index}nth local variable
+to a box containing that value. A shortcut for @code{make-variable}
+then @code{local-set}, used when binding boxed variables.
address@hidden deffn
+
address@hidden Instruction empty-box index
+Set the @var{indext}h local variable to a box containing a variable
+whose value is unbound. Used when compiling some @code{letrec}
+expressions.
address@hidden deffn
 
 @deffn Instruction toplevel-ref index
 @deffnx Instruction long-toplevel-ref index
@@ -442,9 +492,6 @@ in-place mutation of the object table. This mechanism 
provides for
 lazy variable resolution, and an important cached fast-path once the
 variable has been successfully resolved.
 
-The ``long'' variant has a 16-bit index instead of an 8-bit index,
-with the most significant byte first.
-
 This instruction pushes the value of the variable onto the stack.
 @end deffn
 
@@ -453,8 +500,13 @@ This instruction pushes the value of the variable onto the 
stack.
 Pop a value off the stack, and set it as the value of the toplevel
 variable stored at @var{index} in the object table. If the variable
 has not yet been looked up, we do the lookup as in
address@hidden The ``long'' variant has a 16-bit index instead
-of an 8-bit index.
address@hidden
address@hidden deffn
+
address@hidden Instruction define
+Pop a symbol and a value from the stack, in that order. Look up its
+binding in the current toplevel environment, creating the binding if
+necessary. Set the variable to the value.
 @end deffn
 
 @deffn Instruction link-now
@@ -476,6 +528,11 @@ Pop off two objects from the stack, a variable and a 
value, and set
 the variable to the value.
 @end deffn
 
address@hidden Instruction make-variable
+Replace the top object on the stack with a variable containing it.
+Used in some circumstances when compiling @code{letrec} expressions.
address@hidden deffn
+
 @deffn Instruction object-ref n
 @deffnx Instruction long-object-ref n
 Push @var{n}th value from the current program's object vector. The
@@ -499,7 +556,10 @@ the one to which the instruction pointer points).
 @end itemize
 
 Note that the offset passed to the instruction is encoded on two 8-bit
-integers which are then combined by the VM as one 16-bit integer.
+integers which are then combined by the VM as one 16-bit integer. Note
+also that jump targets in Guile are aligned on 8-byte boundaries, and
+that the offset refers to the @var{n}th 8-byte boundary, effectively
+giving Guile a 19-bit relative address space.
 
 @deffn Instruction br offset
 Jump to @var{offset}.
@@ -550,19 +610,21 @@ Load an arbitrary number from the instruction stream. The 
number is
 embedded in the stream as a string.
 @end deffn
 @deffn Instruction load-string length
-Load a string from the instruction stream.
+Load a string from the instruction stream. The string is assumed to be
+encoded in the ``latin1'' locale.
 @end deffn
address@hidden Instruction load-symbol length
-Load a symbol from the instruction stream.
address@hidden Instruction load-wide-string length
+Load a UTF-32 string from the instruction stream. @var{length} is the
+length in bytes, not in codepoints
 @end deffn
address@hidden Instruction load-keyword length
-Load a keyword from the instruction stream.
address@hidden Instruction load-symbol length
+Load a symbol from the instruction stream. The symbol is assumed to be
+encoded in the ``latin1'' locale. Symbols backed by wide strings may
+be loaded via @code{load-wide-string} then @code{make-symbol}.
 @end deffn
-
address@hidden Instruction define length
-Load a symbol from the instruction stream, and look up its binding in
-the current toplevel environment, creating the binding if necessary.
-Push the variable corresponding to the binding.
address@hidden Instruction load-array length
+Load a uniform array from the instruction stream. The shape and type
+of the array are popped off the stack, in that order.
 @end deffn
 
 @deffn Instruction load-program
@@ -579,23 +641,9 @@ because instead of parsing its data, it directly maps the 
instruction
 stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
 and Objcode}, for more information.
 
-The resulting compiled procedure will not have any ``external''
-variables captured, so it may be loaded only once but used many times
-to create closures.
address@hidden deffn
-
-Finally, while this instruction is not strictly a ``loading''
-instruction, it's useful to wind up the @code{load-program} discussion
-here:
-
address@hidden Instruction make-closure
-Pop the program object from the stack, capture the current set of
-``external'' variables, and assign those external variables to a copy
-of the program. Push the new program object, which shares state with
-the original program.
-
-At the time of this writing, the space overhead of closures is 4 words
-per closure.
+The resulting compiled procedure will not have any free variables
+captured, so it may be loaded only once but used many times to create
+closures.
 @end deffn
 
 @node Procedural Instructions
@@ -764,6 +812,19 @@ Push @code{'()} onto the stack.
 Push @var{value}, an 8-bit character, onto the stack.
 @end deffn
 
address@hidden Instruction make-char32 value
+Push @var{value}, an 32-bit character, onto the stack. The value is
+encoded in big-endian order.
address@hidden deffn
+
address@hidden Instruction make-symbol
+Pops a string off the stack, and pushes a symbol.
address@hidden deffn
+
address@hidden Instruction make-keyword value
+Pops a symbol off the stack, and pushes a keyword.
address@hidden deffn
+
 @deffn Instruction list n
 Pops off the top @var{n} values off of the stack, consing them up into
 a list, then pushes that list on the stack. What was the topmost value
@@ -807,7 +868,8 @@ pushes its elements on the stack.
 @subsubsection Miscellaneous Instructions
 
 @deffn Instruction nop
-Does nothing!
+Does nothing! Used for padding other instructions to certain
+alignments.
 @end deffn
 
 @deffn Instruction halt
@@ -873,6 +935,8 @@ stream.
 @deffnx Instruction cons x y
 @deffnx Instruction car x
 @deffnx Instruction cdr x
address@hidden Instruction vector-ref x y
address@hidden Instruction vector-set x n y
 Inlined implementations of their Scheme equivalents.
 @end deffn
 
@@ -893,7 +957,9 @@ As in the previous section, the definitions below show stack
 parameters instead of instruction stream parameters.
 
 @deffn Instruction add x y
address@hidden Instruction add1 x
 @deffnx Instruction sub x y
address@hidden Instruction sub1 x
 @deffnx Instruction mul x y
 @deffnx Instruction div x y
 @deffnx Instruction quo x y
@@ -906,3 +972,58 @@ parameters instead of instruction stream parameters.
 @deffnx Instruction ge? x y
 Inlined implementations of the corresponding mathematical operations.
 @end deffn
+
address@hidden Inlined Bytevector Instructions
address@hidden Inlined Bytevector Instructions
+
+Bytevector operations correspond closely to what the current hardware
+can do, so it makes sense to inline them to VM instructions, providing
+a clear path for eventual native compilation. Without this, Scheme
+programs would need other primitives for accessing raw bytes -- but
+these primitives are as good as any.
+
+As in the previous section, the definitions below show stack
+parameters instead of instruction stream parameters.
+
+The multibyte formats (@code{u16}, @code{f64}, etc) take an extra
+endianness argument. Only aligned native accesses are currently
+fast-pathed in Guile's VM.
+
address@hidden Instruction bv-u8-ref bv n
address@hidden Instruction bv-s8-ref bv n
address@hidden Instruction bv-u16-native-ref bv n
address@hidden Instruction bv-s16-native-ref bv n
address@hidden Instruction bv-u32-native-ref bv n
address@hidden Instruction bv-s32-native-ref bv n
address@hidden Instruction bv-u64-native-ref bv n
address@hidden Instruction bv-s64-native-ref bv n
address@hidden Instruction bv-f32-native-ref bv n
address@hidden Instruction bv-f64-native-ref bv n
address@hidden Instruction bv-u16-ref bv n endianness
address@hidden Instruction bv-s16-ref bv n endianness
address@hidden Instruction bv-u32-ref bv n endianness
address@hidden Instruction bv-s32-ref bv n endianness
address@hidden Instruction bv-u64-ref bv n endianness
address@hidden Instruction bv-s64-ref bv n endianness
address@hidden Instruction bv-f32-ref bv n endianness
address@hidden Instruction bv-f64-ref bv n endianness
address@hidden Instruction bv-u8-set bv n val
address@hidden Instruction bv-s8-set bv n val
address@hidden Instruction bv-u16-native-set bv n val
address@hidden Instruction bv-s16-native-set bv n val
address@hidden Instruction bv-u32-native-set bv n val
address@hidden Instruction bv-s32-native-set bv n val
address@hidden Instruction bv-u64-native-set bv n val
address@hidden Instruction bv-s64-native-set bv n val
address@hidden Instruction bv-f32-native-set bv n val
address@hidden Instruction bv-f64-native-set bv n val
address@hidden Instruction bv-u16-set bv n val endianness
address@hidden Instruction bv-s16-set bv n val endianness
address@hidden Instruction bv-u32-set bv n val endianness
address@hidden Instruction bv-s32-set bv n val endianness
address@hidden Instruction bv-u64-set bv n val endianness
address@hidden Instruction bv-s64-set bv n val endianness
address@hidden Instruction bv-f32-set bv n val endianness
address@hidden Instruction bv-f64-set bv n val endianness
+Inlined implementations of the corresponding bytevector operations.
address@hidden deffn
diff --git a/libguile/_scm.h b/libguile/_scm.h
index 429e87b..e5af905 100644
--- a/libguile/_scm.h
+++ b/libguile/_scm.h
@@ -156,6 +156,36 @@
 #define scm_from_off64_t  scm_from_int64
 
 
+/* The endianness marker in objcode.  */
+#ifdef WORDS_BIGENDIAN
+# define SCM_OBJCODE_ENDIANNESS "BE"
+#else
+# define SCM_OBJCODE_ENDIANNESS "LE"
+#endif
+
+#define _SCM_CPP_STRINGIFY(x)  # x
+#define SCM_CPP_STRINGIFY(x)   _SCM_CPP_STRINGIFY (x)
+
+/* The word size marker in objcode.  */
+#define SCM_OBJCODE_WORD_SIZE  SCM_CPP_STRINGIFY (SIZEOF_VOID_P)
+
+/* Major and minor versions must be single characters. */
+#define SCM_OBJCODE_MAJOR_VERSION 0
+#define SCM_OBJCODE_MINOR_VERSION B
+#define SCM_OBJCODE_MAJOR_VERSION_STRING        \
+  SCM_CPP_STRINGIFY(SCM_OBJCODE_MAJOR_VERSION)
+#define SCM_OBJCODE_MINOR_VERSION_STRING        \
+  SCM_CPP_STRINGIFY(SCM_OBJCODE_MINOR_VERSION)
+#define SCM_OBJCODE_VERSION_STRING                                      \
+  SCM_OBJCODE_MAJOR_VERSION_STRING "." SCM_OBJCODE_MINOR_VERSION_STRING
+#define SCM_OBJCODE_MACHINE_VERSION_STRING                              \
+  SCM_OBJCODE_VERSION_STRING "-" SCM_OBJCODE_ENDIANNESS "-" 
SCM_OBJCODE_WORD_SIZE
+
+/* The objcode magic header.  */
+#define SCM_OBJCODE_COOKIE                              \
+  "GOOF-" SCM_OBJCODE_MACHINE_VERSION_STRING "---"
+
+
 #endif  /* SCM__SCM_H */
 
 /*
diff --git a/libguile/ports.c b/libguile/ports.c
index 7560ae6..ae6074d 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -1175,7 +1175,7 @@ scm_lfwrite_substr (SCM str, size_t start, size_t end, 
SCM port)
   if (pt->rw_active == SCM_PORT_READ)
     scm_end_input (port);
 
-  if (end == -1)
+  if (end == (size_t) (-1))
     end = size;
   size = end - start;
 
@@ -1198,7 +1198,7 @@ scm_lfwrite_substr (SCM str, size_t start, size_t end, 
SCM port)
 void
 scm_lfwrite_str (SCM str, SCM port)
 {
-  scm_lfwrite_substr (str, 0, -1, port);
+  scm_lfwrite_substr (str, 0, (size_t) (-1), port);
 }
 
 /* scm_c_read
diff --git a/libguile/strings.c b/libguile/strings.c
index 79eeb57..672881c 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -1090,10 +1090,11 @@ SCM_DEFINE (scm_string, "string", 0, 0, 1,
 
   /* Verify that this is a list of chars.  */
   i = scm_ilength (chrs);
+  SCM_ASSERT (i >= 0, chrs, SCM_ARG1, FUNC_NAME);
+
   len = (size_t) i;
   rest = chrs;
 
-  SCM_ASSERT (len >= 0, chrs, SCM_ARG1, FUNC_NAME);
   while (len > 0 && scm_is_pair (rest))
     {
       SCM elt = SCM_CAR (rest);
@@ -1372,7 +1373,7 @@ SCM_DEFINE (scm_string_append, "string-append", 0, 0, 1,
   size_t len = 0;
   int wide = 0;
   SCM l, s;
-  int i;
+  size_t i;
   union
   {
     char *narrow;
diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
index 98a6e49..b0888c1 100644
--- a/libguile/vm-engine.c
+++ b/libguile/vm-engine.c
@@ -220,6 +220,10 @@ VM_NAME (struct scm_vm *vp, SCM program, SCM *argv, int 
nargs)
     finish_args = SCM_EOL;
     goto vm_error;
 
+  vm_error_bad_wide_string_length:
+    err_msg  = scm_from_locale_string ("VM: Bad wide string length: ~S");
+    goto vm_error;
+
 #if VM_CHECK_IP
   vm_error_invalid_address:
     err_msg  = scm_from_locale_string ("VM: Invalid program address");
diff --git a/libguile/vm-i-loader.c b/libguile/vm-i-loader.c
index 8de7f00..e242ef9 100644
--- a/libguile/vm-i-loader.c
+++ b/libguile/vm-i-loader.c
@@ -20,42 +20,6 @@
 
 /* This file is included in vm_engine.c */
 
-VM_DEFINE_LOADER (80, load_unsigned_integer, "load-unsigned-integer")
-{
-  size_t len;
-
-  FETCH_LENGTH (len);
-  if (SCM_LIKELY (len <= 8))
-    {
-      scm_t_uint64 val = 0;
-      while (len-- > 0)
-       val = (val << 8U) + FETCH ();
-      SYNC_REGISTER ();
-      PUSH (scm_from_uint64 (val));
-      NEXT;
-    }
-  else
-    SCM_MISC_ERROR ("load-unsigned-integer: not implemented yet", SCM_EOL);
-}
-
-VM_DEFINE_LOADER (81, load_integer, "load-integer")
-{
-  size_t len;
-
-  FETCH_LENGTH (len);
-  if (SCM_LIKELY (len <= 4))
-    {
-      int val = 0;
-      while (len-- > 0)
-       val = (val << 8) + FETCH ();
-      SYNC_REGISTER ();
-      PUSH (scm_from_int (val));
-      NEXT;
-    }
-  else
-    SCM_MISC_ERROR ("load-integer: not implemented yet", SCM_EOL);
-}
-
 VM_DEFINE_LOADER (82, load_number, "load-number")
 {
   size_t len;
@@ -72,82 +36,24 @@ VM_DEFINE_LOADER (82, load_number, "load-number")
 VM_DEFINE_LOADER (83, load_string, "load-string")
 {
   size_t len;
-  int width;
-  SCM str;
+  char *buf;
 
   FETCH_LENGTH (len);
-  FETCH_WIDTH (width);
   SYNC_REGISTER ();
-  if (width == 1)
-    {
-      char *buf;
-      str = scm_i_make_string (len, &buf);
-      memcpy (buf, (char *) ip, len);
-    }
-  else if (width == 4)
-    {
-      scm_t_wchar *wbuf;
-      str = scm_i_make_wide_string (len, &wbuf);
-      memcpy ((char *) wbuf, (char *) ip, len * width);
-    }
-  else
-    SCM_MISC_ERROR ("load-string: invalid character width", SCM_EOL);
-  PUSH (str);
-  ip += len * width;
+  PUSH (scm_i_make_string (len, &buf));
+  memcpy (buf, (char *) ip, len);
+  ip += len;
   NEXT;
 }
 
 VM_DEFINE_LOADER (84, load_symbol, "load-symbol")
 {
   size_t len;
-  int width;
-  SCM str;
-  FETCH_LENGTH (len);
-  FETCH_WIDTH (width);
-  SYNC_REGISTER ();
-  if (width == 1)
-    {
-      char *buf;
-      str = scm_i_make_string (len, &buf);
-      memcpy (buf, (char *) ip, len);
-    }
-  else if (width == 4)
-    {
-      scm_t_wchar *wbuf;
-      str = scm_i_make_wide_string (len, &wbuf);
-      memcpy ((char *) wbuf, (char *) ip, len * width);
-    }
-  else
-    SCM_MISC_ERROR ("load-symbol: invalid character width", SCM_EOL);
-  PUSH (scm_string_to_symbol (str));
-  ip += len * width;
-  NEXT;
-}
-
-VM_DEFINE_LOADER (85, load_keyword, "load-keyword")
-{
-  size_t len;
-  int width;
-  SCM str;
   FETCH_LENGTH (len);
-  FETCH_WIDTH (width);
   SYNC_REGISTER ();
-  if (width == 1)
-    {
-      char *buf;
-      str = scm_i_make_string (len, &buf);
-      memcpy (buf, (char *) ip, len);
-    }
-  else if (width == 4)
-    {
-      scm_t_wchar *wbuf;
-      str = scm_i_make_wide_string (len, &wbuf);
-      memcpy ((char *) wbuf, (char *) ip, len * width);
-    }
-  else
-    SCM_MISC_ERROR ("load-keyword: invalid character width", SCM_EOL);
-  PUSH (scm_symbol_to_keyword (scm_string_to_symbol (str)));
-  ip += len * width;
+  /* FIXME: should be scm_from_latin1_symboln */
+  PUSH (scm_from_locale_symboln ((const char*)ip, len));
+  ip += len;
   NEXT;
 }
 
@@ -181,46 +87,33 @@ VM_DEFINE_INSTRUCTION (87, link_now, "link-now", 0, 1, 1)
   NEXT;
 }
 
-VM_DEFINE_LOADER (88, define, "define")
+VM_DEFINE_LOADER (89, load_array, "load-array")
 {
-  SCM str, sym;
+  SCM type, shape;
   size_t len;
-
-  int width;
   FETCH_LENGTH (len);
-  FETCH_WIDTH (width);
-  SYNC_REGISTER ();
-  if (width == 1)
-    {
-      char *buf;
-      str = scm_i_make_string (len, &buf);
-      memcpy (buf, (char *) ip, len);
-    }
-  else if (width == 4)
-    {
-      scm_t_wchar *wbuf;
-      str = scm_i_make_wide_string (len, &wbuf);
-      memcpy ((char *) wbuf, (char *) ip, len * width);
-    }
-  else
-    SCM_MISC_ERROR ("load define: invalid character width", SCM_EOL);
-  sym = scm_string_to_symbol (str);
-  ip += len * width;
-
+  POP (shape);
+  POP (type);
   SYNC_REGISTER ();
-  PUSH (scm_sym2var (sym, scm_current_module_lookup_closure (), SCM_BOOL_T));
+  PUSH (scm_from_contiguous_typed_array (type, shape, ip, len));
+  ip += len;
   NEXT;
 }
 
-VM_DEFINE_LOADER (89, load_array, "load-array")
+VM_DEFINE_LOADER (90, load_wide_string, "load-wide-string")
 {
-  SCM type, shape;
   size_t len;
+  scm_t_wchar *wbuf;
+
   FETCH_LENGTH (len);
-  POP (shape);
-  POP (type);
+  if (SCM_UNLIKELY (len % 4))
+    { finish_args = scm_list_1 (scm_from_size_t (len));
+      goto vm_error_bad_wide_string_length;
+    }
+
   SYNC_REGISTER ();
-  PUSH (scm_from_contiguous_typed_array (type, shape, ip, len));
+  PUSH (scm_i_make_wide_string (len / 4, &wbuf));
+  memcpy ((char *) wbuf, (char *) ip, len);
   ip += len;
   NEXT;
 }
diff --git a/libguile/vm-i-system.c b/libguile/vm-i-system.c
index cb7498e..dbba24d 100644
--- a/libguile/vm-i-system.c
+++ b/libguile/vm-i-system.c
@@ -1233,6 +1233,34 @@ VM_DEFINE_INSTRUCTION (65, fix_closure, "fix-closure", 
2, 0, 1)
   NEXT;
 }
 
+VM_DEFINE_INSTRUCTION (66, define, "define", 0, 0, 2)
+{
+  SCM sym, val;
+  POP (sym);
+  POP (val);
+  SYNC_REGISTER ();
+  VARIABLE_SET (scm_sym2var (sym, scm_current_module_lookup_closure (),
+                             SCM_BOOL_T),
+                val);
+  NEXT;
+}
+
+VM_DEFINE_INSTRUCTION (67, make_keyword, "make-keyword", 0, 1, 1)
+{
+  CHECK_UNDERFLOW ();
+  SYNC_REGISTER ();
+  *sp = scm_symbol_to_keyword (*sp);
+  NEXT;
+}
+
+VM_DEFINE_INSTRUCTION (68, make_symbol, "make-symbol", 0, 1, 1)
+{
+  CHECK_UNDERFLOW ();
+  SYNC_REGISTER ();
+  *sp = scm_string_to_symbol (*sp);
+  NEXT;
+}
+
 
 /*
 (defun renumber-ops ()
diff --git a/module/language/assembly.scm b/module/language/assembly.scm
index 5571bee..683da6c 100644
--- a/module/language/assembly.scm
+++ b/module/language/assembly.scm
@@ -34,30 +34,21 @@
 ;; lengths are encoded in 3 bytes
 (define *len-len* 3)
 
-;; the number of bytes per string character is encoded in 1 byte
-(define *width-len* 1)
-
 
 (define (byte-length assembly)
   (pmatch assembly
     (,label (guard (not (pair? label)))
      0)
-    ((load-unsigned-integer ,str)
-     (+ 1 *len-len* (string-length str)))
-    ((load-integer ,str)
-     (+ 1 *len-len* (string-length str)))
     ((load-number ,str)
      (+ 1 *len-len* (string-length str)))
     ((load-string ,str)
-     (+ 1 *len-len* *width-len* (* (string-width str) (string-length str))))
+     (+ 1 *len-len* (string-length str)))
+    ((load-wide-string ,str)
+     (+ 1 *len-len* (* 4 (string-length str))))
     ((load-symbol ,str)
-     (+ 1 *len-len* *width-len* (* (string-width str) (string-length str))))
-    ((load-keyword ,str)
-     (+ 1 *len-len* *width-len* (* (string-width str) (string-length str))))
+     (+ 1 *len-len* (string-length str)))
     ((load-array ,bv)
      (+ 1 *len-len* (bytevector-length bv)))
-    ((define ,str)
-     (+ 1 *len-len* *width-len* (* (string-width str) (string-length str))))
     ((load-program ,nargs ,nrest ,nlocs ,labels ,len ,meta . ,code)
      (+ 1 *program-header-len* len (if meta (1- (byte-length meta)) 0)))
     ((,inst . _) (guard (>= (instruction-length inst) 0))
@@ -171,5 +162,4 @@
                        n4)))
     ((load-string ,s) s)
     ((load-symbol ,s) (string->symbol s))
-    ((load-keyword ,s) (symbol->keyword (string->symbol s)))
     (else #f)))
diff --git a/module/language/assembly/compile-bytecode.scm 
b/module/language/assembly/compile-bytecode.scm
index 840c73b..c49c200 100644
--- a/module/language/assembly/compile-bytecode.scm
+++ b/module/language/assembly/compile-bytecode.scm
@@ -65,11 +65,13 @@
     (write-byte (logand (ash x -8) 255))
     (write-byte (logand (ash x -16) 255))
     (write-byte (logand (ash x -24) 255)))
-  (define (write-uint32 x) (case byte-order
-                             ((1234) (write-uint32-le x))
-                             ((4321) (write-uint32-be x))
-                             (else (error "unknown endianness" byte-order))))
+  (define (write-uint32 x)
+    (case byte-order
+      ((1234) (write-uint32-le x))
+      ((4321) (write-uint32-be x))
+      (else (error "unknown endianness" byte-order))))
   (define (write-wide-string s)
+    (write-loader-len (* 4 (string-length s)))
     (string-for-each (lambda (c) (write-uint32 (char->integer c))) s))
   (define (write-loader-len len)
     (write-byte (ash len -16))
@@ -133,14 +135,11 @@
                ;; `scm_c_make_objcode_slice ()'.
                (write-bytecode meta write get-addr '()))))
         ((make-char32 ,x) (write-uint32-be x))
-        ((load-unsigned-integer ,str) (write-loader str))
-        ((load-integer ,str) (write-loader str))
         ((load-number ,str) (write-loader str))
-        ((load-string ,str) (write-sized-loader str))
-        ((load-symbol ,str) (write-sized-loader str))
-        ((load-keyword ,str) (write-sized-loader str))
+        ((load-string ,str) (write-loader str))
+        ((load-wide-string ,str) (write-wide-string str))
+        ((load-symbol ,str) (write-loader str))
         ((load-array ,bv) (write-bytevector bv))
-        ((define ,str) (write-sized-loader str))
         ((br ,l) (write-break l))
         ((br-if ,l) (write-break l))
         ((br-if-not ,l) (write-break l))
diff --git a/module/language/assembly/decompile-bytecode.scm 
b/module/language/assembly/decompile-bytecode.scm
index a05db53..8cdebcf 100644
--- a/module/language/assembly/decompile-bytecode.scm
+++ b/module/language/assembly/decompile-bytecode.scm
@@ -96,16 +96,6 @@
                   (lp (cons exp out))))))))))
 
 (define (decode-bytecode pop)
-  (define (get1 bytes-per-char)
-    (if (= bytes-per-char 1)
-        (pop)
-        (let* ((a (pop))
-               (b (pop))
-               (c (pop))
-               (d (pop)))
-          (if (= byte-order 1234)
-              (+ (ash d 24) (ash c 16) (ash b 8) a)            
-              (+ (ash a 24) (ash b 16) (ash c 8) d)))))
   (and=> (pop)
          (lambda (opcode)
            (let ((inst (opcode->instruction opcode)))
@@ -117,29 +107,24 @@
                ;; the negative length indicates a variable length
                ;; instruction
                (let* ((make-sequence
-                       (if (eq? inst 'load-array)
+                       (if (or (memq inst '(load-array load-wide-string)))
                            make-bytevector
                            make-string))
                       (sequence-set!
-                       (if (eq? inst 'load-array)
+                       (if (or (memq inst '(load-array load-wide-string)))
                            bytevector-u8-set!
                            (lambda (str pos value)
                              (string-set! str pos (integer->char value)))))
                       (len (let* ((a (pop)) (b (pop)) (c (pop)))
                              (+ (ash a 16) (ash b 8) c)))
-                      (bytes-per-count
-                       (if (or (eq? inst 'load-string)
-                               (eq? inst 'load-symbol)
-                               (eq? inst 'load-keyword)
-                               (eq? inst 'define))
-                           (pop)
-                           1))
                       (seq (make-sequence len)))
                  (let lp ((i 0))
                    (if (= i len)
-                       `(,inst ,seq)
+                       `(,inst ,(if (eq? inst 'load-wide-string)
+                                    (utf32->string seq)
+                                    seq))
                        (begin
-                         (sequence-set! seq i (get1 bytes-per-count))
+                         (sequence-set! seq i (pop))
                          (lp (1+ i)))))))
               (else
                ;; fixed length
diff --git a/module/language/assembly/disassemble.scm 
b/module/language/assembly/disassemble.scm
index d41c816..492acb7 100644
--- a/module/language/assembly/disassemble.scm
+++ b/module/language/assembly/disassemble.scm
@@ -60,6 +60,8 @@
                   (print-info pos `(load-program ,sym) #f #f)
                   (lp (+ pos (byte-length asm)) (cdr code)
                       (acons sym asm programs))))
+               ((nop)
+                (lp (+ pos (byte-length asm)) (cdr code) programs))
                (else
                 (print-info pos asm
                             (code-annotation end asm objs nargs blocs
diff --git a/module/language/glil/compile-assembly.scm 
b/module/language/glil/compile-assembly.scm
index 4bd6c4f..c67ef69 100644
--- a/module/language/glil/compile-assembly.scm
+++ b/module/language/glil/compile-assembly.scm
@@ -318,8 +318,8 @@
                                      ,(modulo i 256))))
                               object-alist)))))
        ((define)
-        (emit-code `((define ,(symbol->string name))
-                     (variable-set))))
+        (emit-code `(,@(dump-object name addr)
+                     (define))))
        (else
         (error "unknown toplevel var kind" op name))))
 
@@ -391,11 +391,20 @@
    ((number? x)
     `((load-number ,(number->string x))))
    ((string? x)
-    `((load-string ,x)))
+    (case (string-width x)
+      ((1) `((load-string ,x)))
+      ((4) (align-code `(load-wide-string ,x) addr 4 4))
+      (else (error "bad string width" x))))
    ((symbol? x)
-    `((load-symbol ,(symbol->string x))))
+    (let ((str (symbol->string x)))
+      (case (string-width str)
+        ((1) `((load-symbol ,str)))
+        ((4) `(,@(dump-object str addr)
+               (make-symbol)))
+        (else (error "bad string width" str)))))
    ((keyword? x)
-    `((load-keyword ,(symbol->string (keyword->symbol x)))))
+    `(,@(dump-object (keyword->symbol x) addr)
+      (make-keyword)))
    ((list? x)
     (let ((tail (let ((len (length x)))
                   (if (>= len 65536) (too-long "list"))
diff --git a/module/language/tree-il/compile-glil.scm 
b/module/language/tree-il/compile-glil.scm
index 48db6f6..8886fa3 100644
--- a/module/language/tree-il/compile-glil.scm
+++ b/module/language/tree-il/compile-glil.scm
@@ -53,16 +53,16 @@
     (or (and=> (memq #:warnings opts) cadr)
         '()))
 
-  (let* ((x (make-lambda (tree-il-src x) '() '() '() x))
-         (x (optimize! x e opts))
-         (allocation (analyze-lexicals x)))
-
-    ;; Go throught the warning passes.
-    (for-each (lambda (kind)
+  ;; Go throught the warning passes.
+  (for-each (lambda (kind)
                 (let ((warn (assoc-ref %warning-passes kind)))
                   (and (procedure? warn)
                        (warn x))))
-              warnings)
+            warnings)
+
+  (let* ((x (make-lambda (tree-il-src x) '() '() '() x))
+         (x (optimize! x e opts))
+         (allocation (analyze-lexicals x)))
 
     (with-fluid* *comp-module* (or (and e (car e)) (current-module))
       (lambda ()
@@ -492,11 +492,16 @@
            ((tail push vals)
             (emit-code src (make-glil-toplevel 'ref name))))
          (maybe-emit-return))
-        (else
-         (pk 'ew-the-badness x (current-module) (fluid-ref *comp-module*))
+        ((module-variable the-root-module name)
          (case context
            ((tail push vals)
             (emit-code src (make-glil-module 'ref '(guile) name #f))))
+         (maybe-emit-return))
+        (else
+         (case context
+           ((tail push vals)
+            (emit-code src (make-glil-module
+                            'ref (module-name (fluid-ref *comp-module*)) name 
#f))))
          (maybe-emit-return))))
 
       ((<lexical-ref> src name gensym)
diff --git a/module/language/tree-il/fix-letrec.scm 
b/module/language/tree-il/fix-letrec.scm
index 0ed7b6b..9b66d9e 100644
--- a/module/language/tree-il/fix-letrec.scm
+++ b/module/language/tree-il/fix-letrec.scm
@@ -78,6 +78,13 @@
                                simple
                                lambda*
                                complex))
+                      ((<let> vars)
+                       (values (append vars unref)
+                               ref
+                               set
+                               simple
+                               lambda*
+                               complex))
                       (else
                        (values unref ref set simple lambda* complex))))
                   (lambda (x unref ref set simple lambda* complex)
@@ -108,6 +115,39 @@
                           (else
                            (lp (cdr vars) (cdr vals)
                                s l (cons (car vars) c))))))
+                      ((<let> (orig-vars vars) vals)
+                       ;; The point is to compile let-bound lambdas as
+                       ;; efficiently as we do letrec-bound lambdas, so
+                       ;; we use the same algorithm for analyzing the
+                       ;; vars. There is no problem recursing into the
+                       ;; bindings after the let, because all variables
+                       ;; have been renamed.
+                       (let lp ((vars orig-vars) (vals vals)
+                                (s '()) (l '()) (c '()))
+                         (cond
+                          ((null? vars)
+                           (values unref
+                                   ref
+                                   set
+                                   (append s simple)
+                                   (append l lambda*)
+                                   (append c complex)))
+                          ((memq (car vars) unref)
+                           (lp (cdr vars) (cdr vals)
+                               s l c))
+                          ((memq (car vars) set)
+                           (lp (cdr vars) (cdr vals)
+                               s l (cons (car vars) c)))
+                          ((and (lambda? (car vals))
+                                (not (memq (car vars) set)))
+                           (lp (cdr vars) (cdr vals)
+                               s (cons (car vars) l) c))
+                          ;; There is no difference between simple and
+                          ;; complex, for the purposes of let. Just lump
+                          ;; them all into complex.
+                          (else
+                           (lp (cdr vars) (cdr vals)
+                               s l (cons (car vars) c))))))
                       (else
                        (values unref ref set simple lambda* complex))))
                   '()
@@ -128,7 +168,7 @@
          ;; expression, called for effect.
          ((<lexical-set> gensym exp)
           (if (memq gensym unref)
-              (make-sequence #f (list (make-void #f) exp))
+              (make-sequence #f (list exp (make-void #f)))
               x))
 
          ((<letrec> src names vars vals body)
@@ -176,5 +216,25 @@
                        ;; Finally, the body.
                        body)))))))))
 
+         ((<let> src names vars vals body)
+          (let ((binds (map list vars names vals)))
+            (define (lookup set)
+              (map (lambda (v) (assq v binds))
+                   (lset-intersection eq? vars set)))
+            (let ((u (lookup unref))
+                  (l (lookup lambda*))
+                  (c (lookup complex)))
+              (make-sequence
+               src
+               (append
+                ;; unreferenced bindings, called for effect.
+                (map caddr u)
+                (list
+                 ;; unassigned lambdas use fix.
+                 (make-fix src (map cadr l) (map car l) (map caddr l)
+                           ;; and the "complex" bindings.
+                           (make-let src (map cadr c) (map car c) (map caddr c)
+                                     body))))))))
+         
          (else x)))
      x)))
diff --git a/module/system/xref.scm b/module/system/xref.scm
index 0613754..906ec8e 100644
--- a/module/system/xref.scm
+++ b/module/system/xref.scm
@@ -35,7 +35,7 @@
                 (progv (make-vector (vector-length objects) #f))
                 (asm (decompile (program-objcode prog) #:to 'assembly)))
             (pmatch asm
-              ((load-program ,nargs ,nrest ,nlocs ,next ,labels ,len . ,body)
+              ((load-program ,nargs ,nrest ,nlocs ,labels ,len . ,body)
                (for-each
                 (lambda (x)
                   (pmatch x
diff --git a/test-suite/tests/asm-to-bytecode.test 
b/test-suite/tests/asm-to-bytecode.test
index d01e93c..a8e251b 100644
--- a/test-suite/tests/asm-to-bytecode.test
+++ b/test-suite/tests/asm-to-bytecode.test
@@ -65,31 +65,18 @@
     (comp-test '(make-int8 3)
                #(make-int8 3))
     
-    (comp-test `(load-integer ,(string (integer->char 0)))
-               #(load-integer 0 0 1 0))
-    
-    (comp-test `(load-integer ,(string (integer->char 255)))
-               #(load-integer 0 0 1 255))
-    
-    (comp-test `(load-integer ,(string (integer->char 1) (integer->char 0)))
-               #(load-integer 0 0 2 1 0))
-    
     (comp-test '(load-number "3.14")
                (vector 'load-number 0 0 4 (char->integer #\3) (char->integer 
#\.)
                        (char->integer #\1) (char->integer #\4)))
     
     (comp-test '(load-string "foo")
-               (vector 'load-string 0 0 3 1 (char->integer #\f) (char->integer 
#\o)
+               (vector 'load-string 0 0 3 (char->integer #\f) (char->integer 
#\o)
                        (char->integer #\o)))
     
     (comp-test '(load-symbol "foo")
-               (vector 'load-symbol 0 0 3 1 (char->integer #\f) (char->integer 
#\o)
+               (vector 'load-symbol 0 0 3 (char->integer #\f) (char->integer 
#\o)
                        (char->integer #\o)))
     
-    (comp-test '(load-keyword "qux")
-               (vector 'load-keyword 0 0 3 1 (char->integer #\q) 
(char->integer #\u)
-                       (char->integer #\x)))
-
     (comp-test '(load-program 3 2 1 () 3 #f (make-int8 3) (return))
                #(load-program
                  3 2 (uint16 1) ;; nargs, nrest, nlocs
diff --git a/test-suite/tests/strings.test b/test-suite/tests/strings.test
index d82a472..a35dd20 100644
--- a/test-suite/tests/strings.test
+++ b/test-suite/tests/strings.test
@@ -447,7 +447,17 @@
       (string-set! s 4 (integer->char #x010300))
       (char=? (string-ref s 4) (integer->char #x010300)))))
 
-
+;;
+;; list->string
+;;
+(with-test-prefix "string"
+
+  (pass-if-exception "convert circular list to string"
+     exception:wrong-type-arg
+     (let ((foo (list #\a #\b #\c)))
+       (set-cdr! (cddr foo) (cdr foo))
+       (apply string foo))))
+ 
 (with-test-prefix "string-split"
 
   ;; in guile 1.6.7 and earlier, character >=128 wasn't matched in the string
diff --git a/test-suite/tests/tree-il.test b/test-suite/tests/tree-il.test
index d993e4f..73ea9c1 100644
--- a/test-suite/tests/tree-il.test
+++ b/test-suite/tests/tree-il.test
@@ -151,25 +151,33 @@
 
 (with-test-prefix "lexical sets"
   (assert-tree-il->glil
-   (let (x) (y) ((const 1)) (set! (lexical x y) (const 2)))
+   ;; unreferenced sets may be optimized away -- make sure they are ref'd
+   (let (x) (y) ((const 1))
+        (set! (lexical x y) (apply (primitive 1+) (lexical x y))))
    (program 0 0 1 ()
             (const 1) (bind (x #t 0)) (lexical #t #t box 0)
-            (const 2) (lexical #t #t set 0) (void) (call return 1)
+            (lexical #t #t ref 0) (call add1 1) (lexical #t #t set 0)
+            (void) (call return 1)
             (unbind)))
 
   (assert-tree-il->glil
-   (let (x) (y) ((const 1)) (begin (set! (lexical x y) (const 2)) (const #f)))
+   (let (x) (y) ((const 1))
+        (begin (set! (lexical x y) (apply (primitive 1+) (lexical x y)))
+               (lexical x y)))
    (program 0 0 1 ()
             (const 1) (bind (x #t 0)) (lexical #t #t box 0)
-            (const 2) (lexical #t #t set 0) (const #f) (call return 1)
+            (lexical #t #t ref 0) (call add1 1) (lexical #t #t set 0)
+            (lexical #t #t ref 0) (call return 1)
             (unbind)))
 
   (assert-tree-il->glil
    (let (x) (y) ((const 1))
-     (apply (primitive null?) (set! (lexical x y) (const 2))))
+     (apply (primitive null?)
+            (set! (lexical x y) (apply (primitive 1+) (lexical x y)))))
    (program 0 0 1 ()
             (const 1) (bind (x #t 0)) (lexical #t #t box 0)
-            (const 2) (lexical #t #t set 0) (void) (call null? 1) (call return 
1)
+            (lexical #t #t ref 0) (call add1 1) (lexical #t #t set 0) (void)
+            (call null? 1) (call return 1)
             (unbind))))
 
 (with-test-prefix "module refs"
@@ -413,20 +421,19 @@
             (unbind))
    (eq? l1 l2))
 
+  ;; second bound var is unreferenced
   (assert-tree-il->glil/pmatch
    (let (x) (y) ((const 1))
         (if (lexical x y)
             (lexical x y)
             (let (a) (b) ((const 2))
                  (lexical x y))))
-   (program 0 0 2 ()
+   (program 0 0 1 ()
             (const 1) (bind (x #f 0)) (lexical #t #f set 0)
             (lexical #t #f ref 0) (branch br-if-not ,l1)
             (lexical #t #f ref 0) (call return 1)
             (label ,l2)
-            (const 2) (bind (a #f 1)) (lexical #t #f set 1)
             (lexical #t #f ref 0) (call return 1)
-            (unbind)
             (unbind))
    (eq? l1 l2)))
 


hooks/post-receive
-- 
GNU Guile




reply via email to

[Prev in Thread] Current Thread [Next in Thread]