
From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, master, updated. release_1-9-2-158-g8748ffe
Date: Sat, 05 Sep 2009 17:44:27 +0000

- Log -----------------------------------------------------------------
commit 8748ffeaa770ed47192f970ef5302a7c7aa7a935
Author: Michael Gran <address@hidden>
Date:   Sat Sep 5 10:42:15 2009 -0700

    Doc updates for character encoding of source code files
    * NEWS
    * doc/ref/scheme-scripts.texi: doc updates for character encoding of
      source code
    * doc/ref/api-evaluation.texi: doc updates for character encoding of
      source code


Summary of changes:
 NEWS                        |   12 +++++++
 doc/ref/api-evaluation.texi |   70 +++++++++++++++++++++++++++++++++++++++++++
 doc/ref/scheme-scripts.texi |    6 ++++
 3 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index a3c4ddd..147d082 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,18 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
 Changes in 1.9.3 (since the 1.9.2 prerelease):
+** Non-ASCII source code files can be read, but require coding
+   declarations
+The default reader now handles source code files for some of the
+non-ASCII character encodings, such as UTF-8.  A non-ASCII source file
+should have an encoding declaration near the top of the file.  Also,
+there is a new function, file-encoding, that scans a port for a coding
+declaration.
+The pre-1.9.3 reader accepted any 8-bit-clean source code without
+regard to its encoding.  Relying on that behavior is now discouraged.
 ** Ports do transcoding
 Ports now have an associated character encoding, and port read/write
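As a minimal sketch of the NEWS item above (assuming Guile 2.0 or later, and a hypothetical file name such as greeting.scm), a UTF-8 source file only needs an Emacs-style coding declaration in a comment near its top; the rest of the file is ordinary Scheme.

```scheme
;;; -*- coding: utf-8 -*-
;;; Hypothetical example file (e.g. greeting.scm).  The declaration on
;;; the first line sits in a semicolon comment near the top of the
;;; file, which is where the reader scans for it.

;; 9 characters, but 12 bytes when encoded in UTF-8: u-umlaut, sharp s
;; and lambda each take two bytes.
(define greeting "Grüße, λ!")

;; If the reader honored the declaration, each two-byte sequence was
;; decoded into a single character, so the length is 9 rather than 12.
(display (string-length greeting))
(newline)
```

Without the declaration, the reader is not told the file's encoding, and the non-ASCII characters may be misread.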
diff --git a/doc/ref/api-evaluation.texi b/doc/ref/api-evaluation.texi
index d841215..9fc5ef5 100644
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -17,6 +17,7 @@ loading, evaluating, and compiling Scheme code at run time.
 * Fly Evaluation::              Procedures for on the fly evaluation.
 * Compilation::                 How to compile Scheme files and procedures.
 * Loading::                     Loading Scheme code from file.
+* Character Encoding of Source Files:: Loading non-ASCII Scheme code from file.
 * Delayed Evaluation::          Postponing evaluation until it is needed.
 * Local Evaluation::            Evaluation in a local environment.
 * Evaluator Behaviour::         Modifying Guile's evaluator.
@@ -229,6 +230,12 @@ Thus a Guile script often starts like this.
 More details on Guile scripting can be found in the scripting section
 (@pxref{Guile Scripting}).
+There is one special case where the contents of a comment can actually
+affect the interpretation of code.  When a character encoding
+declaration, such as @code{coding: utf-8}, appears in one of the first
+few lines of a source file, it indicates to Guile's default reader
+that this source code file is not ASCII.  For details see @ref{Character
+Encoding of Source Files}.
 @node Case Sensitivity
 @subsubsection Case Sensitivity
@@ -590,6 +597,69 @@ a file to load.  By default, @code{%load-extensions} is bound to the
 list @code{("" ".scm")}.
 @end defvar
+@node Character Encoding of Source Files
+@subsection Character Encoding of Source Files
+@cindex primitive-load
+@cindex load
+Scheme source code files are usually encoded in ASCII, but the
+built-in reader can interpret other character encodings.  The
+procedure @code{primitive-load}, and by extension the functions that
+call it, such as @code{load}, begin by scanning the first 500
+characters of the file for a coding declaration.
+A coding declaration has the form @code{coding: XXXXXX}, where
+@code{XXXXXX} is the name of the character encoding in which the source
+code file has been encoded.  The coding declaration must appear in a
+Scheme comment: either a semicolon-initiated comment or a block
+@code{#!} comment.
+The name of the character encoding in the coding declaration is
+typically lower case and contains only letters, numbers, and
+hyphens; the most common examples are @code{utf-8} and
+@code{iso-8859-1}.  This keeps the coding declaration compatible with
+Emacs.
+For source code, only a subset of all possible character encodings can
+be interpreted by the built-in source code reader: only those
+character encodings in which ASCII text appears unmodified can be
+used.  This includes @code{UTF-8} and the ISO-8859 family
+(@code{ISO-8859-1}, @code{ISO-8859-2}, and so on).  The multi-byte
+character encodings @code{UTF-16} and @code{UTF-32} may not be used
+because they are not compatible with ASCII.
+@cindex read
+@cindex set-port-encoding!
+Sometimes one may want to read non-ASCII code from a port with a
+procedure such as @code{read}, instead of with @code{load}.  If the
+port's character encoding matches the encoding of the code being read,
+no special handling is necessary: the port performs the character
+encoding conversion automatically.  Port encodings are set with
+@code{setlocale} or @code{set-port-encoding!}.
+To read code of unknown character encoding from a port, use three
+steps.  First, set the port's character encoding to ISO-8859-1 using
+@code{set-port-encoding!}.  Second, call @code{file-encoding},
+described below, to scan the port for a coding declaration; as a side
+effect, it rewinds the port once its scan is complete.  Third, if
+@code{file-encoding} returned an encoding, set the port's character
+encoding to it, again with @code{set-port-encoding!}.  The code can
+then be read as normal.
+@deffn {Scheme Procedure} file-encoding port
+@deffnx {C Function} scm_file_encoding port
+Scan @var{port}, whose contents must be randomly accessible, for an
+Emacs-like character coding declaration near the top of its contents.
+The coding declaration is of the form @code{coding: XXXXX} and must
+appear in a Scheme comment.
+Return a string containing the character encoding of the file
+if a declaration was found, or @code{#f} otherwise.  The port is
+rewound.
+@end deffn
 @node Delayed Evaluation
 @subsection Delayed Evaluation
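The three steps described in the new section for reading code of unknown encoding from a port might look like the following minimal sketch. It assumes a Guile in which file-encoding and set-port-encoding! behave as documented above; the procedure name read-with-declared-encoding is hypothetical and not part of Guile.

```scheme
;; Hypothetical helper (not part of Guile): read one datum from PORT,
;; first honoring any coding declaration near the top of its contents.
(define (read-with-declared-encoding port)
  ;; Step 1: switch to ISO-8859-1, as the documentation advises
  ;; (presumably because it maps every byte to some character).
  (set-port-encoding! port "ISO-8859-1")
  ;; Step 2: look for "coding: XXXXX" inside a comment.  The result is
  ;; a string naming the encoding, or #f if no declaration was found.
  (let ((encoding (file-encoding port)))
    ;; Step 3: switch to the declared encoding, if any, then read.
    (if encoding
        (set-port-encoding! port encoding))
    (read port)))

;; Example with an in-memory port whose first line is a semicolon
;; comment carrying the declaration.  READ skips the comment and
;; returns the list that follows it.
(define datum
  (read-with-declared-encoding
   (open-input-string ";; -*- coding: utf-8 -*-\n(hello world)\n")))

;; With a file instead (the file name is hypothetical):
;; (call-with-input-file "example.scm" read-with-declared-encoding)
```

Taking a port rather than a file name keeps the helper usable with string ports as well as file ports, which is how the documentation frames the scenario.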
diff --git a/doc/ref/scheme-scripts.texi b/doc/ref/scheme-scripts.texi
index e12eee6..249bc34 100644
--- a/doc/ref/scheme-scripts.texi
+++ b/doc/ref/scheme-scripts.texi
@@ -64,6 +64,12 @@ operating system never reads this far, but Guile treats this as the end
 of the comment begun on the first line by the @samp{#!} characters.
+If this source code file is not ASCII or ISO-8859-1 encoded, a coding
+declaration such as @code{coding: utf-8} should appear in a comment
+somewhere in the first five lines of the file; see @ref{Character
+Encoding of Source Files}.
 The rest of the file should be a Scheme program.
 @end itemize
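A script header following the advice above might look like this sketch. The interpreter path /usr/local/bin/guile is an assumption; adjust it to the local installation. The coding declaration sits in a semicolon comment on the third line, within the first five lines.

```scheme
#!/usr/local/bin/guile -s
!#
;;; coding: utf-8
;;; Hypothetical script: the declaration above tells Guile's reader
;;; that the rest of the file is UTF-8 encoded.

;; "Größe" has 5 characters; o-umlaut and sharp s each occupy two
;; bytes in UTF-8, so a correctly decoded string has length 5.
(define word "Größe")
(display (string-length word))
(newline)
```

The operating system reads only the first line; Guile treats everything from the #! through the !# on the second line as a block comment, so the coding declaration can follow immediately afterwards.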

