Unicode, ports and encoding

From: Mike Gran
Subject: Unicode, ports and encoding
Date: Mon, 16 Feb 2009 15:51:33 -0800 (PST)

More observations about wide strings and Guile.

First, here are the abridged call trees for low-level reading and
writing:

read <-+- scm_getc <-+- [the parser] <--- scm_read <--- scm_primitive_load
       |             |
       |             +- scm_read_char
       +- scm_c_read
       +- read_without_guile

write <-+- scm_lfwrite <-+- scm_display
        |                |
        |                +- scm_putc <-+- scm_write_char
        |                              |
        |                              +- scm_newline
        +- scm_flush

1.  To move to a Unicode-enabled Guile, text information needs to be
    converted to an internal representation when read and converted
    back to the locale when written.  Most reading and writing for
    ports passes through scm_getc (input) and scm_lfwrite (output).
    Conversion between locale strings and internal strings should
    happen there.

2.  If string conversion occurs in scm_getc, then the scm_read reader
    will be receiving and parsing source code that has passed through
    the conversion routines.  This is initially not a problem since
    Scheme code is largely ASCII, and Guile will start up in the C
    locale.

    But, if a source code file is not ASCII, the reader needs to be
    able to ascertain this before parsing the code from the file.  The
    encoding of a source code file is a property of the file and not
    the locale in which Guile is being run. 

    This implies that a source code file should have syntax to
    indicate its own encoding, if it is not ASCII.  Something akin to
    the <?xml version="1.0" encoding="utf-8"?> declaration in XML
    files.

3.  The text encoding of a port needs to be associated with the port.
    R6RS has the idea of transcoders for ports that require
    conversion.  It is daunting, but, having played with some ideas for a
    few weeks, it seems that at least a subset of the transcoder
    functionality needs to be implemented for this to make any sense.
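
To make point 1 concrete, here is a rough sketch of decoding at the
scm_getc choke point.  It is not Guile code: sketch_port and
sketch_getc are made-up names, and it assumes the port's bytes are
UTF-8 rather than an arbitrary locale encoding, which a real
implementation would have to handle (for instance through iconv).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port's input buffer; not Guile's port type. */
typedef struct {
    const unsigned char *buf;  /* raw bytes as read from the file */
    size_t len;                /* number of bytes in buf */
    size_t pos;                /* index of the next unread byte */
} sketch_port;

#define SKETCH_EOF (-1)
#define SKETCH_REPLACEMENT 0xFFFD  /* U+FFFD REPLACEMENT CHARACTER */

/* Read one character from the port, decoding UTF-8 (an assumption of
   this sketch) into a code point.  Returns SKETCH_EOF at end of input
   and U+FFFD for malformed input. */
int32_t sketch_getc(sketch_port *port)
{
    assert(port != NULL);
    if (port->pos >= port->len)
        return SKETCH_EOF;

    unsigned char lead = port->buf[port->pos++];
    int extra;     /* number of continuation bytes expected */
    int32_t cp;    /* code point being assembled */
    int32_t min;   /* smallest code point legal for this length */

    if (lead < 0x80)
        return lead;
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
    else
        return SKETCH_REPLACEMENT;

    while (extra-- > 0) {
        /* Each continuation byte must look like 10xxxxxx. */
        if (port->pos >= port->len || (port->buf[port->pos] & 0xC0) != 0x80)
            return SKETCH_REPLACEMENT;
        cp = (cp << 6) | (port->buf[port->pos++] & 0x3F);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return SKETCH_REPLACEMENT;
    return cp;
}
```

The point of the sketch is only that callers above this level (the
parser, scm_read_char) would see whole characters instead of bytes.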
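
For point 2, one possible shape for the in-file declaration is a
"coding: NAME" marker in a leading comment, similar to what Emacs
recognizes.  That syntax is purely an assumption here, as is the
helper name sketch_scan_coding.  The sketch scans the raw bytes at
the head of a file before any conversion happens, so it only works
for ASCII-compatible encodings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Look for "coding:" followed by an encoding name within the first
   `len` bytes of a file.  On success, copy the name into `out`
   (NUL-terminated, truncated to fit) and return 1.  Otherwise set
   `out` to the empty string and return 0. */
int sketch_scan_coding(const char *head, size_t len, char *out, size_t outlen)
{
    static const char key[] = "coding:";
    const size_t keylen = sizeof key - 1;

    assert(head != NULL && out != NULL && outlen > 0);

    for (size_t i = 0; i + keylen <= len; i++) {
        if (memcmp(head + i, key, keylen) != 0)
            continue;

        /* Skip blanks between the marker and the name. */
        size_t j = i + keylen;
        while (j < len && (head[j] == ' ' || head[j] == '\t'))
            j++;

        /* Copy the characters that can appear in an encoding name. */
        size_t n = 0;
        while (j < len && n + 1 < outlen &&
               (isalnum((unsigned char) head[j]) ||
                head[j] == '-' || head[j] == '_' || head[j] == '.'))
            out[n++] = head[j++];
        out[n] = '\0';

        if (n > 0)
            return 1;
    }

    out[0] = '\0';
    return 0;
}
```

A loader would call this on the first line or first few hundred bytes
of a file, then set the port's encoding before handing the port to the
reader.  Files without a declaration would keep the default.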
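
For point 3, a minimal subset of the transcoder idea could be a small
table of encode/decode functions that each port points to.  Everything
below is hypothetical (sketch_transcoder, sketch_oport and the '?'
substitution policy are my own inventions, only loosely inspired by
R6RS), and it covers just two single-byte encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transcoder: an encoding name plus conversion hooks. */
typedef struct {
    const char *name;
    /* Decode one character from `in`; return bytes consumed, 0 on error. */
    size_t (*decode)(const unsigned char *in, size_t len, uint32_t *cp);
    /* Encode `cp` into `out`; return bytes written, 0 if not possible. */
    size_t (*encode)(uint32_t cp, unsigned char *out, size_t cap);
} sketch_transcoder;

size_t latin1_decode(const unsigned char *in, size_t len, uint32_t *cp)
{
    if (len == 0) return 0;
    *cp = in[0];               /* Latin-1 bytes map 1:1 to U+0000..U+00FF */
    return 1;
}

size_t latin1_encode(uint32_t cp, unsigned char *out, size_t cap)
{
    if (cp > 0xFF || cap == 0) return 0;
    out[0] = (unsigned char) cp;
    return 1;
}

size_t ascii_decode(const unsigned char *in, size_t len, uint32_t *cp)
{
    if (len == 0 || in[0] > 0x7F) return 0;
    *cp = in[0];
    return 1;
}

size_t ascii_encode(uint32_t cp, unsigned char *out, size_t cap)
{
    if (cp > 0x7F || cap == 0) return 0;
    out[0] = (unsigned char) cp;
    return 1;
}

const sketch_transcoder sketch_latin1 = { "ISO-8859-1", latin1_decode, latin1_encode };
const sketch_transcoder sketch_ascii  = { "US-ASCII",   ascii_decode,  ascii_encode  };

/* Hypothetical output port: the encoding travels with the port. */
typedef struct {
    const sketch_transcoder *tc;
    unsigned char out[64];     /* bytes waiting to be flushed */
    size_t used;               /* number of valid bytes in out */
} sketch_oport;

/* Write one code point through the port's transcoder.  Returns 1 if it
   was encoded as-is, 0 if '?' was substituted or the buffer was full. */
int sketch_port_putc(sketch_oport *port, uint32_t cp)
{
    assert(port != NULL && port->tc != NULL);
    size_t cap = sizeof port->out - port->used;
    size_t n = port->tc->encode(cp, port->out + port->used, cap);
    if (n == 0) {
        if (cap > 0)
            port->out[port->used++] = '?';
        return 0;
    }
    port->used += n;
    return 1;
}
```

With this layout, writing U+00E9 to a port whose transcoder is
sketch_latin1 stores the byte 0xE9, while the same call on a
sketch_ascii port stores '?'.  The caller never has to consult the
process locale.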

I sent in my copyright assignment last week, so you should have it


Mike Gran
