Re: Internal visibility

From: Thien-Thi Nguyen
Subject: Re: Internal visibility
Date: Wed, 11 Jun 2008 09:49:26 +0200
() address@hidden (Ludovic Courtès)
() Tue, 10 Jun 2008 14:09:33 +0200

   Currently, Guile only supports `scm_to_locale_string ()', which means
   the returned C string is encoded in the current locale's encoding.
   Eventually, new functions may be added: `scm_to_utf8_string ()', etc.
   This was Marius' original plan [0], and I think it remains valid.

Most plans are "valid" but not all plans are easy to live with.

I think the encoding of a string (or buffer or "character" array
(or subsequence thereof)) needs to be explicit; the encoding is
not purely "internal" and to treat it as such will require hoop-
jumping on both sides of the API.  (How encoding support is
implemented, on the other hand, is indeed an internal affair.)

This is from observation of how Emacs attained multibyte-ness.
Note: not just "how Emacs does it" but "how Emacs used to not do
it and through time eventually came to do it".

In PostgreSQL's multibyte support, the i/o can be tempered by
setting the "client encoding".  This can be changed cheaply (per
request).  Basing encoding on locale only is not fine-grained
enough; setting the locale can be expensive and cause unrelated

See also GNU libc support (info "(libc) Character Set Handling"),
which applies similar principles at a lower (library) level.

All these programs chose not to expose many conversion functions
in the programming interface.  Instead, they expose few functions,
each with an encoding parameter.  That is surely a cleaner design.
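
To make the "few functions, each with an encoding parameter" idea
concrete, here is a minimal sketch on top of iconv(3).  The name
`convert_string' and its argument order are made up for illustration;
they are not part of Guile, PostgreSQL, or libc.  The sketch assumes a
byte-oriented target encoding such as UTF-8 or ISO-8859-1, so that a
single NUL byte terminates the result.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption, not an existing API): convert
   the NUL-terminated string IN from encoding FROM to encoding TO.
   Return a freshly malloc'd NUL-terminated string, or NULL on
   failure.  The caller frees the result.  */
char *
convert_string (const char *in, const char *from, const char *to)
{
  /* Note the argument order of iconv_open: target first, source second.  */
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;                /* unknown or unsupported encoding pair */

  size_t in_left = strlen (in);
  /* Assumed bound: at most 4 output bytes per input byte, a little
     slack for a byte-order mark or shift sequence, plus the NUL.  */
  size_t out_size = in_left * 4 + 4 + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_ptr = (char *) in;   /* iconv does not modify the input */
  char *out_ptr = out;
  size_t out_left = out_size - 1;

  /* Convert the input, then flush any pending shift state.  */
  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &out_ptr, &out_left);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);               /* invalid input or output too small */
      return NULL;
    }

  *out_ptr = '\0';
  return out;
}
```

A caller would then write something like
`convert_string (s, "ISO-8859-1", "UTF-8")' and pick the encoding per
call, without touching the process locale.  One entry point covers
every encoding pair the underlying library knows about, which is the
property argued for above.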


