

Re: Internal visibility

From: Han-Wen Nienhuys
Subject: Re: Internal visibility
Date: Wed, 11 Jun 2008 13:09:29 -0300

On Wed, Jun 11, 2008 at 4:24 AM, Clinton Ebadi <address@hidden> wrote:
>>> Strings in Guile will eventually be sequences of Unicode code points (as
>>> opposed to "bytes"), which can be represented in a variety of different
>>> ways (UTF-8, UCS-4, etc.).  How Guile represents strings and whether
>>> this representation "changes dynamically" (as you suggested) should not
>>> be exposed to the applications in order to leave as much freedom as
>>> possible to Guile's implementation strategy.
>> I think that a sequence of Unicode code points is a somewhat
>> limited view of how strings should be used.  Among other things, the
>> implication is that programs cannot rely on being able to index a
>> string in O(1) time (since the string might be UTF-x encoded).
>> What do I use if I want guaranteed O(1) indexing, that is, if
>> I want to manipulate strings of bytes?
>> How would I read the contents of a binary file without jumping through
>> encoding hoops?
> Uniform byte vectors. If you're using C you can just read everything
> into a normal C array and then use
> scm_take_u8vector()/scm_u8vector_elements().

Are you serious? You want me to run regexes over uniform vectors?
Concatenate uniform vectors? Call scm_display on one and be able to
make sense of the output?

What scares me about this idea of doing The Right Thing with Unicode
is the following.

Judging by the signatures of the functions, the char* <-> string
conversions are meant (in the future, at least) to change their
behavior depending on the locale settings in the environment
(LC_ALL, LC_CTYPE, LANG).  If I use strings rather than uniform
vectors (which seems wise if I don't want to reimplement half of
Guile), then:

* the performance of my software will depend on whatever locale
users happen to have set.  If I am unlucky, every string that
passes through the C interface will be transcoded from and to UTF-x.

* Guile is meant to support only one locale at a time, so, for
example, Guile could not be used to transcode strings.

Can we at least have a version of scm_to_locale_stringn() that takes
an explicit encoding/locale parameter, so that I have some guarantee
of how Guile is (or is not) munging my strings?

Han-Wen Nienhuys - address@hidden -

