Re: [Chicken-users] ditching syntax-case modules for the utf8 egg

From: Graham Fawcett
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Tue, 18 Mar 2008 09:02:27 -0400

On Tue, Mar 18, 2008 at 7:05 AM, Alex Shinn <address@hidden> wrote:
> >>>>> "Tobia" == Tobia Conforto <address@hidden> writes:
>     Tobia> Graham Fawcett wrote:
>     >> Here's another thought. It seems to me that if we
>     >> were to represent strings as composite values, e.g. a
>     >> two-slot record whose first slot is an encoding (the
>     >> symbol 'utf8, or #f for 'byte' encoding), and whose
>     >> second slot contains the string data, then the
>     >> various string functions could dispatch on the type,
>     >> and there would be no need to monkey-patch core
>     >> string functions to get the desired semantics.
>     Tobia> This is more or less how other languages, such as
>     Tobia> Python, solved the issue.  Two kinds of strings,
>     Tobia> byte and unicode, and overloading a few string
>     Tobia> operations to have a slightly different meaning
>     Tobia> when called on either, computing byte length
>     Tobia> vs. character length.
>  I keep trying to say, this is *not* the issue! :)

And eventually I'll listen, I promise Alex. :-)

>  The entire problem revolves around adding Unicode support as
>  an option, without modifying the core.  *If* we allow
>  ourselves to modify the core, then there is no problem at
>  all, and we can just copy the utf8 egg code over the
>  existing string procedures, and add in some procedures for
>  byte-level access.

While I'm leery of hastily choosing UTF-8 as our core string
representation (I feel that this issue *does* involve internal
representation for that very reason, though admittedly not in the same
sense), I agree 100% that Unicode support should be core. And on that
note I'll stop muddying the waters and give my +1 to copying utf8 into
the core and adding a suite of byte- functions.
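
To make the split concrete, here is a minimal sketch of what
"character-level by default, byte- functions alongside" could look
like. It is written in Python only because Python was the comparison
raised earlier in the thread. The names string_length, byte_length
and byte_ref are hypothetical, chosen for illustration, and are not
taken from the utf8 egg or from Chicken's core.

```python
# Hypothetical helpers: the names are illustrative assumptions,
# not the utf8 egg's API or Chicken's core procedures.

def string_length(s: str) -> int:
    """Length in characters (code points): the default view."""
    return len(s)

def byte_length(s: str) -> int:
    """Length in bytes of the string's UTF-8 encoding."""
    return len(s.encode("utf-8"))

def byte_ref(s: str, i: int) -> int:
    """The i-th byte of the UTF-8 encoding, as an integer 0..255."""
    return s.encode("utf-8")[i]

# "naive" with a diaeresis on the i (U+00EF), which UTF-8 encodes
# as the two bytes 0xC3 0xAF.
word = "na\u00efve"

print(string_length(word))      # 5 characters
print(byte_length(word))        # 6 bytes
print(hex(byte_ref(word, 2)))   # 0xc3, first byte of the encoded i
```

The point of the sketch is only that both views stay available:
ordinary string procedures count characters, and anything that needs
raw bytes asks for them explicitly.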

For what it's worth, I also think that GMP should be in the core, and
that no one, nowhere should be allowed to publish an egg with a
toplevel procedure named (format) in it. Mysterious toplevel
interactions between indirect dependents are the bane of good

