Re: [Chicken-users] ditching syntax-case modules for the utf8 egg

From: Alex Shinn
Subject: Re: [Chicken-users] ditching syntax-case modules for the utf8 egg
Date: Tue, 18 Mar 2008 20:24:32 +0900
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1.50 (darwin)

>>>>> "Peter" == Peter Bex <address@hidden> writes:

    Peter> On Tue, Mar 18, 2008 at 11:41:08AM +0900, Alex Shinn wrote:
    >> >>>>> "Kon" == Kon Lovett <address@hidden>
    >> writes:
    Kon> Summary: I want a byte-string API. I want string
    Kon> integrations. I want global UTF8 strings.
    >> The only way this can happen is to push the UTF8
    >> handling into the core of Chicken itself.
    >> However it would be contrary to Chicken's goal of
    >> keeping a minimal core with extensions built on top.

    Peter> How much bigger would this make the core?  I
    Peter> really doubt it would have much of an impact, and
    Peter> it would sure make lots of things a lot simpler.

It's hard to say, but it would probably be about the size of
the utf8-lolevel egg (49k on my machine), since the rest is
redefinitions of existing procedures that wouldn't be any
more complex.

I'm not saying I recommend this, I'm just pointing out what
would need to happen for Kon's global utf8 semantics.

Although utf8 wins in many areas compared to other Unicode
representations, it's still more complex than ASCII.

If someone did seriously want to move this into the core and
Felix allowed it, you would want to do it in four phases:

  1) provide a full byte-string-level API (mostly
  BYTE-STRING-REF and BYTE-STRING-LENGTH) - byte-strings are
  the same exact objects as utf8-strings, we just use
  different procedures

  2) locate any modules that treat strings as byte-strings
  and update them to use the new byte-string API

  3) replace the core string operations with utf8 versions

  4) replace SRFI-14 with the Unicode version (this requires
  the iset egg to be moved into the core Chicken
  distribution, though it needn't be loaded by default)
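To illustrate the point in phase 1 that byte-strings and utf8-strings
are the same objects seen through different procedures, here is a rough
sketch. It is written in Python rather than Chicken Scheme purely for
illustration, and all four procedure names are hypothetical stand-ins,
not the API of the utf8 egg or of Chicken itself.

```python
# Hypothetical names throughout; this is not the utf8 egg's API.
# One underlying object (a bytes value), two families of accessors.

def byte_string_length(s: bytes) -> int:
    """Length in bytes: constant time."""
    return len(s)

def byte_string_ref(s: bytes, i: int) -> int:
    """The i-th byte: constant time."""
    return s[i]

def utf8_string_length(s: bytes) -> int:
    """Length in characters: count every byte that is not a
    UTF-8 continuation byte (bit pattern 10xxxxxx)."""
    return sum(1 for b in s if (b & 0xC0) != 0x80)

def utf8_string_ref(s: bytes, k: int) -> str:
    """The k-th character: must scan from the start of the string."""
    count = -1
    for start, b in enumerate(s):
        if (b & 0xC0) != 0x80:          # a character starts here
            count += 1
            if count == k:
                end = start + 1
                while end < len(s) and (s[end] & 0xC0) == 0x80:
                    end += 1
                return s[start:end].decode("utf-8")
    raise IndexError(k)

s = "naïve".encode("utf-8")   # the 'ï' occupies two bytes
print(byte_string_length(s), utf8_string_length(s))
print(hex(byte_string_ref(s, 2)), utf8_string_ref(s, 2))
```

Code that really wants bytes (phase 2) would call the first pair, and
only the second pair changes meaning when the core string operations
are swapped for utf8 versions (phase 3).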

SRFI-13 makes very, very heavy use of string indices, so
that idiom happens to be slow with utf8 strings (which is a
different thing from saying utf8 is slow).  It would be best
to then provide a string-cursor based string library and
encourage its use instead of SRFI-13.
