[Chicken-users] Re: unicode and chicken
From: Joerg F. Wittenberger
Subject: [Chicken-users] Re: unicode and chicken
Date: 14 Nov 2002 12:57:13 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)
Felix Winkelmann <address@hidden> writes:
> Joerg F. Wittenberger wrote:
> > I'd really like to see/help get something going into that direction.
>
> > Unicode is of major importance and quite an argument to switch towards
> > Java (argh).
> > I guess a generally unicode based chicken would be a bit
> > slower... What penalty would you expect?
>
> Too heavy a penalty. Instead I propose the following:
>
> - Extending the character handling to allow 16-bit character
> codes. Generally the effect of using non-Latin1 characters
> with standard procedures is undefined.
>
> - A new library unit `unicode':
>
> A new data type `ucs-2-string', which stores a ucs-2
> representation in native byte order.
>
> procedures:
>
> ucs-2-string
> ucs-2-string-append
> make-ucs-2-string
> ucs-2-string->list
> list->ucs-2-string
> ucs-2-string-ref
> ucs-2-string-set!
>
> read-syntaxes:
>
> #utf-8"..."
Important, but one remark: Gauche reads single characters like this, and I
remember Bigloo being similar (?). I'd prefer a compatible way.
#\u0041 => #\A ; ASCII letter 'A', specified by UCS
#\u3042 => #\あ ; Hiragana letter A, specified by UCS
#\u0002a6b2 => ; JISX0213 Kanji 2-94-86, specified by UCS4
> #lsb-ucs-2"..."
> #msb-ucs-2"..."
> #ucs-2"..." (native byte order)
These can wait.
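To make the difference between the lsb, msb and native variants concrete, here is a minimal sketch of how one 16-bit code splits into two bytes in either order. The procedure name `ucs2-code->bytes` is made up for this illustration and is not part of any proposed or existing API; a native-order literal would simply pick whichever order the host machine uses.

```scheme
;; Sketch only: split one UCS-2 code (0..#xFFFF) into two bytes.
;; `ucs2-code->bytes` is a hypothetical name, not an existing procedure.
;; Uses quotient/modulo instead of bitwise operators to stay portable.
(define (ucs2-code->bytes code msb-first?)
  (let ((hi (quotient code 256))   ; upper 8 bits
        (lo (modulo code 256)))    ; lower 8 bits
    (if msb-first?
        (list hi lo)               ; big-endian, as in #msb-ucs-2"..."
        (list lo hi))))            ; little-endian, as in #lsb-ucs-2"..."

;; Example: (ucs2-code->bytes #x3042 #t) gives the bytes #x30 #x42,
;; while (ucs2-code->bytes #x3042 #f) gives #x42 #x30.
```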
I'd propose to keep some compatibility here. Who can contribute knowledge
about other Scheme implementations? Bigloo has:
(from
http://www-sop.inria.fr/mimosa/personnel/Manuel.Serrano/bigloo/doc/bigloo-5.1.html#container1412)
ucs2? ucs2=? ucs2<? ucs2>? ucs2<=? ucs2>=? ucs2-ci=? ucs2-ci<?
ucs2-ci>? ucs2-ci<=? ucs2-ci>=?
ucs2-alphabetic? ucs2-numeric? ucs2-whitespace? ucs2-upper-case?
ucs2-lower-case?
ucs2->integer integer->ucs2
ucs2-string? make-ucs2-string ucs2-string
ucs2-string-length ucs2-string-ref ucs2-string-set!
ucs2-string=? ucs2-string-ci=? ucs2-string<? ucs2-string>?
ucs2-string<=? ucs2-string>=? ucs2-string-ci<? ucs2-string-ci>?
ucs2-string-ci<=? ucs2-string-ci>=?
subucs2-string ucs2-string-append ucs2-string->list list->ucs2-string
ucs2-string-copy
ucs2-string->utf8-string utf8-string->ucs2-string
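As an illustration of what a conversion such as Bigloo's ucs2-string->utf8-string has to do, here is a minimal sketch of the UCS-2 to UTF-8 encoding step. It works on plain lists of integers, since the ucs-2-string type does not exist yet; both procedure names below are invented for this example and are not taken from Bigloo, Gauche or CHICKEN. It assumes valid UCS-2 input and does not check for the surrogate range.

```scheme
;; Sketch only: encode one UCS-2 code (0..#xFFFF) as a list of UTF-8 bytes.
;; Hypothetical name; no check is made for surrogates (#xD800..#xDFFF).
(define (ucs2-code->utf8-bytes code)
  (cond ((< code #x80)                        ; 1 byte:  0xxxxxxx
         (list code))
        ((< code #x800)                       ; 2 bytes: 110xxxxx 10xxxxxx
         (list (+ #xC0 (quotient code 64))
               (+ #x80 (modulo code 64))))
        (else                                 ; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
         (list (+ #xE0 (quotient code 4096))
               (+ #x80 (modulo (quotient code 64) 64))
               (+ #x80 (modulo code 64))))))

;; Encode a list of codes by concatenating the per-character byte lists.
(define (ucs2-codes->utf8-bytes codes)
  (apply append (map ucs2-code->utf8-bytes codes)))
```

Since UCS-2 codes never exceed 16 bits, three bytes per character is the worst case, which is part of why UTF-8 looks like a reasonable default for output.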
> A parameter `print-unicode-as-utf-8' (defaults to ?)
> which controls printing (either as ucs-2 literal or as utf-8)
It should default to UTF-8. Maybe also expose low-level routines that print
UCS-2 strings to ports, like display-utf8 and display-ucs2, which ignore
the parameter.
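A rough sketch of how the parameter and the low-level routines could relate, assuming make-parameter and parameterize (SRFI 39) are available. The argument order (data, then port) and the dispatcher `display-unicode` are my own assumptions, and the low-level routines here only write a tagged list as a stand-in for real encoders.

```scheme
;; Sketch only. `display-unicode` and the (codes port) signatures are
;; hypothetical; the routines write a tag instead of really encoding.
(define print-unicode-as-utf-8 (make-parameter #t))   ; default: UTF-8

;; Low-level routines: take the port explicitly, never read the parameter.
(define (display-utf8 codes port)
  (write (cons 'utf-8 codes) port))
(define (display-ucs2 codes port)
  (write (cons 'ucs-2 codes) port))

;; High-level printer: chooses a low-level routine from the parameter.
(define (display-unicode codes port)
  (if (print-unicode-as-utf-8)
      (display-utf8 codes port)
      (display-ucs2 codes port)))
```

Code that needs UCS-2 output for one call could then wrap it in (parameterize ((print-unicode-as-utf-8 #f)) ...) without affecting other output.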
> - Adapting SRFI-13 and the (remaining) string-routines in `extras'
> to handle ucs-2-strings.
>
> - Adding the ucs-2 character set to SRFI-14
>
> Any comments are welcome.
I'm with you so far.
Maybe for a start the whole code could be "stolen" from Gauche
(it's also under a BSD license) and incorporated; then let's see how much
of a penalty it actually is... optimize later.
so short
/Jörg
--
The worst of harm may often result from the best of intentions.