guile-devel

Re: i18n? unicode?


From: Simon Josefsson
Subject: Re: i18n? unicode?
Date: Wed, 13 Feb 2002 10:26:54 +0100 (CET)

On 13 Feb 2002, Alex Shinn wrote:

> >>>>> "Simon" == Simon Josefsson <address@hidden> writes:
> 
>     Simon> Is anyone working on Unicode and/or support for various other
>     Simon> encodings for guile strings?
> 
>     Simon> I guess this would be one major issue that needs to be done
>     Simon> before a guile emacs can happen.
> 
> Some work has been done off and on, but it's not a simple problem.

From the little I have understood so far, I can see this. :-)  That's why
I'd like to see it designed cleanly, or at least documented, instead of the
(to me) confusing Emacs MULE stuff.

> One of the big catches is that Guile wants to both replace Emacs-Lisp
> and extend well with C.  For efficient multi-byte strings, Emacs-Lisp
> has its own string-representation, and the obvious idea would be to do
> likewise (probably using unicode instead of mule), but then you don't
> play well with C libraries and have to do conversions everywhere.

Are "automatic" conversions really necessary?  The "automatic" (guessing)
logic of Emacs MULE seems to cause unexpected behaviour at times.

> Another annoyance is that R5RS pretty clearly treats strings as
> character arrays, but many multibyte encodings are not arrays, so
> procedures like string-ref and string-set! become slow.

This is bad.  Isn't there any standardisation work going on to fix this?

> The only Scheme I know of that has decent multibyte support is Gauche,
> and that is at the expense of performance on string-ref and the like.
> To make up for this it provides string pointers to loop through strings.
> A C API for extensions would presumably need to do explicit conversions.

Seems like a hack...

> Bigloo has limited ucs2 support, but not really unified - you have to
> know what strings you're working with.

Internally I think this approach seems best -- if you don't know what
strings you're working with, you can't expect things to work.  Of course,
users can't be expected to know these things, but I don't see why users
need to concern themselves with the low-level interface...

> Kawa is implemented in Java, so has as good unicode support as Java.  
> But then you're tied to Java.

Yup, its Unicode support is as good as Java's.  But support for most other
encodings, and unification between them, isn't as good, as far as I understand.

I think basing this on the character set stuff available in GNU libc and
iconv would make it behave like "other" applications, which is a good 
thing:

http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_6.html

I'm not sure if the support is sufficient, but maybe it can be extended if
not.

> There are some preliminary charset conversion routines at
> 
>   http://synthcode.com/gumm/packages/a/ams/guile-charset-0.01.tar.gz

Thanks, I'll have a look at it.

> which only does uninteresting 8-bit conversions at the moment.  One
> potential idea of this, though, is to implement multi-byte string
> handling entirely in Scheme, and redefine basic string/port procedures
> using generic methods to handle different string types.  Kind of a hack
> (btw, this is how Perl5 does it) but could get people started writing
> multi-byte string apps and the upgrade (internal support for different
> strings in string procedures) means they won't have to change their
> code.

Yes... it would be kind of a hack.  I'll look into the GNU libc path for 
now.



