

Re: [Help-smalltalk] [Q] Unicode String?

From: Chun Sungjin
Subject: Re: [Help-smalltalk] [Q] Unicode String?
Date: Fri, 7 Jul 2006 17:05:18 +0900


The main problem is that, for example, if I create a String instance like this:

a := 'Some MultiByte Encoded String'.


a size

does not answer the correct length of the string: it counts bytes, not characters.
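A minimal sketch of the mismatch, assuming the String holds well-formed UTF-8: #size counts bytes, while the number of characters can be recovered by counting only the bytes that are not UTF-8 continuation bytes (bit pattern 2r10xxxxxx). The sample bytes and variable names are made up for illustration.

```smalltalk
"Sketch: #size on a String answers the number of BYTES. To count
 characters in UTF-8 text, count only the bytes that are not
 continuation bytes (2r10xxxxxx). Assumes well-formed UTF-8."
| bytes utf8 charCount |
"Sample text: 'h', e-acute (16rC3 16rA9), 'l', 'l', 'o'"
bytes := #(16r68 16rC3 16rA9 16r6C 16r6C 16r6F).
utf8 := String new: bytes size.
1 to: bytes size do: [:i | utf8 at: i put: (Character value: (bytes at: i))].
charCount := utf8 inject: 0 into: [:count :ch |
    (ch value bitAnd: 16rC0) = 16r80
        ifTrue: [count]
        ifFalse: [count + 1]].
Transcript showCr: 'bytes: ', utf8 size printString.
Transcript showCr: 'characters: ', charCount printString.
```

This prints "bytes: 6" and "characters: 5": the two-byte e-acute is what makes #size disagree with the character count.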

Still, I will try what you suggested. Thank you.

On Jul 7, 2006, at 4:03 PM, Paolo Bonzini wrote:

Chun Sungjin wrote:

I've tried GNU Smalltalk and it seems good to me. But I have a problem: the current implementation does not support Unicode; it seems to support only single-byte characters. I've also tried Squeak, which seems slower than GNU Smalltalk (I'm not sure about this; it might not be correct) but has a Unicode-compatible String implementation, and I think that kind of approach is good. Is there any chance of a Unicode-compatible String implementation in the next version of GNU Smalltalk?
What exactly do you need? The main missing piece is support for Character objects with values of 256 and above. However, if you are content with multibyte character sets such as UTF-8, or with handling Unicode character codes as integers, things already work.

For character set translation, loading the I18N package gives GNU Smalltalk an iconv wrapper. The main method you need is EncodedStream>>#on:from:to: (e.g. EncodedStream on: 'abc' from: 'UTF-8' to: 'UCS-4').

To extract Unicode character codes from a UCS-4LE encoded string, you can create (ByteStream on: x asByteArray) and send it #nextLong. For big-endian data there is no such class yet, but I was thinking of adding a #bigEndian method to ByteStream for the next version.
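Until something like #bigEndian exists, one possible workaround is to assemble each 32-bit code by hand from a ByteArray, which works for either byte order. This is only a sketch: it does not use ByteStream at all, the le/be block names are hypothetical, and it assumes the data length is a multiple of 4.

```smalltalk
"Sketch: build 32-bit code points by hand from 4-byte groups, in either
 byte order. The block names le/be are only for this example.
 Assumes the ByteArray size is a multiple of 4."
| le be ucs4le ucs4be codesLE codesBE |
le := [:b :i | (b at: i) + ((b at: i + 1) bitShift: 8)
    + ((b at: i + 2) bitShift: 16) + ((b at: i + 3) bitShift: 24)].
be := [:b :i | ((b at: i) bitShift: 24) + ((b at: i + 1) bitShift: 16)
    + ((b at: i + 2) bitShift: 8) + (b at: i + 3)].
"U+0041 and U+00E9 encoded in each byte order"
ucs4le := #[65 0 0 0 233 0 0 0].
ucs4be := #[0 0 0 65 0 0 0 233].
codesLE := (1 to: ucs4le size by: 4) collect: [:i | le value: ucs4le value: i].
codesBE := (1 to: ucs4be size by: 4) collect: [:i | be value: ucs4be value: i].
Transcript showCr: codesLE printString; showCr: codesBE printString.
```

Both collections end up holding the code points 65 and 233; only the order in which the four bytes are combined differs.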

Things that could be useful include:
   String class>>#utf8FromCodepoint: (same as above)
   UTF8Stream (returns Unicode character codes)
   ... (tell me what you need) ...
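For reference, here is a hypothetical sketch of what String class>>#utf8FromCodepoint: might compute. It is written as a block because that method does not exist; it follows the standard UTF-8 bit layout (1 to 4 bytes per code point), assumes a valid code point in 0..16r10FFFF, and does not reject surrogates.

```smalltalk
"Hypothetical sketch of a String class>>#utf8FromCodepoint: equivalent,
 written as a block. Assumes 0 <= cp <= 16r10FFFF; surrogates are not
 rejected."
| bytes utf8FromCodepoint |
utf8FromCodepoint := [:cp |
    bytes := cp < 16r80
        ifTrue: [Array with: cp]
        ifFalse: [cp < 16r800
            ifTrue: [Array
                with: (16rC0 bitOr: (cp bitShift: -6))
                with: (16r80 bitOr: (cp bitAnd: 16r3F))]
            ifFalse: [cp < 16r10000
                ifTrue: [Array
                    with: (16rE0 bitOr: (cp bitShift: -12))
                    with: (16r80 bitOr: ((cp bitShift: -6) bitAnd: 16r3F))
                    with: (16r80 bitOr: (cp bitAnd: 16r3F))]
                ifFalse: [Array
                    with: (16rF0 bitOr: (cp bitShift: -18))
                    with: (16r80 bitOr: ((cp bitShift: -12) bitAnd: 16r3F))
                    with: (16r80 bitOr: ((cp bitShift: -6) bitAnd: 16r3F))
                    with: (16r80 bitOr: (cp bitAnd: 16r3F))]]].
    "Pack the byte values into a String, one Character per byte"
    (bytes collect: [:b | Character value: b])
        inject: '' into: [:acc :ch | acc copyWith: ch]].
Transcript showCr: (utf8FromCodepoint value: 16r20AC) asByteArray printString.
```

For U+20AC (the euro sign) the resulting String holds the three bytes 226 130 172, i.e. 16rE2 16r82 16rAC.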


