From: Sungjin Chun
Subject: Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Date: Sat, 8 Jul 2006 00:14:38 +0900


To me, a string should not be limited to a collection of single-byte
characters. A string is a string, not a simple collection of bytes, isn't it?
I prefer Squeak's approach (or OpenStep's, where an abstract public string
class has concrete private subclasses that implement the different kinds of
string). But since I'm not currently working much on GNU Smalltalk, this may
not be the best idea in GNU Smalltalk's case :-)

I DO think that strlen is not meant for Unicode (actually, multi-byte encoded)
strings and is a bad design: it is limited to single-byte encodings. I DO think
that a modern language should take Unicode-like strings into account. I DO
think Smalltalk is

----- Original Message ----- 
From: "Paolo Bonzini" <address@hidden>
To: "Chun Sungjin" <address@hidden>
Cc: "GNU Smalltalk" <address@hidden>
Sent: Friday, July 07, 2006 6:17 PM
Subject: Re: {Spam?} Re: [Help-smalltalk] [Q] Unicode String?

> Chun Sungjin wrote:
> > Hi,
> >
> > The main problem is that, for example, if I create an instance of a
> > string like this:
> >
> > a := 'Some MultiByte Encoded String'.
> >
> > then
> >
> > a size
> >
> > does not answer the correct length of the string.
> Well, strlen does not in C, either.  You need mbrlen, and #size is more
> like strlen than mbrlen.
> Also, the result heavily depends on the chosen character set.  If we
> want to have #utf8Size, that's fine.  But #size should be the number of
> *bytes*, not of characters.
> I'm seeing now if I can add an EncodedStream method that extracts
> Unicode characters.  Then what you wanted would be something like
>     (EncodedStream wordsOn: 'some string') contents size
> for which, of course, we can add a utility method.
> Paolo
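
To make the bytes-versus-characters distinction above concrete, here is a
minimal C sketch. It assumes the string is valid, NUL-terminated UTF-8, and
the helper name utf8_codepoints is hypothetical (it is not part of libc or of
GNU Smalltalk). It does not use mbrlen, so it does not depend on the current
locale; it simply skips UTF-8 continuation bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: counts UTF-8 code points
   by counting every byte that is NOT a continuation byte (10xxxxxx).
   Assumes the input is valid, NUL-terminated UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)  /* this byte starts a new code point */
            count++;
    }
    return count;
}
```

For a two-syllable Korean string encoded in UTF-8 as the bytes
"\xED\x95\x9C\xEA\xB8\x80", strlen answers 6 (bytes) while utf8_codepoints
answers 2 (characters), which is the same gap reported for #size above.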
