Re: generation of gnu/java/locale/*.uni

From: Brian Jones
Subject: Re: generation of gnu/java/locale/*.uni
Date: 17 Feb 2002 10:16:44 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1

Eric Blake <address@hidden> writes:

> Brian Jones wrote:
> > 
> > As I recall, Unicode now requires more bits than a Java 'char' allows.
> > I don't know whether that helps at all.  I don't really know what Sun's
> > solution is.  It looks like we did update to unicode data 3.0, but I
> > know our implementation fails many Mauve tests related to Character.
>
> Unicode 3.1 introduced several code points in the surrogate space.  And
> the upcoming 3.2 adds even more.  These characters require two 16-bit
> fields to represent them (the first in \ud800 - \udb7f, the second in
> \udc00 - \udfff).  And Java does ignore these - the 4-byte abbreviation
> sequences of UTF-8 are illegal in class files (you have to use a 6-byte
> sequence instead), and Java identifiers may not include surrogate
> characters.  Sun would need to add more methods to the API to use them,
> because the point of surrogates is that two characters together have
> semantic meaning, while one alone is an error.  For example, it is
> impossible to tell if \ud820 in isolation is part of a letter, number,
> or punctuation.  So for now, Sun's "solution" is to stall.  I did verify
> today that JDK 1.4 is still on Unicode 3.0.0.
>
> The implementation of Character that I just checked in to Classpath is
> identical in behavior to Sun's (fortunately, testing every method on all
> 64k chars is not terribly time-consuming).  However, I could not run it
> through Mauve, as I have still been unable to compile a free VM on
> Cygwin, and Sun's VM doesn't like me replacing core classes like
> Character.  But if Character fails any tests in Mauve now, then I would
> suspect that Mauve has the bugs.
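
To make the surrogate point above concrete, here is a minimal sketch of how two 16-bit values combine into one code point, and why one alone says nothing useful. The class and method names (SurrogateSketch, isHigh, isLow, toCodePoint) are made up for illustration and are not part of Classpath or Sun's API. It uses the full high-surrogate range through U+DBFF, which includes the private-use high surrogates U+DB80..U+DBFF that the quoted range leaves out.

```java
public class SurrogateSketch {
    // High (leading) surrogates: U+D800..U+DBFF, private-use ones included.
    static boolean isHigh(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Low (trailing) surrogates: U+DC00..U+DFFF.
    static boolean isLow(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Combine a high/low pair into the code point it stands for.
    // Each half carries 10 bits; the result starts at U+10000.
    static int toCodePoint(char high, char low) {
        if (!isHigh(high) || !isLow(low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char high = (char) 0xD820;
        char low = (char) 0xDC00;

        // In isolation, all Character can report is "surrogate".
        System.out.println(Character.getType(high) == Character.SURROGATE);

        // Only the pair identifies a code point.
        System.out.println(Integer.toHexString(toCodePoint(high, low)));
    }
}
```

Only existing Character members (getType and the SURROGATE constant) are used; the pairing itself has to be done by hand, since the API has no method for it.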

I'll run what you've checked in through Mauve here and see what
happens.  Do you have time to evaluate the Character implementation
Artur pointed to?  I'm mostly concerned with correctness; I think the
one he pointed to improved efficiency, if not speed.  I'd do this
myself, but that would involve time learning how Character/Unicode work.
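
For the correctness side, a rough sketch of the kind of exhaustive sweep over all 64k chars mentioned in the quoted text might look like the following. This is only an assumption about how such a comparison could be set up, not the test that was actually run; CharSweepSketch, digest and countOfType are hypothetical names. The idea is to run it once against each Character implementation and compare the printed values.

```java
public class CharSweepSketch {
    // Fold a handful of Character results for every 16-bit value into
    // one number, so two implementations can be compared by one value.
    static long digest() {
        long h = 17;
        for (int i = 0; i <= 0xFFFF; i++) {
            char c = (char) i;
            h = 31 * h + Character.getType(c);
            h = 31 * h + Character.toUpperCase(c);
            h = 31 * h + Character.toLowerCase(c);
            h = 31 * h + (Character.isLetter(c) ? 1 : 0);
            h = 31 * h + (Character.isDigit(c) ? 1 : 0);
        }
        return h;
    }

    // Count how many of the 65536 char values report a given general
    // category, e.g. Character.SURROGATE.
    static int countOfType(int type) {
        int n = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            if (Character.getType((char) i) == type) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(digest()));
        System.out.println(countOfType(Character.SURROGATE));
    }
}
```

A matching digest only shows agreement on the methods folded in, so a real comparison would need to cover every public method of Character.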

Brian Jones <address@hidden>