classpath-patches

Re: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes


From: Archie Cobbs
Subject: Re: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes
Date: Wed, 17 Nov 2004 11:48:56 -0600 (CST)

Jeroen Frijters wrote:
> > > I committed the attached patch to remove the throwing of
> > > CharConversionException from the character encoders/decoders.
> > > 
> > > For encoders, unsupported characters are now always replaced with a '?'
> > > byte and for the UTF8 decoder, invalid UTF-8 bytes are replaced by a
> > > Unicode REPLACEMENT CHARACTER (\uFFFD) in the output stream.
> > 
> > Just curious.. does this implementation have the same problem as
> > described in 
> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4628881 ?
> > I.e., is it a lossy encoding for "invalid" characters?
> 
> At the moment the UTF-8 encoder/decoder is fully symmetrical for all
> "characters" (really UTF-16 codepoints), but this is actually a bug, IMO
> unpaired surrogate pairs shouldn't be decoded (like the bug parade
> comment says, the test case is bogus).

This is arguable in my opinion. Does the UTF-8 specification say that
only currently defined Unicode characters may be encoded/decoded?

What about Java class files? They contain arbitrary 16-bit characters
encoded using "UTF-8"... by your logic, isn't that a violation? Etc.

I guess it depends on whether UTF-8 is defined as a 16-bit value
encoding or a Unicode character encoding... but even if it's defined
as the latter, in practice it is certainly used as the former a lot.
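
To make the two readings concrete, here is a minimal sketch, assuming a
current JDK rather than the Classpath code under discussion. The class and
method names (SurrogateRoundTrip, viaCharset, viaModifiedUtf8) are made up
for illustration. It round-trips a string containing an unpaired surrogate
once through the standard UTF-8 charset and once through
DataOutputStream.writeUTF, which writes the class-file style "modified
UTF-8" and treats each 16-bit char as an independent value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SurrogateRoundTrip {

    // Round trip through the standard UTF-8 charset ("Unicode character
    // encoding" reading). String.getBytes substitutes the charset's
    // replacement bytes for input it cannot encode.
    static String viaCharset(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Round trip through writeUTF/readUTF ("16-bit value encoding" reading).
    // Each char is encoded on its own, with no surrogate pairing check.
    static String viaModifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(s);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String lone = "a\uD800b"; // \uD800 is a high surrogate with no partner
        System.out.println("charset round trip equal:  "
                + lone.equals(viaCharset(lone)));
        System.out.println("writeUTF round trip equal: "
                + lone.equals(viaModifiedUtf8(lone)));
    }
}
```

On the JDKs I am aware of, the first round trip is lossy (the lone
surrogate comes back as '?') and the second is exact, which is the
distinction being argued about here.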

-Archie

__________________________________________________________________________
Archie Cobbs      *        CTO, Awarix        *      http://www.awarix.com
