RE: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes

From: Jeroen Frijters
Subject: RE: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes
Date: Wed, 17 Nov 2004 16:05:50 +0100

Archie Cobbs wrote:
> Jeroen Frijters wrote:
> > I committed the attached patch to remove the throwing of
> > CharConversionException from the character encoders/decoders.
> > 
> > For encoders, unsupported characters are now always replaced with a '?'
> > byte and for the UTF8 decoder, invalid UTF-8 bytes are replaced by a
> > Unicode REPLACEMENT CHARACTER (\uFFFD) in the output stream.
> Just curious.. does this implementation have the same problem as
> described in 
> ?
> I.e., is it a lossy encoding for "invalid" characters?

At the moment the UTF-8 encoder/decoder is fully symmetrical for all
"characters" (really UTF-16 code units), but this is actually a bug. IMO,
unpaired surrogates shouldn't be decoded (as the bug parade comment says,
the test case is bogus).
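
To make the replacement behaviour concrete, here is a minimal sketch using
the standard java.nio.charset API rather than the Classpath converter
classes touched by the patch. The class and method names are hypothetical,
and the results are what a current JDK produces, which is only assumed to
match what Classpath should do.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Classpath patch.
class ReplacementDemo {
    // Encoding: characters the charset cannot represent are replaced
    // by the charset's replacement byte ('?' for ISO-8859-1).
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Encoding to UTF-8; an unpaired surrogate is not encodable as-is.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: malformed UTF-8 input is replaced by U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The euro sign has no ISO-8859-1 mapping, so one '?' byte comes out.
        byte[] latin1 = encodeLatin1("\u20AC");
        System.out.println(latin1.length == 1 && latin1[0] == (byte) '?');

        // 0xFF can never appear in well-formed UTF-8.
        String decoded = decodeUtf8(new byte[] {'A', (byte) 0xFF, 'B'});
        System.out.println(decoded.equals("A\uFFFDB"));

        // An unpaired surrogate does not survive an encode/decode round trip.
        String lone = "\uD800";
        System.out.println(decodeUtf8(encodeUtf8(lone)).equals(lone));
    }
}
```

On a JDK the first two checks print true and the last prints false, i.e.
the round trip is lossy for unpaired surrogates, which is the behaviour
argued for above.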