bug#12291: [rev 109796] wrong UTF-8 handling

From: Werner LEMBERG
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Tue, 28 Aug 2012 21:22:26 +0200 (CEST)

> In both cases, user surely see them.

OK.  BTW, the real use case is a bug in Emacs 23.x which prevented
correct conversion from the emacs-mule encoding to UTF-8, creating such
oddly encoded UTF-8 files.  (I can't reproduce this problem with my
recently compiled Emacs, so it seems that it has been fixed.)

>> Instead, such characters must be converted to correct
>> UTF-8.
> ??? I don't understand what you means by "correct UTF-8".

Sorry, I meant correct Unicode.  U+1351DE is larger than U+10FFFF,
the largest valid Unicode code point.  As my example demonstrates, the
Chinese character in the file is certainly *neither* a private-use
character nor a character from GB 18030, so it should be converted to
a regular Unicode value.
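
To make the range argument concrete, here is a minimal sketch in Python
(not Emacs Lisp, and not how Emacs itself checks this).  The helper name
is_valid_code_point is made up for illustration; the only facts it relies
on are the Unicode limit U+10FFFF and the surrogate range U+D800..U+DFFF.

```python
# Upper bound of the Unicode code space.
MAX_CODE_POINT = 0x10FFFF


def is_valid_code_point(cp: int) -> bool:
    """Return True if cp is a Unicode scalar value (hypothetical helper)."""
    in_range = 0 <= cp <= MAX_CODE_POINT
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    return in_range and not is_surrogate


value = 0x1351DE  # the value discussed in this message
print(hex(value), "valid:", is_valid_code_point(value))
print("exceeds the limit by", hex(value - MAX_CODE_POINT))
```

Python's own chr() rejects the same value with a ValueError, which is
consistent with it lying outside the Unicode code space.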

> I think the correct behaviour on reading such a file by utf-8 is to
> treat each byte as raw-byte.

Maybe.  I'm not sure how Emacs should behave when reading such files.
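
For comparison only, here is a rough sketch of what "treat each byte as
raw-byte" can look like outside Emacs, using Python's surrogateescape
error handler.  The byte sequence is an assumption: it is what results
from applying the four-byte UTF-8 bit pattern to 0x1351DE, not bytes
copied from the actual file, and Python's handling is only an analogy
to Emacs' raw-byte characters.

```python
# Assumed bytes: the four-byte UTF-8 bit pattern applied to 0x1351DE.
# 0x1351DE = 100 110101 000111 011110 (21 bits, split 3+6+6+6).
raw = bytes([0xF4, 0xB5, 0x87, 0x9E])

# A strict UTF-8 decoder rejects the sequence (it encodes a value > U+10FFFF).
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape keeps every undecodable byte as one placeholder code
# point, so the original bytes can be written back unchanged.
text = raw.decode("utf-8", errors="surrogateescape")
round_trip = text.encode("utf-8", errors="surrogateescape")

print("strict decode succeeded:", strict_ok)
print("placeholders:", len(text))
print("bytes preserved:", round_trip == raw)
```

The point of the sketch is only that a reader can refuse to interpret
such bytes and still preserve them on save; whether Emacs should do
that or convert to a regular Unicode value is the open question here.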

