Re: coding tags and utf-16

From: Kenichi Handa
Subject: Re: coding tags and utf-16
Date: Tue, 07 Mar 2006 10:02:05 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/22.0.50 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI)

In article <address@hidden>, Benjamin Riefenstahl <address@hidden> writes:

> Kenichi Handa writes:
>> For decoding UTF-8, we should not delete that BOM but treat it as
>> the content of the text.  For UTF-16, Unicode explicitly says that
>> "The BOM is not considered part of the content of the text", but for
>> UTF-8, it doesn't say such a thing.

> NOTEPAD.EXE (the basic MS Windows editor) adds a BOM when writing
> UTF-8 files.  When I saw that and tried to discuss it on their
> newsgroups, I learned that it seems to be Microsoft's POV that this is
> a good thing.

> Which means files like that exist.  Treating the BOM as content means
> that U+FEFF creeps into the regular content of documents through
> cut-and-paste and through components of template systems.  I have
> already seen that happening in real life and of course it leads to
> stupid bugs.  I think Emacs should do better.

But it's simply a bug to delete the leading U+FEFF from the
content while decoding UTF-8.  Perhaps we should add some
customizable flag to control that behavior after the
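
To make the two behaviors under discussion concrete, here is a minimal
Python sketch (not Emacs code).  The function name decode_utf8 and its
strip_bom parameter are hypothetical, chosen only to illustrate what a
customizable flag could look like: by default the leading U+FEFF is
kept as content, and it is dropped only on request.

```python
import codecs


def decode_utf8(data: bytes, strip_bom: bool = False) -> str:
    """Decode UTF-8 bytes; hypothetical helper for illustration only.

    With strip_bom=False a leading U+FEFF stays in the text.
    With strip_bom=True exactly one leading U+FEFF is removed.
    """
    text = data.decode("utf-8")
    if strip_bom and text.startswith("\ufeff"):
        return text[1:]
    return text


# A file as NOTEPAD.EXE would write it: UTF-8 signature, then the text.
sample = codecs.BOM_UTF8 + "hello".encode("utf-8")

kept = decode_utf8(sample)                      # U+FEFF treated as content
stripped = decode_utf8(sample, strip_bom=True)  # U+FEFF treated as a signature
```

Python's standard library makes the same split through two codec
names: "utf-8" keeps the U+FEFF, while "utf-8-sig" removes it.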

>> utf-16-be [==] utf-16be-with-signature [!=] utf-16be

> ;-)
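
The with-signature versus without-signature distinction can also be
seen outside Emacs.  The sketch below uses Python's codecs, whose
names do not map one-to-one onto the Emacs coding system names quoted
above; that mismatch is an assumption to keep in mind.  In Python,
"utf-16" expects and consumes a BOM, while "utf-16-be" has a fixed
byte order and passes a leading U+FEFF through as text.

```python
import codecs

# Big-endian UTF-16 bytes for "hi", preceded by a big-endian BOM.
payload = "hi".encode("utf-16-be")
with_signature = codecs.BOM_UTF16_BE + payload

# "utf-16" reads the BOM to pick the byte order, then discards it.
decoded_signature_aware = with_signature.decode("utf-16")

# "utf-16-be" assumes big-endian and keeps U+FEFF as a character.
decoded_fixed_order = with_signature.decode("utf-16-be")

# Without a BOM, the fixed-order codec returns just the text.
decoded_plain = payload.decode("utf-16-be")
```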


Kenichi Handa
