Re: coding tags and utf-16

From: Benjamin Riefenstahl
Subject: Re: coding tags and utf-16
Date: Mon, 06 Mar 2006 20:35:15 +0100
User-agent: Gnus/5.1001 (Gnus v5.10.1) Emacs/21.3.50 (gnu/linux)


Kenichi Handa writes:
> For decoding UTF-8, we should not delete that BOM but treat it as
> the content of the text.  For UTF-16, Unicode explicitly says that
> "The BOM is not considered part of the content of the text", but for
> UTF-8, it doesn't say such a thing.

NOTEPAD.EXE (the basic MS Windows editor) adds a BOM when writing
UTF-8 files.  When I saw that and tried to discuss it on their
newsgroups, I learned that it seems to be Microsoft's POV that this is
a good thing.

So files like that exist.  Treating the BOM as content means
that U+FEFF creeps into the regular content of documents through
cut-and-paste and through components of template systems.  I have
already seen that happening in real life and of course it leads to
stupid bugs.  I think Emacs should do better.
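
To illustrate the failure mode, here is a minimal sketch in Python rather
than Emacs Lisp (my choice of language for the illustration, not anything
Emacs does internally).  It assumes a file that starts with the UTF-8 BOM,
as a Notepad-style editor might write it, and compares Python's "utf-8"
codec, which keeps the BOM as content, with "utf-8-sig", which treats it
as a signature.  The variable names are made up for the example.

```python
import codecs

# Bytes as a Notepad-style editor might write them: UTF-8 BOM, then the text.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" keeps the BOM as a U+FEFF character in the decoded text.
as_content = data.decode("utf-8")

# "utf-8-sig" treats a leading BOM as a signature and drops it.
as_signature = data.decode("utf-8-sig")

# Concatenating two BOM-as-content fragments, as a template system or a
# cut-and-paste might, leaves a stray U+FEFF in the middle of the result.
joined = as_content + as_content

print(repr(as_content))
print(repr(as_signature))
print(joined.count("\ufeff"))
```

The second U+FEFF in `joined` is no longer at the start of anything, so no
decoder will strip it later; that is the kind of stray character I mean.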

> utf-16-be [==] utf-16be-with-signature [!=] utf-16be


