emacs-devel

utf-16le vs utf-16-le


From: Stephen J. Turnbull
Subject: utf-16le vs utf-16-le
Date: Mon, 14 Apr 2008 07:23:45 +0900

Eli Zaretskii writes:

 > These two encodings have confusingly similar names, but significantly
 > different semantics: one expects a BOM, the other does not.

 > I tripped over these when I tried to read debugging logs saved by
 > MS-Windows, which are in UTF-16 without a BOM: I used utf-16-le, which
 > swallowed the first character.  When I realized it was due to a BOM,
 > it took reading the doc strings of both encodings to find out
 > what I did wrong.
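The failure Eli describes can be reproduced with a minimal Python sketch. Python is used only for illustration, and `naive_signature_decode` is a hypothetical stand-in, not Emacs's actual decoder. It assumes a 2-byte signature is always present and skips it without checking, so BOM-less input loses its first character. Note that Python's codec names run the other way from Emacs's: Python's "utf-16-le" neither writes nor strips a BOM, while its "utf-16" codec writes one when encoding and strips a leading one when decoding.

```python
# Bytes as a BOM-less UTF-16LE log might contain them: "Log", no BOM.
# Python's "utf-16-le" codec never writes a BOM, so raw == b"L\x00o\x00g\x00".
raw = "Log".encode("utf-16-le")

def naive_signature_decode(data: bytes) -> str:
    """Hypothetical decoder that assumes a 2-byte BOM is always present
    and drops the first two bytes without checking what they are."""
    return data[2:].decode("utf-16-le")

print(naive_signature_decode(raw))  # og  (the "L" was swallowed)
print(raw.decode("utf-16-le"))      # Log
```

Because the dropped bytes are exactly one UTF-16 code unit, the result is still valid text, which is why the loss is easy to miss.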

Are you saying it was eating non-BOM characters?  But that's clearly a
bug in the codec.  If it's going to expect a BOM, it should signal an
error if it doesn't get one, not eat the character.
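A minimal sketch of the behavior proposed here, again in Python for illustration only: a decoder that expects a UTF-16LE signature checks for it and raises an error when it is missing, instead of silently dropping two bytes. The function name is hypothetical; `codecs.BOM_UTF16_LE` is the standard-library constant for the bytes FF FE.

```python
import codecs

def decode_utf16le_with_signature(data: bytes) -> str:
    """Hypothetical strict decoder: the input must begin with a UTF-16LE BOM."""
    if not data.startswith(codecs.BOM_UTF16_LE):
        # Refuse the input rather than treating its first code unit as a BOM.
        raise ValueError("expected UTF-16LE BOM (FF FE), got %r" % data[:2])
    # Strip the verified BOM and decode the rest as plain UTF-16LE.
    return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
```

With this check, feeding BOM-less log data to the "with signature" decoder fails loudly at the point of decoding, which points the user at the wrong coding system immediately.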

This business of having presence or absence of signatures determined
by coding systems has always felt wrong to me.  Signatures are
generally related to higher-level protocols (e.g., XML mandates them for
UTF-16, while the MS logging facility de facto prohibits them).  So
whether a signature is used or not should be a buffer-local variable,
not a property of the coding system.
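A rough Python sketch of that separation, under stated assumptions: there is a single UTF-16LE codec, and the signature policy is a separate setting chosen by the caller from the higher-level protocol (in Emacs terms, something like a buffer-local variable). `Signature`, `decode_utf16le` and `encode_utf16le` are hypothetical names, not an existing Emacs or Python API.

```python
import codecs
from enum import Enum

BOM = codecs.BOM_UTF16_LE  # b"\xff\xfe"

class Signature(Enum):
    REQUIRE = "require"  # e.g. XML entities in UTF-16
    FORBID = "forbid"    # e.g. BOM-less MS-Windows logs
    DETECT = "detect"    # strip a BOM if present, else decode everything

def decode_utf16le(data: bytes, signature: Signature) -> str:
    has_bom = data.startswith(BOM)
    if signature is Signature.REQUIRE and not has_bom:
        raise ValueError("signature required but missing")
    if signature is Signature.FORBID:
        # No signature expected: a leading FF FE is decoded as U+FEFF,
        # never silently stripped.
        return data.decode("utf-16-le")
    if has_bom:
        data = data[len(BOM):]
    return data.decode("utf-16-le")

def encode_utf16le(text: str, signature: Signature) -> bytes:
    body = text.encode("utf-16-le")
    # Only REQUIRE writes a BOM; this sketch writes none for DETECT.
    return BOM + body if signature is Signature.REQUIRE else body
```

The codec itself stays the same in every case; only the policy object changes, so an XML writer and a log reader can share it while disagreeing about signatures.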



