Re: utf-16le vs utf-16-le

From: Stephen J. Turnbull
Subject: Re: utf-16le vs utf-16-le
Date: Tue, 15 Apr 2008 03:25:51 +0900

David Kastrup writes:
 > "Stephen J. Turnbull" <address@hidden> writes:

 > > I don't know, in fact I think [having BOM-specific coding
 > > systems is] a bad idea.  That's what the part of my message that
 > > you snipped was saying.  But I'll have to defer to Handa-san on
 > > that.

 > I think it obvious: if a BOM mark gets detected on read, one wants
 > to have it removed from the buffer and reinserted on saving the
 > buffer.

I agree, as you state it, it's obvious.  My question is "why does that
need to be part of the coding system?"  At present the UTF-16 and
UTF-32 Unicode coding systems (in the abstract) have *twenty-seven*
variants each (BOM-required, BOM-prohibited, BOM-autodetected X be,
le, system-dependent X CR, LF, CRLF), and UTF-8 needs *nine*.  This is
nuts, from a user-education standpoint.
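
To make the arithmetic concrete, here is a rough sketch in Python (not Emacs code) that enumerates the combinations listed above.  The option names are only labels made up for this illustration; they are not actual coding system names.

```python
from itertools import product

# Illustrative labels only; these are not real Emacs coding system names.
BOM_POLICIES = ("bom-required", "bom-prohibited", "bom-autodetected")
BYTE_ORDERS = ("be", "le", "system-dependent")
EOL_CONVENTIONS = ("CR", "LF", "CRLF")

# UTF-16 and UTF-32 vary along all three axes.
utf16_variants = list(product(BOM_POLICIES, BYTE_ORDERS, EOL_CONVENTIONS))

# UTF-8 has no byte-order axis, so only two axes remain.
utf8_variants = list(product(BOM_POLICIES, EOL_CONVENTIONS))

print(len(utf16_variants))  # 27
print(len(utf8_variants))   # 9
```

Every independent axis that gets folded into the coding system name multiplies the number of names a user has to know about.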

What I proposed was a more generic concept where use of signatures and
the EOL convention would (at least to the user) appear as buffer-local

 > I am just not sure what the semantics for recoding/encoding/decoding
 > regions are.  They should not mess with BOM in any case, I would
 > suppose.  But then reading a file is not equivalent to reading it
 > literally in unibyte mode and then decoding the buffer-region.

That's correct.  The thing is, processing the BOM is a question of
*initialization* of a stream.
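
As an illustration of the "initialization" point, here is a small sketch using Python's standard codecs rather than Emacs, so it shows the general idea and not how Emacs behaves.  In Python the "utf-16" codec consumes a signature once, at the start of the stream, while "utf-16-le" treats the same bytes as ordinary data.  The helper decode_chunks is a name made up for this example.

```python
import codecs

# A U+FEFF in the middle of the text is ordinary data, not a signature.
text = "a\ufeffb"
payload = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Decoding as a stream: the leading BOM is consumed during initialization.
whole = payload.decode("utf-16")

# Decoding the same bytes with a fixed byte order: nothing is consumed,
# so the BOM shows up as a leading U+FEFF character.
as_region = payload.decode("utf-16-le")


def decode_chunks(chunks):
    """Decode an iterable of byte chunks as one UTF-16 stream."""
    decoder = codecs.getincrementaldecoder("utf-16")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# The BOM decision is made once, no matter how the stream is chunked.
streamed = decode_chunks([payload[:3], payload[3:]])

print(repr(whole))      # 'a\ufeffb'
print(repr(as_region))  # '\ufeffa\ufeffb'
print(streamed == whole)  # True
```

The interior U+FEFF survives in both cases; only the treatment of the first two bytes differs, and that depends on whether the decoder considers itself to be at the start of a stream.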

 > Maybe there never was such an equivalence (can't be for shift codes, can
 > it?).

In my view, there cannot be an equivalence.  An Emacs buffer in
unibyte mode is a *different* stream from the file it was read from,
and the decision about BOM processing will have to be made differently
from the way the decision is made at the time of reading from the
file.  You could add yet another option for BOM mode, namely "if this
stream is an Emacs buffer that is visiting a file in unibyte mode, then
do BOM processing on conversion as if you were reading in the file in
multibyte mode."  I don't much like this....
