Re: utf-16le vs utf-16-le

From: Stefan Monnier
Subject: Re: utf-16le vs utf-16-le
Date: Mon, 14 Apr 2008 16:20:16 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

>> > I don't know, in fact I think [having BOM-specific coding
>> > systems is] a bad idea.  That's what the part of my message that
>> > you snipped was saying.  But I'll have to defer to Handa-san on
>> > that.
>> I think it obvious: if a BOM mark gets detected on read, one wants
>> to have it removed from the buffer and reinserted on saving the
>> buffer.

> I agree, as you state it, it's obvious.  My question is "why does that
> need to be part of the coding system?"  At present the UTF-16 and
> UTF-32 Unicode coding systems (in the abstract) have *twenty-seven*
> variants each (BOM-required, BOM-prohibited, BOM-autodetected X be,
> le, system-dependent X CR, LF, CRLF), and UTF-8 needs *nine*.  This is
> nuts, from a user-education standpoint.
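As a rough check of the counts quoted above, the variants can be enumerated as a cross product. This is only an illustrative sketch in Python; the label strings are made up for the example and are not actual Emacs coding-system names.

```python
from itertools import product

# Hypothetical labels for the three independent axes named in the quote.
BOM_POLICIES = ("bom-required", "bom-prohibited", "bom-autodetected")
BYTE_ORDERS = ("be", "le", "system-dependent")
EOL_STYLES = ("CR", "LF", "CRLF")


def utf16_variants():
    """All (BOM policy, byte order, EOL) combinations for UTF-16 or UTF-32."""
    return list(product(BOM_POLICIES, BYTE_ORDERS, EOL_STYLES))


def utf8_variants():
    """UTF-8 has no byte-order axis, so only BOM policy and EOL remain."""
    return list(product(BOM_POLICIES, EOL_STYLES))


if __name__ == "__main__":
    print(len(utf16_variants()))  # 3 * 3 * 3
    print(len(utf8_variants()))   # 3 * 3
```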

For what it's worth, I do think it would make sense to try and move the
BOM-processing outside of the coding-system proper.  For me a good test
for coding-system-worthiness is "what if I use it for a process rather
than a file".  Based on this test, I'm not sure if BOMs really fit in
(other than for auto-detection and automatically stripping them, maybe).
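To illustrate what keeping the BOM outside the coding system might look like, here is a minimal sketch in Python rather than Emacs Lisp. The function names and the idea of a per-file `had_bom` flag are assumptions made for the example, not anything that exists in Emacs. The codec itself only converts bytes to text; a separate layer strips the signature on read and puts it back on save.

```python
import codecs

# Signatures for a few BOM-less codecs (Python codec names, not Emacs ones).
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}


def read_file_bytes(raw: bytes, encoding: str):
    """Strip a leading BOM if present, then decode with the plain codec.

    Returns the decoded text plus a flag recording whether a BOM was seen,
    so that the caller can restore it when saving.
    """
    bom = BOMS[encoding]
    had_bom = raw.startswith(bom)
    if had_bom:
        raw = raw[len(bom):]
    return raw.decode(encoding), had_bom


def write_file_bytes(text: str, had_bom: bool, encoding: str) -> bytes:
    """Encode with the plain codec and reinsert the BOM if one was read."""
    body = text.encode(encoding)
    return (BOMS[encoding] if had_bom else b"") + body
```

With this split, talking to a process would use the plain codec alone, and only the file-visiting code would call the BOM-aware wrappers.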

> What I proposed was a more generic concept where use of signatures and
> the EOL convention would (at least to the user) appear as buffer-local
> variables.

Here, I disagree: EOL processing definitely needs to take place when
talking to subprocesses, so EOL-handling doesn't belong in buffer-local
vars but in the coding-system.

