emacs-devel

Re: utf-16le vs utf-16-le


From: Stephen J. Turnbull
Subject: Re: utf-16le vs utf-16-le
Date: Wed, 16 Apr 2008 01:51:50 +0900

Eli Zaretskii writes:

 > > BOM-{prohibited,auto,required}.
 > 
 > But we don't have these in Emacs, do we?

Huh?  We don't have the full suite, but we do have -signature variants.

 > >  > Don't forget that en/decoding is used on strings as well, not only on
 > >  > buffers.  Buffer-local variables won't cut it, I think.
 > > 
 > > Strings don't have encoding signatures or newline variants
 > 
 > ??? Of course, they do.

Indeed?  Suppose I have a string as the value of the symbol `s'
containing the octets "\r\n".  Please explain to me how to compute
whether that is the value 0x0D0A from a network stream prepared using
htons(3), or a line ending suitable for appending to a Windows file.
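
To make the ambiguity concrete, here is a small sketch in Python (chosen
only for illustration; it is not Emacs code).  It assumes struct.pack
with the "!H" format as a stand-in for writing an htons() result to the
wire, and the variable names are hypothetical.  Both paths yield the
same two octets, so nothing in the string itself records which was
meant.

```python
import struct

# Path 1: the 16-bit value 0x0D0A serialized in network (big-endian)
# byte order, as it would appear on the wire after htons().
as_network_short = struct.pack("!H", 0x0D0A)

# Path 2: the internal newline translated to a DOS/Windows line ending
# and then encoded as ASCII.
as_windows_eol = "\n".replace("\n", "\r\n").encode("ascii")

# The octets are identical; only the surrounding context (the stream or
# file they belong to) can say which interpretation applies.
same_octets = as_network_short == as_windows_eol
```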

As I wrote before:

 > > those octet sequences if present in a string are merely binary octet
 > > sequences.  They only have special semantics in external
 > > representations.  Where's the problem?
 > 
 > A string can be sent to a process, for example, so we must have some
 > way of generating an external representation for it.

Well, of course we must.  But the right generalization of "buffer file
coding system" is not to apply en/decoding to strings, but rather to
give processes, sockets, etc., coding system properties equivalent
to my proposed buffer-local variables.

All I'm trying to say here is that "prepend a signature" and
"translate ?\n to appropriate EOL representation" and their inverses
make sense independently of the text encoding[1], and that the user
interface and API could be greatly clarified if it reflected that
fact.  I suspect bugs like the one you encountered would be a lot less
frequent if the internal architecture reflected it too, but that might
be inefficient.
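
A rough sketch of what I mean by "independently", again in Python
purely for illustration.  The functions externalize and internalize,
the EOLS table, and the parameter names are all hypothetical, not an
existing Emacs or Python API.  It assumes a BOM-less codec name such as
"utf-8" or "utf-16-le"; Python's plain "utf-16" codec adds a signature
on its own and would confuse the picture.

```python
# Hypothetical EOL table: internal text always uses "\n".
EOLS = {"unix": "\n", "dos": "\r\n", "mac": "\r"}

def externalize(text, encoding, eol="unix", signature=False):
    """Build an external representation from three independent choices."""
    # Step 1: EOL translation, done on characters, not octets.
    out = text.replace("\n", EOLS[eol])
    # Step 2: signature, expressed as the character U+FEFF.
    if signature:
        out = "\ufeff" + out
    # Step 3: the text encoding proper.
    return out.encode(encoding)

def internalize(data, encoding, eol="unix", signature=False):
    """Inverse of externalize, undoing the same three steps in reverse."""
    text = data.decode(encoding)
    if signature and text.startswith("\ufeff"):
        text = text[1:]
    if eol != "unix":
        text = text.replace(EOLS[eol], "\n")
    return text

octets = externalize("a\nb", "utf-16-le", eol="dos", signature=True)
round_trip = internalize(octets, "utf-16-le", eol="dos", signature=True)
```

Note that neither the EOL step nor the signature step needs to know
which encoding step 3 will use, which is the whole point.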

Footnotes: 
[1]  Obviously "prepend a signature" needs to be parametrized by the
encoding in general, but in the case of Unicode UTFs it's actually
independent of the UTF.




