Re: Fwd: Re: Inadequate documentation of silly characters on screen.
From: David Kastrup
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 17:55:10 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux)
Alan Mackenzie <address@hidden> writes:
> On Thu, Nov 19, 2009 at 10:30:18AM -0500, Stefan Monnier wrote:
>> > The actual character in the string is ñ (#xf1).
>
>> No: the string does not contain any characters, only bytes, because
>> it's a unibyte string.
>
> I'm thinking from the lisp viewpoint. The string is a data structure
> which contains characters. I really don't want to have to think about
> the difference between "chars" and "bytes" when I'm hacking lisp. If
> I do, then the abstraction "string" is broken.
>
>> So it contains the byte 241, not the character ñ.
>
> That is then a bug. I wrote "(aset nl 0 ?ñ)", not "(aset nl 0 241)".
Huh? ?ñ is the Emacs code point of ñ, which in Emacs 23 is essentially
identical to its Unicode code point.
>> The byte 241 can be inserted in multibyte strings and buffers because
>> it is also a char of code 4194289 (which gets displayed as \361).
>
> Hang on a mo'! How can the byte 241 "be" a char of code 4194289?
> This is some strange usage of the word "be" that I wasn't previously
> aware of. ;-)
Emacs encodes most of its data internally in UTF-8. A Unicode code point
is an integer. You can encode it in different encodings, resulting in
different byte streams. Inside a byte stream encoded in UTF-8, the
isolated byte 241 does not correspond to a Unicode character: it is not
valid UTF-8. When Emacs reads a file that is supposedly in UTF-8, it
wants to be able to represent _all_ possible byte streams, so that it
can save unchanged data unmolested.
So it encodes the entity "illegal isolated byte 241 in a UTF-8
document" as the character code 4194289, which has a representation in
Emacs' internal variant of UTF-8 but lies outside the Unicode range.
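The reasoning above can be sketched in Python. It covers valid UTF-8, isolated bytes, and a code outside Unicode.

This is not Emacs code. `is_valid_utf8`, `byte8_to_char` and `char_to_byte8` are hypothetical helpers. The offset 0x3FFF00 is an assumption modeled on Emacs 23's C macro `BYTE8_TO_CHAR`. With that offset, byte 241 lands on 4194289. That is above the last Unicode code point, 0x10FFFF. The mapping is also reversible, which is what lets unchanged data be saved unmolested.

```python
# Sketch in Python, not Emacs Lisp. Helper names are hypothetical.
# Assumption: Emacs 23 represents a raw byte B (0x80..0xFF) as the
# character code B + 0x3FFF00, like its C macro BYTE8_TO_CHAR.

RAW_BYTE_OFFSET = 0x3FFF00   # 4194048
MAX_UNICODE = 0x10FFFF       # 1114111, the last Unicode code point
MAX_EMACS_CHAR = 0x3FFFFF    # 4194303, the largest Emacs character code


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def byte8_to_char(byte: int) -> int:
    """Map an isolated non-ASCII byte to its raw-byte character code."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF get raw-byte characters")
    return byte + RAW_BYTE_OFFSET


def char_to_byte8(char: int) -> int:
    """Inverse mapping: recover the original byte, e.g. when saving."""
    if not RAW_BYTE_OFFSET + 0x80 <= char <= MAX_EMACS_CHAR:
        raise ValueError("not a raw-byte character")
    return char - RAW_BYTE_OFFSET


if __name__ == "__main__":
    # One code point, two byte streams: ñ is 241 in Unicode and in Latin-1.
    print("\u00f1".encode("utf-8"), "\u00f1".encode("latin-1"))
    # An isolated byte 241 is not valid UTF-8 ...
    print(is_valid_utf8(b"\xf1"))                    # False
    # ... so it gets a character code outside the Unicode range.
    code = byte8_to_char(0xF1)
    print(code, code > MAX_UNICODE)                  # 4194289 True
    # Mapping back yields the original byte, so saving is lossless.
    print(char_to_byte8(code) == 0xF1)               # True
```

Python's own decoder offers a comparable escape hatch, the `surrogateescape` error handler. It maps such bytes to lone surrogates rather than to codes beyond the Unicode range.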
> At this point, would you please just agree with me that when I do
>
> (setq nl "\n")
> (aset nl 0 ?ñ)
> (insert nl)
>
> , what should appear on the screen should be "ñ", NOT "\361"? Thanks!
You assume that ?ñ is a character. But in Emacs it is an integer: in
Emacs 23, a Unicode code point. Because Emacs has no separate
"character" data type, there is no way to distinguish the character 241
from the byte 241 as long as unibyte strings exist, except when Emacs is
told explicitly.
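Here is a toy model of Alan's three lines, again in Python and purely hypothetical; none of these names exist in Emacs. Since ?ñ is just the integer 241, storing it into a unibyte string stores a byte. Inserting that string into a multibyte buffer then turns the byte into a raw-byte character, which is displayed as \361. The model assumes the Emacs 23 behavior described in this thread, and reuses the assumed raw-byte offset 0x3FFF00.

```python
# Toy model (hypothetical, not Emacs's implementation): Emacs has no
# separate character type, so ?ñ is simply the integer 241, and its
# meaning depends on whether it sits in a unibyte or multibyte container.
from typing import List

RAW_BYTE_OFFSET = 0x3FFF00  # assumed raw-byte convention, as in Emacs 23

Q_N_TILDE = 241  # what the reader returns for ?ñ: just an integer


def insert_into_multibyte(unibyte: bytes) -> List[int]:
    """Model inserting a unibyte string: ASCII bytes stay as they are,
    bytes 0x80..0xFF become raw-byte characters."""
    return [b if b < 0x80 else b + RAW_BYTE_OFFSET for b in unibyte]


def display(chars: List[int]) -> str:
    """Show raw-byte chars as octal escapes, everything else as itself."""
    out = []
    for c in chars:
        if c >= RAW_BYTE_OFFSET + 0x80:
            out.append("\\" + format(c - RAW_BYTE_OFFSET, "o"))
        else:
            out.append(chr(c))
    return "".join(out)


# (setq nl "\n") gives a unibyte string; (aset nl 0 ?ñ) stores 241 in it.
nl = bytearray(b"\n")
nl[0] = Q_N_TILDE

# The stored value is indistinguishable from the byte 241 ...
assert bytes(nl) == b"\xf1"
# ... so inserting it yields the raw-byte char, displayed as \361.
print(display(insert_into_multibyte(bytes(nl))))   # \361

# In a multibyte string the same integer would be the character ñ.
print(display([Q_N_TILDE]))                        # ñ
```

The point of the model is that nothing in the value 241 itself records whether "character" or "byte" was meant. Only the container, or an explicit conversion, decides.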
--
David Kastrup