Re: Emacs 23 character code space
From: Eli Zaretskii
Subject: Re: Emacs 23 character code space
Date: Sat, 01 Nov 2008 18:46:09 +0200

Another fragment from etc/NEWS that seems not entirely accurate:

  In buffers and strings, characters are represented by UTF-8 byte
  sequences in a multibyte buffer/string.

But UTF-8 defines 1- to 4-byte sequences to represent each Unicode
codepoint, whereas this comment from character.h:

/* character code    1st byte   byte sequence
   --------------    --------   -------------
        0-7F         00..7F     0xxxxxxx
       80-7FF        C2..DF     110xxxxx 10xxxxxx
      800-FFFF       E0..EF     1110xxxx 10xxxxxx 10xxxxxx
    10000-1FFFFF     F0..F7     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   200000-3FFF7F     F8         11111000 1000xxxx 10xxxxxx 10xxxxxx 10xxxxxx
   3FFF80-3FFFFF     C0..C1     1100000x 10xxxxxx (for eight-bit-char)
   400000-...        invalid

   invalid 1st byte  80..BF     10xxxxxx
                     F9..FF     11111xxx (xxx != 000)
*/

seems to say that we use up to 5 bytes.
What am I missing?
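
To make the table concrete, here is a minimal sketch in C of an encoder that follows it row by row. The function name encode_internal is hypothetical, and this is not the code Emacs actually uses. In particular, the mapping for the eight-bit-char range (subtracting 0x3FFF00 to recover the raw byte, then dropping its high bit) is an assumption read off the bit pattern in the table.

```c
#include <stddef.h>

/* Hypothetical helper, not an Emacs function: encode character code C
   into OUT following the table in the character.h comment.  OUT must
   have room for 5 bytes.  Returns the number of bytes written, or 0
   if C is outside the code space (400000 and above).  */
size_t
encode_internal (unsigned long c, unsigned char *out)
{
  if (c <= 0x7F)                /* 0xxxxxxx */
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c <= 0x7FF)               /* 110xxxxx 10xxxxxx */
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c <= 0xFFFF)              /* 1110xxxx 10xxxxxx 10xxxxxx */
    {
      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c <= 0x1FFFFF)            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    {
      out[0] = (unsigned char) (0xF0 | (c >> 18));
      out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  if (c <= 0x3FFF7F)            /* 11111000 1000xxxx 10xxxxxx 10xxxxxx 10xxxxxx */
    {
      out[0] = 0xF8;
      out[1] = (unsigned char) (0x80 | ((c >> 18) & 0x0F));
      out[2] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      out[3] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[4] = (unsigned char) (0x80 | (c & 0x3F));
      return 5;
    }
  if (c <= 0x3FFFFF)            /* 1100000x 10xxxxxx (eight-bit-char) */
    {
      /* Assumption: the raw byte is c - 0x3FFF00, i.e. 0x80..0xFF.  */
      unsigned long b = c - 0x3FFF00;
      out[0] = (unsigned char) (0xC0 | ((b >> 6) & 0x01));
      out[1] = (unsigned char) (0x80 | (b & 0x3F));
      return 2;
    }
  return 0;                     /* 400000-... is invalid */
}
```

With this sketch, codes up to 1FFFFF come out at 4 bytes or fewer, and for valid Unicode codepoints the bytes are the same as standard UTF-8 (U+20AC gives E2 82 AC). Only the 200000-3FFF7F row, which lies above the Unicode range, produces a 5-byte sequence starting with F8.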