Unibyte characters
From: Eli Zaretskii
Subject: Unibyte characters
Date: Fri, 31 Oct 2008 13:05:54 +0200
The ELisp manual (in the node "Text Representation") gives this
explanation of what a "unibyte character" is:
    In unibyte representation, each character occupies one byte and
    therefore the possible character codes range from 0 to 255. Codes 0
    through 127 are ASCII characters; the codes from 128 through 255 are
    used for one non-ASCII character set [...]
But I think this is inaccurate and even misleading. For starters,
unibyte buffers and strings can contain DBCS characters and UTF-8
encoded text, where a character certainly does not "occupy one
byte".
More generally, I think it is better to say that unibyte buffers and
strings hold raw 8-bit bytes, and that for 8859-x and single-byte
Windows codepages, each such byte represents a single character.
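To make this concrete, here is a minimal sketch, assuming an Emacs in
which `encode-coding-string' returns a unibyte string (Emacs 23 or
later). The helper name `my-encoded-byte-values' is hypothetical and
exists only for this illustration. It encodes a one-character string
and returns whether the result is multibyte, followed by the byte
values the result holds:

```elisp
;; Illustration only: `my-encoded-byte-values' is a made-up helper,
;; not part of Emacs.
(defun my-encoded-byte-values (text coding-system)
  "Encode TEXT with CODING-SYSTEM and describe the resulting string.
Return a list whose first element is non-nil if the encoded string
is multibyte, and whose remaining elements are its byte values."
  (let ((encoded (encode-coding-string text coding-system)))
    (cons (multibyte-string-p encoded)
          (string-to-list encoded))))

;; One character (U+00E9) becomes two bytes in a unibyte string.
(my-encoded-byte-values "\u00e9" 'utf-8)               ; => (nil 195 169)

;; One DBCS character (U+3042, hiragana A) also becomes two bytes.
(my-encoded-byte-values "\u3042" 'japanese-shift-jis)  ; => (nil 130 160)

;; Only for a single-byte charset does one byte equal one character.
(my-encoded-byte-values "\u00e9" 'iso-8859-1)          ; => (nil 233)
```

In all three cases the result is a unibyte string, but only in the
last one does each byte stand for a whole character.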
Am I missing something?