emacs-devel

Unibyte characters


From: Eli Zaretskii
Subject: Unibyte characters
Date: Fri, 31 Oct 2008 13:05:54 +0200

The ELisp manual has (in node "Text Representation") this explanation
of what a "unibyte character" is:

       In unibyte representation, each character occupies one byte and
    therefore the possible character codes range from 0 to 255.  Codes 0
    through 127 are ASCII characters; the codes from 128 through 255 are
    used for one non-ASCII character set [...]

But I think this is inaccurate and even misleading.  For starters,
unibyte buffers and strings can contain DBCS characters and UTF-8
encoded text, where a character certainly does not ``occupy one
byte''.

More generally, I think it is better to say that unibyte buffers and
strings hold raw 8-bit bytes, and that only for single-byte encodings,
such as ISO 8859-x and the single-byte Windows codepages, does each
such byte represent a single character.
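
To make the distinction concrete, here is a rough sketch in Python
rather than Emacs Lisp.  It rests on an assumption made purely for
illustration: a Python `bytes` object stands in for a unibyte string,
which it resembles but is not.  The helper name `describe` is
hypothetical.  The sketch only shows that the number of raw bytes
equals the number of characters for a single-byte encoding, and
differs for UTF-8 and for a DBCS encoding such as Shift_JIS.

```python
# Illustration only: Python bytes objects stand in for Emacs unibyte
# strings.  This is an analogy, not a description of Emacs internals.

E_ACUTE = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one character
KANJI_DAY = "\u65e5" # CJK ideograph U+65E5, one character

utf8_bytes = E_ACUTE.encode("utf-8")         # two raw bytes
latin1_bytes = E_ACUTE.encode("iso-8859-1")  # one raw byte
sjis_bytes = KANJI_DAY.encode("shift_jis")   # two raw bytes (DBCS)


def describe(raw, encoding):
    """Return (number of raw bytes, number of characters once decoded)."""
    return len(raw), len(raw.decode(encoding))


utf8_counts = describe(utf8_bytes, "utf-8")
latin1_counts = describe(latin1_bytes, "iso-8859-1")
sjis_counts = describe(sjis_bytes, "shift_jis")

print("utf-8:     ", utf8_counts)
print("iso-8859-1:", latin1_counts)
print("shift_jis: ", sjis_counts)
```

Only the ISO 8859-1 case gives one byte per character; in the other
two the same container of raw bytes holds a character that spans two
bytes.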

Am I missing something?



