Re: eight-bit char handling in emacs-unicode

From: Stefan Monnier
Subject: Re: eight-bit char handling in emacs-unicode
Date: 21 Nov 2003 00:27:42 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

>> I thought that string-make-unibyte only behaves meaningfully for
>> "normal 8bit coding-systems" such as latin-1.

> Yes, but that doesn't mean it is conceptually the same as
> encode-coding-string.  The result of string-make-unibyte
> should still be regarded as a sequence of characters, but the
> result of encode-coding-string is a sequence of bytes.

Why/when is the distinction meaningful (given that it can only
be used meaningfully with 8bit coding-systems, where the
distinction seems more philosophical than anything else)?

> This is where a unibyte string is ambiguous.

> The number 192 can be regarded as:
> (1) just a number, a byte
> (2) a code point of some character set.
> (3) a character code

But the second case is only possible for 8bit character sets, right?

Until now, I always thought that Emacs only dealt with
- byte streams representing encoded sequences of code points: case 1.
- sequences of internal character codes (internally encoded in emacs-mule
  or unicode depending on the branch you use): case 3.
Is there really any place where we deal with sequences of code points of
external charsets (other than, maybe, the degenerate case where such a
sequence is indistinguishable from case 1)?
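The two cases above, and the degenerate one, can be sketched in Python as a rough analogy: `bytes` plays the role of case 1, `str` (sequences of Unicode code points) the role of case 3, and decoding/encoding with a coding system converts between them. The sample bytes and the helper name `codes_after_decoding` are hypothetical, and the sketch assumes a Unicode-based internal representation; under emacs-mule, latin-1 characters would not have internal codes equal to their byte values.

```python
# Analogy only: bytes play the role of case (1), str the role of case (3).
data = bytes([0x41, 0xC0, 0xE9])  # hypothetical sample bytes


def codes_after_decoding(raw: bytes, charset: str) -> list[int]:
    """Decode raw bytes with the given charset and return the character codes."""
    return [ord(ch) for ch in raw.decode(charset)]


byte_values = list(data)
latin1_codes = codes_after_decoding(data, "latin_1")
cyrillic_codes = codes_after_decoding(data, "iso8859_5")

# Degenerate case: with latin-1 the character codes equal the byte values,
# so a "sequence of code points" is indistinguishable from the bytes.
print(latin1_codes == byte_values)    # True
# With another 8-bit charset the two sequences differ.
print(cyrillic_codes == byte_values)  # False

# Encoding goes back from characters (case 3) to bytes (case 1).
assert data.decode("iso8859_5").encode("iso8859_5") == data
```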

> A unibyte string can contain (1) and (2) without
> distinguishing them, but a multibyte string can contain (1)
> and (3) while distinguishing them.

Can multibyte strings distinguish the cases (1) and (3) for integer 97 and
character `a' ?
