Re: eight-bit char handling in emacs-unicode

From: Stefan Monnier
Subject: Re: eight-bit char handling in emacs-unicode
Date: 17 Nov 2003 16:17:56 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

> The basic problem is that we don't distinguish a character
> (code) and a number.  So, we introduce a character object

That's one way to look at the problem.
Another is to say that the problem is instead that we do not distinguish
between arrays of chars and arrays of bytes.  We just use strings and
buffers and expect to be able to mix bytes and chars in them.

Such mixes are admittedly very rare for strings, but they're pretty common
for buffers.

So when we write 192 at a location, we don't know whether we should put
there the byte 192 or the eight-bit-char character that will be encoded
into a 192 byte.
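
To make the ambiguity concrete, here is a small sketch in Python rather than Emacs Lisp, used only as an analogy: Python keeps byte arrays (`bytes`) and character arrays (`str`) as separate types, and its `surrogateescape` error handler yields a stand-in character for an undecodable byte, which is loosely comparable to an eight-bit-char.  None of the names below exist in Emacs; they are illustrative assumptions.

```python
# Python analogy only: Emacs strings and buffers do not work this way.
raw_byte = bytes([192])    # the single byte 0xC0
latin_char = chr(192)      # the character U+00C0 (Latin-1 A-grave)

# The character U+00C0 does not encode to the single byte 192 under UTF-8.
assert latin_char.encode("utf-8") == b"\xc3\x80"

# A stand-in character for the raw byte, loosely comparable to an
# eight-bit-char: a distinct code point that encodes back to byte 192.
byte_char = raw_byte.decode("utf-8", errors="surrogateescape")
assert byte_char != latin_char
assert byte_char.encode("utf-8", errors="surrogateescape") == raw_byte
```

The integer 192 alone cannot tell you which of `latin_char` and `byte_char` was meant, which is the choice described above.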

In Emacs-21 we worked around the problem by arranging for "the
eight-bit-char that encodes to 192" to be represented by the integer 192, so
as to avoid having to choose.  But with Unicode, the 128-255 range cannot be
dedicated to eight-bit-char since it's already taken by Latin-1, so we
have to face the problem more directly.

In the places where Emacs-21 still had to choose, we just used heuristics,
so `concat' will sometimes return a unibyte string and sometimes a
multibyte string.
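
For contrast, here is a sketch of the non-heuristic alternative, again in Python as an analogy and not as a proposal for how `concat' should behave.  The helper `concat_strict` is hypothetical: it refuses to guess when bytes and characters are mixed, so the caller has to convert explicitly.

```python
def concat_strict(*parts):
    """Hypothetical helper: join parts only if all are bytes or all are str."""
    kinds = {type(p) for p in parts}
    if kinds <= {bytes}:
        return b"".join(parts)
    if kinds <= {str}:
        return "".join(parts)
    # No heuristic: a mix of the two kinds is an error.
    raise TypeError("cannot mix byte strings and character strings")


text = "caf\u00e9"            # characters
data = b"caf" + bytes([192])  # bytes, ending in the byte 192

try:
    concat_strict(text, data)
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

# The caller picks the meaning of byte 192 explicitly (here: Latin-1).
combined = concat_strict(text, data.decode("latin-1"))
```

The result type then depends only on what the caller passed in, never on the contents of the arguments.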

So I think your options 1-3 are better than 4.  BTW, your function
`eight-bit-char' should be named `byte-to-char' instead.

Which of 1 to 3 is the best is not clear, and maybe we can just live with
`make-string-unibyte' and `make-string-multibyte'.  Note that 1-3 are
not mutually exclusive so we can use them all.
