Re: eight-bit char handling in emacs-unicode

emacs-devel

From:	Kenichi Handa
Subject:	Re: eight-bit char handling in emacs-unicode
Date:	Tue, 18 Nov 2003 16:33:15 +0900 (JST)
User-agent:	SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <jwvhe12emr3.fsf-monnier+emacs/address@hidden>, Stefan Monnier 
<address@hidden> writes:
>>  The basic problem is that we don't distinguish a character
>>  (code) and a number.  So, we introduce a character object

> That's one way to look at the problem.
> Another is to say that the problem is instead that we do not distinguish
> between arrays of chars and arrays of bytes.

I agree that it's possible to grasp the problem in that way,
but I'm not sure which is the better way.  Could you explain
WHY yours is better?

[...]
> In Emacs-21 we worked around the problem by arranging for "the
> eight-bit-char that encodes to 192" to be represented by the integer 192, so
> as to avoid having to choose.  But with unicode, the 128-255 zone cannot be
> dedicated to eight-bit-char since it's already used up for latin-1, so we
> have to face the problem more directly.

> In the places where Emacs-21 still had to choose, we just used heuristics,
> so `concat' will sometimes return a unibyte string, and sometimes a
> multibyte string.

> So I think your options 1-3 are better than 4.  BTW, your function
> `eight-bit-char' should be named `byte-to-char' instead.

> Which of 1 to 3 is the best is not clear, and maybe we can just live with
> `make-string-unibyte' and `make-string-multibyte'.

I think you mean string-make-unibyte/multibyte, but we can't
use them for the current problem, because string-make-unibyte
may behave differently in different language environments.
A language environment that gives iso-8859-1 or Unicode the
highest priority for the character `À' is fine:

(string-make-unibyte (concat '(?a 192))) = "a\300"

But if some language environment prefers a charset for `À'
that does not encode it to 192 (e.g. Vietnamese VSCII), we fail.
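
A minimal sketch of the dependence described above, written for
illustration and not taken from this thread: when the coding system
is named explicitly, the bytes a character maps to no longer depend
on the language environment. The helper `my-char-bytes' is
hypothetical, and iso-8859-1 and utf-8 are only example coding
systems; VSCII is not shown here.

```elisp
;; Sketch only: `my-char-bytes' is a hypothetical helper, not an
;; existing Emacs function.  It uses `encode-coding-string' with an
;; explicitly named coding system, so the result does not vary with
;; the language environment the way `string-make-unibyte' may.
(defun my-char-bytes (char coding-system)
  "Return the list of bytes that CHAR encodes to in CODING-SYSTEM."
  (string-to-list (encode-coding-string (string char) coding-system)))

(my-char-bytes ?À 'iso-8859-1)  ; => (192)
(my-char-bytes ?À 'utf-8)       ; => (195 128)
```

The same character yields a different byte sequence for each coding
system, which is why a conversion that picks the charset implicitly
cannot be relied on to produce 192 for `À'.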

> Note that 1-3 are not mutually exclusive so we can use
> them all.

Yes, but I would at least like to avoid "(3) Make a
series of new functions".

---
Ken'ichi HANDA
address@hidden
