

Re: term, utf-8 and cooked mode, combining characters

From: Niels Möller
Subject: Re: term, utf-8 and cooked mode, combining characters
Date: 18 Sep 2002 15:25:41 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de> writes:

> Nobody will use UTF-32 as their local encoding for the foreseeable
> future, right?

I really don't know. Right now, utf8 seems almost as impractical as
utf-32 to me, and I don't know how that will change when more programs
pick up support for larger character sets.

> There is no advantage whatsoever to use the same encoding in the input and
> output half.  Both are completely separated.

They're the same program and the same binary, so at least it's less
code bloat to add unicode support to the second half than to the

Using unicode somewhere in the input path seems necessary, and if you
follow Roland's idea of putting more of term into the console server,
then the console server seems to be the right place. If you don't do
that, then I agree that the console need not know about it, and just pass
the utf8 stream on to term.

> I think all this legacy chinese/japanese/korean stuff, bu . Of
> course we could just hard code UTF-8 support in term (that's what a
> patch for the Linux kernel does), but that is kinda cheap. ;)

Special-casing utf8 is a reasonable thing to do in almost all cases
(e.g., quoting the CLISP announcement

  * CLISP does not come with GNU libiconv anymore.  The most important
    encodings are built-in anyway, and CLISP can use the GNU libc 2.2
    iconv and a GNU libiconv when it is independently installed.

), so the question is whether it's reasonable to special-case other
multibyte charsets. It seems that somebody (either term or the console)
must know the particulars, like mapping byte sequences to positions on
the line; it's just too bad if iconv doesn't provide any help with that.

I'm assuming that each glyph/grapheme in the Asian multibyte encodings
corresponds to a single unicode base char plus zero or more combining
characters. If that's not true, then unicode awareness won't help you
sort them out.


