bug-hurd

Re: console plans


From: Niels Möller
Subject: Re: console plans
Date: 17 Feb 2002 00:13:17 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1

Anders Jackson <anders.jackson@minpost.nu> writes:

> > For the input part, the complexity hits whatever component it is that
> > converts unicode or utf8 to a local charset like latin1 (and given the
> > current level of support for utf8 in tools like emacs and TeX, I don't
> > think eightbit charsets will be abandoned very soon).
> 
> But do you need to get in ALL Unicode characters from the console in
> every locale?  I think Unicode for internal representation is a Good
> Thing.

The complexity I'm thinking of comes from the fact that several
latin-1 characters have *several* valid and equivalent representations
in unicode, and a unicode to latin-1 converter has to treat them *all*
correctly. That can be dealt with, but it's more complex than just
converting scancodes to characters in the user's character set.
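A minimal sketch of what "treat them all correctly" can mean for a Unicode-to-Latin-1 converter, using Python's standard `unicodedata` module. It applies canonical composition (NFC) before encoding, so that both spellings of "ö" end up as the single Latin-1 byte 0xF6. The function name `to_latin1` is hypothetical. The sketch covers only canonical equivalence; compatibility equivalents (what NFKC would fold) are a separate policy decision it does not address.

```python
import unicodedata

def to_latin1(text: str) -> bytes:
    """Convert Unicode text to ISO-8859-1, accepting any canonically
    equivalent spelling of a Latin-1 character (hypothetical helper)."""
    # NFC composes "o" + COMBINING DIAERESIS into the single code point
    # U+00F6, which is the form Latin-1 can represent.
    composed = unicodedata.normalize("NFC", text)
    # Raises UnicodeEncodeError for characters with no Latin-1 equivalent.
    return composed.encode("latin-1")

precomposed = "M\u00f6ller"  # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "Mo\u0308ller"  # "o" followed by COMBINING DIAERESIS
print(to_latin1(precomposed))  # b'M\xf6ller'
print(to_latin1(decomposed))   # b'M\xf6ller'
```

Without the normalization step, the decomposed spelling would fail to encode, because U+0308 has no Latin-1 code.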

And I feel *strongly* that doing unicode without getting normalization
and equivalence issues right is a very very bad thing, worse than the
current chaos of various 8-bit character sets. Incompatibilities
because one program or system uses iso-8859-1 and another uses
iso-8859-5 are well known, easy to understand, and one can often tell
programs what character set to use. Incompatibilities because two
programs, both using unicode, require different normalization (and
thus have broken unicode conformance), e.g. one program insisting that
my last name is spelled with a "LATIN SMALL LETTER O WITH DIAERESIS"
and another with "COMBINING DIAERESIS", are harder both to understand
and work around. You will never see a program with options to
configure what particular normalization it should use for "ö", and
other characters with several equivalent unicode representations.
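To make the incompatibility concrete, here is a small Python sketch (the helper name `same_text` is hypothetical). A plain string comparison treats the two spellings of "Möller" as different, even though Unicode declares them canonically equivalent. A program only gets this right if it normalizes both sides to the same form before comparing.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical NFD forms
    # (NFC would work equally well, as long as both sides use the same one).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

a = "M\u00f6ller"   # precomposed o-diaeresis
b = "Mo\u0308ller"  # o + COMBINING DIAERESIS
print(a == b)           # False: different code point sequences
print(same_text(a, b))  # True: canonically equivalent
```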

> But using X for telling how to get characters from keyboard scancodes
> to Unicode is compatible with using Unicode internally.

Huh? I don't understand you. My point is that it is easier to convert
X keysyms to the user's choice of local character set (be that latin1
or utf8 or whatever) than to convert from unicode, because, as far as
I'm aware, X keysyms have a simple one to one mapping between
characters and integers, without any of those equivalence rules which
you have to understand and implement in order to deal with unicode
properly.
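As an illustration of the one-to-one mapping, a minimal Python sketch, assuming only the printable Latin-1 keysym range: in X11's keysymdef.h those keysyms (for example `XK_odiaeresis`, 0x00f6) have the same numeric value as the ISO-8859-1 code point, so producing either Latin-1 or UTF-8 output is a direct lookup with no equivalence rules. The function name `keysym_to_bytes` is hypothetical, and keysyms outside that range (function keys, other legacy charsets) are deliberately left unhandled.

```python
from typing import Optional

def keysym_to_bytes(keysym: int, charset: str = "latin-1") -> Optional[bytes]:
    # Printable Latin-1 keysyms (0x20-0x7e, 0xa0-0xff) have the same numeric
    # value as the ISO-8859-1 code point, which is also the Unicode code point.
    if 0x20 <= keysym <= 0x7E or 0xA0 <= keysym <= 0xFF:
        return chr(keysym).encode(charset)
    # Other keysyms are out of scope for this sketch.
    return None

XK_odiaeresis = 0x00F6  # value as defined in X11's keysymdef.h
print(keysym_to_bytes(XK_odiaeresis))           # b'\xf6'
print(keysym_to_bytes(XK_odiaeresis, "utf-8"))  # b'\xc3\xb6'
```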

> Add composing later.

Doing unicode sans composing characters may be a start, but it is
*not* really "unicode".
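One visible consequence of skipping combining characters on a console: code that treats every code point as one character will split "o" + COMBINING DIAERESIS into two cells, so cursor movement or backspace can strip the diaeresis off its base letter. The Python sketch below groups each base character with the combining marks that follow it. It is a simplified, hypothetical helper (`clusters`) relying only on `unicodedata.combining`; it is not the full grapheme segmentation defined in Unicode's UAX #29.

```python
import unicodedata

def clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it
    (simplified; not full UAX #29 grapheme segmentation)."""
    out = []
    for ch in text:
        # combining() returns a nonzero class for combining marks such as U+0308.
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out

name = "Mo\u0308ller"
print(len(name))            # 7 code points
print(len(clusters(name)))  # 6 user-visible characters
```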

/Niels


