From: Miles Bader
Subject: [Gnu-arch-users] Re: [semi-OT] Unicode / han unification (was Re: Spaces ...)
Date: 22 Jan 2004 11:19:19 +0900
Tom Lord <address@hidden> writes:
> My personal opinion is that the Unicode consortium is probably right.
> While I can't personally evaluate the CJK issue based on my own
> knowledge, in those areas (both linguistic and computational) where I
> _am_ qualified to judge their arguments and decisions -- they are
> unfailingly wise.
It's at best rumour, but I have heard that a major impetus behind han
unification was to save space (in a 16-bit encoding), not `correctness'.
My personal test is the `README test': I'd like `cat README' to always
yield something appropriate even on a dumb terminal -- even if the README
file is part of a Chinese package, and I'm reading it on my American
computer (say at a university where the computer systems have to cater to a
very diverse audience).
As far as I know, plain Unicode text (with no font or locale information
attached) doesn't pass this test for CJK, though it apparently does for
other character sets.
The problem, as I understand it, is that although all these characters have
a shared history, and in many cases are in fact exactly the same character
(with maybe a very slight difference in detailing), some have diverged
quite a bit in appearance, to the point where they are unrecognizable if
displayed in the wrong `font'. In such a case are they the same character?
I dunno; for some usages that makes a lot of sense, for others, it doesn't.
I'll bet they could have done quite nicely with a sort of 90% unification:
unify everything that looks pretty much the same (a lot), and keep separate
code-points for stuff that has changed dramatically. You'd still get
complaints of course, but at least a baseline of `always readable' would be
better met.
I seem to recall that there are somewhat kludgey additions to make things
work; I forget the specifics, but you can basically embed a sort of string
specifying the name of the font or locale or whatever (using appropriate
weird high bits in the characters of the name). I don't know how widely
this is supported by Unicode applications though.
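For what it's worth, the mechanism I'm half-remembering is probably the Plane 14 "tag characters": U+E0001 LANGUAGE TAG, followed by a language code spelled out in characters that are just ASCII shifted up by 0xE0000. Here is a rough Python sketch of that encoding, assuming that is indeed the mechanism in question; the helper names `language_tag` and `strip_tags` are made up for illustration, and as far as I know the standard itself discourages using these tags for language marking.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000            # tag character = ASCII code point + 0xE0000


def language_tag(lang: str) -> str:
    """Encode an ASCII language code such as 'ja' as a Plane 14 tag sequence.

    Hypothetical helper, for illustration only.
    """
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def strip_tags(text: str) -> str:
    """Drop every character in the tag block (U+E0000..U+E007F)."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


# One unified han code point (U+76F4), prefixed with an invisible "ja" tag.
tagged = language_tag("ja") + "\u76f4"
```

A program that understands the tag could use it to pick a Japanese font; one that doesn't would have to know to ignore or strip it (as `strip_tags` does), which is the kludgey part.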
-Miles
--
Saa, shall we dance? (from a dance-class advertisement)