Re: I18N/M17N?

From: Masao Uebayashi
Subject: Re: I18N/M17N?
Date: Sat, 12 May 2001 01:34:00 +0900

Hello list!

[ Note: I'm quite sure that I'm not at all the right person to reply
  to such an important issue.  Please do not take these sentences on
  trust; refer to proper documents (e.g. Ken Lunde's "CJKV Information
  Processing" from O'Reilly). ]


> One nice side effect of that encoding is that one can use different
> escape sequences to select different character sets for stretches of
> two-byte-per-character text.  For example, "\033$B" selects the
> JIS-X-0208-1983 character set (most commonly-used Japanese
> characters), while "\033$(A" selects GB 2312-80, a Chinese character
> set.  The whole arrangement is inherently multilingual --- you can
> drop in any characters you like simply by inventing new escape
> sequences.

> So, essentially, this means that all Japanese programmers are
> accustomed to having text indicate not only the characters, but also
> the *language* those characters represent.  In particular, they feel
> it is important that the encoding distinguish between Chinese text and
> Japanese text.  Now, they all agree that Chinese and Japanese use the
> same characters.  When speaking in English, Japanese programmers refer
> to the characters they use in their own names and in everyday writing
> as "Chinese characters".  (In fact, I think the Japanese word "Kanji"
> actually means "Chinese characters" --- but I am very unreliable on
> questions like that.)  A friend of mine in Kyoto compared the Japanese
> vs. Chinese situation to the French vs. English situation: certainly
> the English word "car" and the French word "car" ("because") are
> different words, but everyone agrees they're the same three letters.

We call Kanji "Kanji" in everyday life.  We use the term "Chinese
characters" because it is easier for English speakers to understand.
The Japanese word "Kanji" can be split into two parts, "Kan-ji": "Kan"
(originally "Han") is (was) the largest people of China, which long
maintained a state of its own.  "ji" means character(s).

I doubt the line "Now, they all agree that Chinese and Japanese use
the same characters."  All Japanese agree that Kanji came from China.
That is a fact.  After importing them, the Japanese have long used,
changed, created, and composed characters; that is, we developed them.
To what extent the characters are the same or different is a matter of
interpretation.  Well-known, authoritative dictionaries are often
consulted for such identification tasks.

Unicode does a job of this kind.  If you have the CJKV book, Figure
3-1 on p. 125 seems very instructive.

> The encoding currently used in GNU Emacs preserves the language
> distinctions.  Each character encodes a character set, and a point
> within that character set.  They use a nicer encoding than the one
> described above (for example, it's stateless), but it's basically just
> a better representation for the same information.

But Mule preserves major ISO-2022 encodings officially in its


(I refrain from commenting on UTF-* because I know so little about
it.  I'm going to read some papers on it.)

> So, the plan was to use the Emacs / MULE encoding in Guile until Emacs
> itself switches to this UTF-8-that-distinguishes-Chinese-and-Japanese,
> at which point Guile would switch too.

Then who can decide how the plan _is_? :-)


Masao Uebayashi <address@hidden>    --- We like raw eggs.
