Re: inputting characters by hexadigit

From: Kenichi Handa
Subject: Re: inputting characters by hexadigit
Date: Sun, 20 Jul 2008 10:23:57 +0900

In article <address@hidden>, Juri Linkov <address@hidden> writes:

> > I think it is better to skip these ranges:
> >   #x3400..#x4dbf   -- CJK Ideograph Extension A
> >   #x4e00..#x9fff   -- CJK Ideograph
> >   #xd800..#xfaFF   -- surrogate-pair, private use, CJK COMPATIBILITY
> >   #x20000..#x2ffff -- CJK Ideograph Extension B
> > and end the loop at #xeffff (#xf0000.. are for private use)

> Actually there are no Unicode names in these ranges in UnicodeData.txt.
> It has only lines for the first and the last character in these ranges:

Yes.  But, for CJK chars:

   (get-char-code-property CHAR 'name)

returns a valid name, something like "CJK IDEOGRAPH-3400"(*),
because get-char-code-property not only looks up
UnicodeData.txt but also computes a proper value if

> If it would be possible to loop over names instead of loop over all
> characters to check for their names, then this code would be faster,
> but I don't see how it would be possible to loop over all defined names
> in UnicodeData.txt.

> If this is not possible then we could optimize the loop over all
> characters in the chartable to skip these useless ranges.

I think that doesn't work because Hangul syllable character
names must also be computed algorithmically(*).  I think
just doing something like this is good:

 (dotimes (c #xEFFFF)
   (unless (CHAR-IS-IN-A-RANGE-TO-SKIP-P c)
     ...))
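
A minimal sketch of how the loop above might be filled in, assuming the
four ranges proposed earlier in this thread are the ones to skip.  The
names my-ucs-skip-ranges, my-ucs-skip-p and my-ucs-collect-names are
hypothetical and exist only for this illustration; the only real API
used is get-char-code-property.

```elisp
;; Hypothetical table: the ranges are the ones listed earlier in the thread.
(defconst my-ucs-skip-ranges
  '((#x3400 . #x4DBF)     ; CJK Ideograph Extension A
    (#x4E00 . #x9FFF)     ; CJK Ideograph
    (#xD800 . #xFAFF)     ; surrogates, private use, CJK compatibility
    (#x20000 . #x2FFFF))  ; CJK Ideograph Extension B
  "Code point ranges to skip, as (FROM . TO) pairs, both ends inclusive.")

(defun my-ucs-skip-p (c)
  "Return non-nil if code point C lies in one of `my-ucs-skip-ranges'."
  (let ((ranges my-ucs-skip-ranges)
        (found nil))
    (while (and ranges (not found))
      (when (and (>= c (car (car ranges)))
                 (<= c (cdr (car ranges))))
        (setq found t))
      (setq ranges (cdr ranges)))
    found))

(defun my-ucs-collect-names ()
  "Return an alist of (NAME . CODE) for named characters up to #xEFFFF.
Characters in `my-ucs-skip-ranges' are not looked up at all."
  (let (names)
    ;; `dotimes' stops one short of its bound, so #xF0000 covers 0..#xEFFFF.
    (dotimes (c #xF0000)
      (unless (my-ucs-skip-p c)
        (let ((name (get-char-code-property c 'name)))
          (when name
            (push (cons name c) names)))))
    (nreverse names)))
```

Hangul syllables are deliberately not in the skip table here, so their
names still come from get-char-code-property during the loop.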

(*): "The Unicode Standard 5.1" has this section.

4.8 Name—Normative
Ideographs and Hangul Syllables. Names for ideographs and
Hangul syllables are derived algorithmically. Unified CJK
ideographs are named CJK UNIFIED IDEOGRAPH-x, where x is
replaced with the hexadecimal Unicode code point—for
example, CJK UNIFIED IDEOGRAPH-4E00. Similarly,
compatibility CJK ideographs are named “CJK COMPATIBILITY
IDEOGRAPH-x”. The names of Hangul syllables are generated as
described in “Hangul Syllable Names” in Section 3.12,
Conjoining Jamo Behavior.
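
To make the quoted rule concrete, here is a rough sketch of deriving
such names by hand.  The function my-derived-char-name and the three
jamo tables are hypothetical names for this illustration, not how Emacs
itself computes the property.  The ideograph ranges are the block ranges
mentioned in this thread (an assumption; not every code point in them is
assigned), and the Hangul arithmetic follows the decomposition described
in Section 3.12 (19 leading consonants, 21 vowels, 28 trailing
consonants, starting at #xAC00).

```elisp
;; Jamo short names, indexed by leading consonant, vowel, trailing consonant.
(defconst my-hangul-l
  ["G" "GG" "N" "D" "DD" "R" "M" "B" "BB" "S" "SS" "" "J" "JJ" "C" "K"
   "T" "P" "H"])
(defconst my-hangul-v
  ["A" "AE" "YA" "YAE" "EO" "E" "YEO" "YE" "O" "WA" "WAE" "OE" "YO" "U"
   "WEO" "WE" "WI" "YU" "EU" "YI" "I"])
(defconst my-hangul-t
  ["" "G" "GG" "GS" "N" "NJ" "NH" "D" "L" "LG" "LM" "LB" "LS" "LT" "LP"
   "LH" "M" "B" "BS" "S" "SS" "NG" "J" "C" "K" "T" "P" "H"])

(defun my-derived-char-name (c)
  "Return an algorithmically derived name for code point C, or nil.
Handles Hangul syllables and the CJK ideograph ranges only."
  (cond
   ;; Hangul syllables: split the index into L, V and T parts.
   ((and (>= c #xAC00) (<= c #xD7A3))
    (let* ((s (- c #xAC00))
           (l (/ s 588))              ; 588 = 21 vowels * 28 trailing
           (v (/ (% s 588) 28))
           (tail (% s 28)))
      (concat "HANGUL SYLLABLE "
              (aref my-hangul-l l)
              (aref my-hangul-v v)
              (aref my-hangul-t tail))))
   ;; Unified ideographs: name is the prefix plus the hex code point.
   ((or (and (>= c #x3400) (<= c #x4DBF))
        (and (>= c #x4E00) (<= c #x9FFF))
        (and (>= c #x20000) (<= c #x2FFFF)))
    (format "CJK UNIFIED IDEOGRAPH-%X" c))
   ;; Compatibility ideographs block.
   ((and (>= c #xF900) (<= c #xFAFF))
    (format "CJK COMPATIBILITY IDEOGRAPH-%X" c))
   (t nil)))
```

For example, (my-derived-char-name #x4E00) gives
"CJK UNIFIED IDEOGRAPH-4E00" and (my-derived-char-name #xAC00) gives
"HANGUL SYLLABLE GA".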

Kenichi Handa
