Re: inputting characters by hexadigit

From: Juri Linkov
Subject: Re: inputting characters by hexadigit
Date: Sun, 20 Jul 2008 03:29:14 +0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (x86_64-pc-linux-gnu)

>> + (defun read-char-by-name (prompt)
>> +   "Read a character by its Unicode name or hex number string.
>> + Display PROMPT and read a string that represents a character
>> + by its Unicode property `name' or `old-name'.  It also accepts
>> + a hexadecimal number of Unicode code point.  Returns a character
>> + as a number."
>> +   (let (name names)
>> +     (dotimes (c #x10FFFF)
>> +       (if (setq name (get-char-code-property c 'name))
>> +           (setq names (cons (cons name c) names)))
>> +       (if (setq name (get-char-code-property c 'old-name))
>> +           (setq names (cons (cons name c) names))))
>> +     (or (cdr (assoc (setq name (completing-read prompt names)) names))
>> +         (string-to-number name 16))))
>> +
> I think it is better to skip these ranges:
>   #x3400..#x4dbf   -- CJK Ideograph Extension A
>   #x4e00..#x9fff   -- CJK Ideograph
>   #xd800..#xfaFF   -- surrogate pairs, private use, CJK COMPATIBILITY IDEOGRAPH
>   #x20000..#x2ffff -- CJK Ideograph Extension B
> and end the loop at #xeffff (#xf0000.. are for private use)

Actually, UnicodeData.txt has no per-character names in these ranges.
It has lines only for the first and the last character of each range:

3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FC3;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
DB80;<Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
DBFF;<Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
DC00;<Low Surrogate, First>;Cs;0;L;;;;;N;;;;;
DFFF;<Low Surrogate, Last>;Cs;0;L;;;;;N;;;;;
E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;
20000;<CJK Ideograph Extension B, First>;Lo;0;L;;;;;N;;;;;
2A6D6;<CJK Ideograph Extension B, Last>;Lo;0;L;;;;;N;;;;;
F0000;<Plane 15 Private Use, First>;Co;0;L;;;;;N;;;;;
FFFFD;<Plane 15 Private Use, Last>;Co;0;L;;;;;N;;;;;
100000;<Plane 16 Private Use, First>;Co;0;L;;;;;N;;;;;
10FFFD;<Plane 16 Private Use, Last>;Co;0;L;;;;;N;;;;;

If we could loop over the names instead of looping over all characters
and checking each one for a name, this code would be faster, but I don't
see how to loop over all the names defined in UnicodeData.txt.

If that is not possible, we could optimize the loop over all characters
in the char-table so that it skips these useless ranges.
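
A rough sketch of what skipping the ranges could look like, offered as an
illustration rather than as the final patch.  The names `my-unnamed-ranges'
and `my-char-names-alist' are hypothetical, and the range boundaries are
simply the ones suggested in the quoted message, not values derived from
UnicodeData.txt:

```elisp
;; Sketch only: `my-unnamed-ranges' and `my-char-names-alist' are
;; hypothetical names, not part of Emacs or of the proposed patch.
(defvar my-unnamed-ranges
  '((#x3400 . #x4DBF)     ; CJK Ideograph Extension A
    (#x4E00 . #x9FFF)     ; CJK Ideograph
    (#xD800 . #xFAFF)     ; surrogates, private use, CJK compatibility
    (#x20000 . #x2FFFF)   ; CJK Ideograph Extension B
    (#xF0000 . #x10FFFF)) ; planes 15 and 16 private use
  "Inclusive code point ranges to skip.
Must be sorted in ascending order and must not overlap.")

(defun my-char-names-alist ()
  "Return an alist of (NAME . CHAR), skipping `my-unnamed-ranges'."
  (let ((ranges my-unnamed-ranges)
        (c 0)
        names name)
    (while (<= c #x10FFFF)
      (if (and ranges (>= c (car (car ranges))))
          ;; C has reached the start of the next range to skip:
          ;; jump past its end in one step and drop the range.
          (setq c (1+ (cdr (car ranges)))
                ranges (cdr ranges))
        ;; Otherwise collect both the current and the old name, if any.
        (dolist (prop '(name old-name))
          (when (setq name (get-char-code-property c prop))
            (push (cons name c) names)))
        (setq c (1+ c))))
    names))
```

The result can be passed to `completing-read' in the same way as the
`names' list in the patch above.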

Juri Linkov
