Re: idn.el and confusables.txt

From: Ted Zlatanov
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 10:30:37 -0500
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.50 (gnu/linux)

On Sat, 14 May 2011 17:38:11 +0300 Eli Zaretskii <address@hidden> wrote: 

>> From: Ted Zlatanov <address@hidden>
>> Date: Sat, 14 May 2011 08:40:48 -0500
EZ> You see, the uni-*.el files we create out of the Unicode DB are not
EZ> used anywhere in application code, AFAIK.  We use them to display
EZ> character properties in the likes of "C-u C-x =", and that's it.  I'm
EZ> not even sure they are organized in a way that makes them useful.
>> markchars.el could use other Unicode properties if people ask.

EZ> I'm talking about the details.  The way we currently set the tables in
EZ> uni-*.el is that many of the values are symbols.  For example:

EZ>   (get-char-code-property ?1 'general-category) => Nd
EZ>   (get-char-code-property ?א 'bidi-class) => R
EZ>   (get-char-code-property ?\( 'mirrored) => Y

EZ> The `Nd', `R', and `Y' are symbols.

EZ> Now, suppose you wanted to use these values in some code that needs to
EZ> be fast -- how would you feel about having to write multi-branch
EZ> `cond' forms to compare the value against all the possibilities?

It wouldn't be ideal, certainly, but most glyphs are not confusable, so
for them the lookup would simply fail and no further comparison would
be needed.  I might write some of it in C if performance were an issue,
or try to inline the conditions with macros, or cache the lookups.  But
I don't know if markchars.el needs to be terribly fast.  It runs at the
font-lock level and IIUC that's opportunistic and not time-critical
like the display code.  For instance, unmodified text is not rechecked,
right?
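
To make the caching idea concrete, here is a minimal sketch, not code
from markchars.el.  All `my-' names are hypothetical, and the "slow"
check is a stand-in with three hard-coded sample code points rather
than anything read from confusables.txt.  A char-table initialized to
the marker `unknown' remembers each answer, including the nil answers
that most glyphs would get.

```elisp
;; Hypothetical sketch: cache per-character results in a char-table.
;; The symbol `unknown' marks characters that have not been checked yet.
(defvar my-confusable-cache (make-char-table 'my-confusable-cache 'unknown)
  "Cache of results from `my-slow-confusable-p', indexed by character.")

(defun my-slow-confusable-p (char)
  "Stand-in for an expensive check; return non-nil if CHAR is confusable.
The three code points are illustrative samples only (Cyrillic a, ie, o)."
  (and (memql char '(#x0430 #x0435 #x043E)) t))

(defun my-cached-confusable-p (char)
  "Return non-nil if CHAR is confusable, consulting the cache first."
  (let ((cached (aref my-confusable-cache char)))
    (if (eq cached 'unknown)
        ;; `aset' returns the stored value, so this both caches and returns.
        (aset my-confusable-cache char (my-slow-confusable-p char))
      cached)))
```

Whether this buys anything depends on how expensive the real check
turns out to be; if it is already a single char-table lookup, the cache
adds nothing.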

EZ> For now, with markchars.el, all you need is a boolean value for each
EZ> character.  However, in other use cases, some other Lisp code will
EZ> want the paired character.  Yet another application will want to
EZ> compare characters such that confusable pairs will compare equal.  Can
EZ> a single table satisfy all these needs efficiently?  Maybe it can, but
EZ> we need to design that table carefully.

Two char-tables would be enough: one small table for the confusable ->
target mapping, and one even smaller for the reverse target ->
(confusable list) mapping.  The reverse lookup table could be stored in
an extra slot of the primary lookup table.
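
Roughly, the layout could look like the sketch below.  This is an
assumption about how it might be done, not existing Emacs code: the
`my-' names are hypothetical and the four entries are hand-picked
samples, not the parsed contents of confusables.txt.  It also shows
how the same pair of tables could serve the three uses Eli lists: a
boolean test, the paired character, and a comparison that treats
confusables as equal.

```elisp
;; Hypothetical sketch: forward and reverse confusable tables.
;; Char-tables of subtype `my-confusables' get one extra slot.
(put 'my-confusables 'char-table-extra-slots 1)

(defvar my-confusables-table
  (let ((forward (make-char-table 'my-confusables))
        (rev (make-char-table 'my-confusables-reverse)))
    ;; Sample (CONFUSABLE . TARGET) pairs, for illustration only.
    (dolist (pair '((#x0430 . ?a)     ; CYRILLIC SMALL LETTER A
                    (#x0435 . ?e)     ; CYRILLIC SMALL LETTER IE
                    (#x03BF . ?o)     ; GREEK SMALL LETTER OMICRON
                    (#x043E . ?o)))   ; CYRILLIC SMALL LETTER O
      (aset forward (car pair) (cdr pair))
      ;; Push the confusable onto the target's reverse list.
      (aset rev (cdr pair) (cons (car pair) (aref rev (cdr pair)))))
    ;; Keep the reverse table in extra slot 0 of the forward table.
    (set-char-table-extra-slot forward 0 rev)
    forward)
  "Forward table: confusable -> target.  Extra slot 0: target -> list.")

(defun my-confusable-target (char)
  "Return the target CHAR is confusable with, or nil if there is none."
  (aref my-confusables-table char))

(defun my-confusable-sources (target)
  "Return the list of characters that are confusable with TARGET."
  (aref (char-table-extra-slot my-confusables-table 0) target))

(defun my-confusable-equal-p (c1 c2)
  "Return non-nil if C1 and C2 map to the same target character."
  (= (or (my-confusable-target c1) c1)
     (or (my-confusable-target c2) c2)))
```

With the sample data, (my-confusable-target #x0430) gives ?a, and
(my-confusable-equal-p #x043E #x03BF) is non-nil because both map
to ?o.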

markchars.el could use this mapping to show more information than just
underlining the characters.  A tooltip could show why the glyph is
confusable, for instance.
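
As an illustration of the tooltip idea, and not how markchars.el
actually marks text, the sketch below underlines confusables with
overlays and sets `help-echo' so the explanation appears on mouse
hover.  The function name and the small alist are hypothetical; a real
version would take its data from the confusables table.

```elisp
;; Hypothetical sketch: underline confusables and explain them in a tooltip.
(defvar my-demo-confusables
  '((#x0430 . ?a) (#x0435 . ?e) (#x043E . ?o))
  "Sample alist of (CONFUSABLE . TARGET) pairs, for illustration only.")

(defun my-mark-confusables (beg end)
  "Underline confusable characters between BEG and END.
Each overlay carries a `help-echo' string naming the look-alike."
  (save-excursion
    (goto-char beg)
    (while (< (point) end)
      (let* ((char (char-after))
             (target (cdr (assq char my-demo-confusables))))
        (when target
          (let ((ov (make-overlay (point) (1+ (point)))))
            (overlay-put ov 'face 'underline)
            (overlay-put ov 'help-echo
                         (format "U+%04X may be confused with `%c'"
                                 char target)))))
      (forward-char 1))))
```

Calling (my-mark-confusables (point-min) (point-max)) in a buffer marks
every sample confusable in it.  A font-lock based version would attach
the same properties from a matcher function instead of creating
overlays.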

>> Also the char-table doesn't have to
>> cover the Asian confusables--I'm not sure anyone would need those.

EZ> Well, the Unicode consortium definitely thought they were needed.
EZ> Either we follow established standards, or we don't.

You're right.  Also, there are Asian characters that could be confused
with Latin characters, so it's not safe to exclude them.

