Re: idn.el and confusables.txt

From: Ted Zlatanov
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 12:06:04 -0500
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.50 (gnu/linux)

On Sat, 14 May 2011 19:42:39 +0300 Eli Zaretskii <address@hidden> wrote: 

EZ> Isn't it better to design the table for efficient use to begin with?

Yes, and I ask you and the other experts on char-tables to help with
that design.  I am far from an expert on that topic.

>> But I don't know if markchars.el needs to be terribly fast.

EZ> I hope we are not introducing another character property for a
EZ> single use.  Some use, some day might need to do it fast.

This is premature optimization.  I only have a single use in hand.
Let's make sure markchars.el is fast and we can optimize for other uses
when they are needed.

>> Two char-tables would be enough: one small table for the confusable ->
>> target mapping, and one even smaller for the reverse target ->
>> (confusable list) mapping.  The reverse lookup table could be stored in
>> an extra slot of the primary lookup table.

EZ> Doesn't confusables.txt include both mappings already?  If so, you
EZ> don't need the reverse table.

I thought the lookups would be faster with a reverse mapping in one of
the scenarios you listed (looking up all the characters that might be
confused with a given one).  But I realized it doesn't need to be.
Let's say C1, C2, and C3 are confusables mapped to C1.  Then the mapping
is C1 -> (C2, C3); C2 -> C1; and C3 -> C1.

The algorithm is "if a character maps to an atom, it is confusable with
that atom; if it maps to a list, the whole list is confusable with this
character."  So to find all the confusables mapped to a character you
need at most two lookups.
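
A rough sketch of that lookup, written in Python with a plain dict
standing in for the char-table (an assumption made only for
illustration; the real thing would be an Emacs char-table).  The names
build_table and confusables_of are hypothetical, and "C1", "C2", "C3"
are the placeholders from the example above, not real characters.

```python
def build_table(groups):
    """Build the mapping from {target: [confusables...]}.

    The target maps to a list of its confusables; each confusable
    maps back to the target as a single value (an "atom").
    """
    table = {}
    for target, others in groups.items():
        table[target] = list(others)      # C1 -> (C2, C3)
        for ch in others:
            table[ch] = target            # C2 -> C1, C3 -> C1
    return table


def confusables_of(table, ch):
    """Return everything confusable with CH using at most two lookups."""
    entry = table.get(ch)                 # first lookup
    if entry is None:
        return []                         # not a confusable at all
    if isinstance(entry, list):
        return list(entry)                # CH is a target: one lookup is enough
    target = entry                        # CH maps to an atom (its target)
    siblings = table[target]              # second lookup
    return [target] + [c for c in siblings if c != ch]


table = build_table({"C1": ["C2", "C3"]})
print(confusables_of(table, "C1"))
print(confusables_of(table, "C2"))
```

The sketch assumes a target is never itself listed as a confusable of
some other target; if that can happen, more than two lookups would be
needed.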

In addition to the character mapping we also need a confusable data
type, which can be SL/SA (single-script) or ML/MA (mixed-script).  I
don't know where to store that.  Maybe we can just have two char-tables
for the two data types.  There aren't going to be more data types
AFAIK.  But markchars.el can definitely use the knowledge of whether
the confusable is within a single script or not.
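
To make the two-table idea concrete, here is a minimal sketch, again
in Python with dicts standing in for char-tables.  It assumes each
input row is a (confusable, target, type) triple with the type being
one of SL, SA, ML, or MA; the function names and the placeholder rows
are hypothetical and not taken from confusables.txt.

```python
SINGLE_SCRIPT = {"SL", "SA"}
MIXED_SCRIPT = {"ML", "MA"}


def build_typed_tables(rows):
    """Split (confusable, target, type) rows into two tables.

    Each table uses the same layout: confusable -> target (atom),
    target -> list of confusables.
    """
    tables = {"single": {}, "mixed": {}}
    for confusable, target, tag in rows:
        if tag in SINGLE_SCRIPT:
            table = tables["single"]
        elif tag in MIXED_SCRIPT:
            table = tables["mixed"]
        else:
            raise ValueError("unknown confusable type: %r" % (tag,))
        table[confusable] = target
        # Assumes a target is never also stored as a confusable
        # in the same table.
        table.setdefault(target, []).append(confusable)
    return tables


def kinds_for(tables, ch):
    """Tell the caller which tables have an entry for CH."""
    return [kind for kind in ("single", "mixed") if ch in tables[kind]]


rows = [("C2", "C1", "SL"), ("C3", "C1", "ML")]   # placeholder rows
tables = build_typed_tables(rows)
print(kinds_for(tables, "C2"))
print(kinds_for(tables, "C1"))
```

With this layout markchars.el would only need to check which table a
character appears in to know whether the confusion stays within one
script.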

Does all of that make sense?

