Re: idn.el and confusables.txt
From: Eli Zaretskii
Subject: Re: idn.el and confusables.txt
Date: Sun, 15 May 2011 01:56:02 -0400

> From: Ted Zlatanov <address@hidden>
> Date: Sat, 14 May 2011 20:22:44 -0500
>
> EZ> Should it be a list or a string? How would you use this mapping?
>
> It could be any type of sequence, I guess. Strings are more compact but
> for small amounts of data (typically 1-3 characters) I'm not sure if
> that matters. For 1 character in particular I'm pretty sure it's more
> efficient to store the character directly than any sequence.
>
> markchars.el would use it as follows: look at all the characters of a
> word. If any are of a different script S2 from the majority script S1,
> highlight them (we do this now with `markchars-face-confusable').
>
> New functionality: now if any of the S2 characters are multi-script
> confusables that map to a character in the majority script S1, highlight
> them specially with the new variable
> `markchars-face-confusable-multi-script' and give them a tooltip to say
> they are confusable with a particular character.
>
> New functionality: if any of the word characters, regardless of script,
> are confusables of the single-script type, highlight them with
> `markchars-face-confusable'. But see below about normalization.

These all examine portions of a buffer ("words") for being a match to
some string or regexp. So I think having strings in the char-table
will be more convenient, because you could then use `looking-at',
`string=', `string-match', etc.

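
A minimal sketch of the majority-script check described in the quoted
text, written in Python purely as a language-neutral illustration (it is
not markchars.el code). The names `script_of' and `minority_chars' are
hypothetical, and the script of a character is only approximated by the
first word of its Unicode name, so digits and punctuation will be
reported as their own "scripts".

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Crude script guess: the first word of the Unicode character name.

    Assumption: the name prefix ("LATIN", "CYRILLIC", ...) stands in for
    the real Unicode Script property, which the standard library does not
    expose directly.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def minority_chars(word):
    """Return (majority_script, odd), where odd lists the characters whose
    script differs from the majority as (index, char, script) tuples."""
    scripts = [script_of(ch) for ch in word]
    if not scripts:
        return None, []
    # S1 in the discussion: the most common script in the word.
    majority = Counter(scripts).most_common(1)[0][0]
    # S2 characters: everything that is not in the majority script.
    odd = [(i, ch, s)
           for i, (ch, s) in enumerate(zip(word, scripts))
           if s != majority]
    return majority, odd
```

For a word such as "p\u0430ypal" (with a Cyrillic letter in second
position), the second element of the result holds the positions that
would receive the highlighting face.
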
> As a general rule I'd say that if the mapping is to a single character
> with the SL/SA single-script property, chances are it's a true
> confusable. Otherwise it could be legitimate and we'd need to convert
> the string to a normalized form, which is probably slow (do you know?)

What do you mean by "normalized form"?

> Based on all this, I think it's best to make the confusables char-table
> values atoms or sequences (strings or lists) but split them into two
> char-tables for the single-script and multi-script mappings.

If we were to implement the full IDNA protocol, would the above be
enough? Or will we need additional information?
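
To make the two-table proposal above concrete, here is a rough sketch
with every value stored as a string, again in Python as a neutral
illustration rather than Emacs Lisp. The table names, the function names
and the handful of entries are all hypothetical and hand-picked for the
example; a real table would be generated from confusables.txt.

```python
# Hypothetical sample entries only; not taken from an actual parse of
# confusables.txt.  Values are always strings, even for one character.
SINGLE_SCRIPT = {
    "\u0131": "i",    # LATIN SMALL LETTER DOTLESS I
    "\u01c9": "lj",   # LATIN SMALL LETTER LJ, maps to two characters
}
MULTI_SCRIPT = {
    "\u0430": "a",    # CYRILLIC SMALL LETTER A
    "\u03bf": "o",    # GREEK SMALL LETTER OMICRON
}


def classify(ch):
    """Return (kind, target) if CH is in either table, else None."""
    if ch in MULTI_SCRIPT:
        return ("multi-script", MULTI_SCRIPT[ch])
    if ch in SINGLE_SCRIPT:
        return ("single-script", SINGLE_SCRIPT[ch])
    return None


def replace_confusables(word):
    """Replace each listed character by its target string.

    Because every value is a string, the results concatenate directly
    and can be compared with ordinary string operations.
    """
    out = []
    for ch in word:
        hit = classify(ch)
        out.append(hit[1] if hit else ch)
    return "".join(out)
```

With uniform string values, a check such as
replace_confusables("p\u0430ypal") == "paypal" needs no special-casing
for single characters versus longer sequences.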