Re: idn.el and confusables.txt

emacs-devel

From:	Eli Zaretskii
Subject:	Re: idn.el and confusables.txt
Date:	Sun, 15 May 2011 01:56:02 -0400

> From: Ted Zlatanov <address@hidden>
> Date: Sat, 14 May 2011 20:22:44 -0500
> 
> EZ> Should it be a list or a string?  How would you use this mapping?
> 
> It could be any type of sequence, I guess.  Strings are more compact but
> for small amounts of data (typically 1-3 characters) I'm not sure if
> that matters.  For 1 character in particular I'm pretty sure it's more
> efficient to store the character directly than any sequence.
> 
> markchars.el would use it as follows: look at all the characters of a
> word.  If any are of a different script S2 from the majority script S1,
> highlight them (we do this now with `markchars-face-confusable').
> 
> New functionality: now if any of the S2 characters are multi-script
> confusables that map to a character in the majority script S1, highlight
> them specially with the new variable
> `markchars-face-confusable-multi-script' and give them a tooltip to say
> they are confusable with a particular character.
> 
> New functionality: if any of the word characters, regardless of script,
> are confusables of the single-script type, highlight them with
> `markchars-face-confusable'.  But see below about normalization.
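
The multi-script check Ted describes can be sketched in Python for illustration; markchars.el itself is Emacs Lisp and would apply faces rather than return labels. Several things here are assumptions. Script detection is crude: it takes the first word of the character's Unicode name (e.g. LATIN, CYRILLIC) as a stand-in for a real Script property lookup. MULTI_SCRIPT is a hypothetical three-entry sample rather than data generated from confusables.txt. The names script_of and classify are hypothetical. The single-script case is not covered.

```python
import unicodedata
from collections import Counter

# Hypothetical sample of multi-script confusables (source -> target).
# A real table would be generated from confusables.txt.
MULTI_SCRIPT = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

def script_of(ch):
    """Crude stand-in for a Script property lookup (assumption):
    take the first word of the Unicode character name."""
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else "UNKNOWN"

def classify(word):
    """Return (index, char, label, target) for each minority-script char."""
    if not word:
        return []
    scripts = [script_of(ch) for ch in word]
    majority = Counter(scripts).most_common(1)[0][0]
    flagged = []
    for i, (ch, script) in enumerate(zip(word, scripts)):
        if script == majority:
            continue
        target = MULTI_SCRIPT.get(ch)
        if target is not None and all(script_of(t) == majority for t in target):
            # markchars.el would use `markchars-face-confusable-multi-script'
            # plus a tooltip naming TARGET.
            flagged.append((i, ch, "multi-script", target))
        else:
            # markchars.el would use `markchars-face-confusable'.
            flagged.append((i, ch, "other-script", None))
    return flagged
```

For example, classify("p\u0430ypal") flags only index 1 (CYRILLIC SMALL LETTER A) as a multi-script confusable of "a". A Cyrillic letter with no entry in the table is flagged only as "other-script".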

All of these examine portions of a buffer ("words") to see whether
they match some string or regexp.  So I think storing strings in the
char-table will be more convenient, because you could then use
`looking-at', `string=', `string-match', etc.

> As a general rule I'd say that if the mapping is to a single character
> with the SL/SA single-script property, chances are it's a true
> confusable.  Otherwise it could be legitimate and we'd need to convert
> the string to a normalized form, which is probably slow (do you know?)

What do you mean by "normalized form"?
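
Suppose "normalized form" here means Unicode normalization. Then one well-defined candidate is the "skeleton" from Unicode Technical Standard #39. In its current form, the skeleton applies NFD, maps each character through the confusables data, and then applies NFD again. Two strings count as confusable when their skeletons are equal. The Python sketch below works under that assumption. The one-entry TABLE and the function names are hypothetical, and the sketch says nothing about how fast an Emacs Lisp version would be.

```python
import unicodedata

def skeleton(text, table):
    """UTS #39-style skeleton: NFD, map each character, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a, b, table):
    """Treat two strings as confusable if their skeletons are equal."""
    return skeleton(a, table) == skeleton(b, table)

# Hypothetical one-entry table: CYRILLIC SMALL LETTER IE looks like "e".
TABLE = {"\u0435": "e"}
```

Normalization is what lets a precomposed character match its decomposed look-alike. The Latin "\u00e9" and the sequence Cyrillic "\u0435" plus U+0301 COMBINING ACUTE ACCENT both end up as "e\u0301".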

> Based on all this, I think it's best to make the confusables char-table
> values atoms or sequences (strings or lists) but split them into two
> char-tables for the single-script and multi-script mappings.
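
A Python sketch of splitting the data into two tables is shown below. It assumes the confusables.txt line layout of that time: a source code point, a target code point sequence, a type (SL, SA, ML or MA), and a trailing comment, separated by ";". The sample lines in the test are illustrative, not copied from the file, and parse_line and build_tables are hypothetical names. Following the preference for strings expressed above, values are always strings, even for one-character targets.

```python
def parse_line(line):
    """Parse one data line into (source, target, type), or None if empty."""
    data = line.lstrip("\ufeff").split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    source = chr(int(fields[0], 16))
    # The target may be a sequence of several code points.
    target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
    return source, target, fields[2]

def build_tables(lines):
    """Split mappings into single-script (SL/SA) and multi-script tables."""
    single, multi = {}, {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        source, target, kind = parsed
        (single if kind in ("SL", "SA") else multi)[source] = target
    return single, multi
```

Keeping the tables separate lets a caller consult the multi-script table only for minority-script characters and the single-script table for every character, matching the two checks described earlier.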

If we were to implement the full IDNA protocol, would the above be
enough?  Or would we need additional information?
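
For comparison, IDNA conversion itself involves steps that the confusables tables do not cover. These include nameprep mapping and Punycode (ACE) encoding of each non-ASCII label. Python's built-in "idna" codec implements IDNA 2003 (RFC 3490) and shows the round trip. This is only a reference point, not a proposal for idn.el. The wrapper names to_ascii and to_unicode are hypothetical.

```python
# Python's standard "idna" codec implements IDNA 2003 (RFC 3490):
# nameprep for non-ASCII labels, then Punycode with the "xn--" prefix.
def to_ascii(domain):
    """ToASCII applied label by label."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain):
    """ToUnicode applied label by label."""
    return domain.encode("ascii").decode("idna")
```

For example, "bücher.example" becomes "xn--bcher-kva.example" and converts back unchanged. None of this step consults confusables data.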
