
Re: idn.el and confusables.txt

From: Ted Zlatanov
Subject: Re: idn.el and confusables.txt
Date: Sun, 15 May 2011 07:14:47 -0500
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.50 (gnu/linux)

On Sun, 15 May 2011 01:56:02 -0400 Eli Zaretskii <address@hidden> wrote: 

EZ> These all examine portions of a buffer ("words") for being a match to
EZ> some string or regexp.  So I think having strings in the char-table
EZ> will be more convenient, because you could then use looking-at,
EZ> string=, string-match, etc.

Oh, good point.  OK, strings it is.  I'll write the converter.
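
A rough sketch of what such a converter might look like, written in
Python rather than Emacs Lisp just to show the shape of the data.  It
assumes the semicolon-separated "source ; target ; type # comment"
layout of confusables.txt; the function name and the sample lines are
made up for illustration and are not copied from the real file.

```python
def parse_confusables(lines):
    """Map each single source character to its target, always as a string."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue  # blank or comment-only line
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue  # not a mapping line
        source_cps = fields[0].split()
        if len(source_cps) != 1:
            continue  # a per-character table can only key on one character
        source = chr(int(source_cps[0], 16))
        # The target may be one code point or several; join them into a string.
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


# Hand-written sample input (hypothetical, for illustration only).
SAMPLE = [
    "# comment-only line",
    "0441 ; 0063 ; ML # CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C",
    "0271 ; 0072 006E ; SA # made-up entry with a two-character target",
]

table = parse_confusables(SAMPLE)
```

Storing every value as a string, even when the target is one
character, means the lookup result can go straight into string
comparison or matching functions without a type check.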

>> As a general rule I'd say that if the mapping is to a single character
>> with the SL/SA single-script property, chances are it's a true
>> confusable.  Otherwise it could be legitimate and we'd need to convert
>> the string to a normalized form, which is probably slow (do you know?)

EZ> What do you mean by "normalized form"?

Unicode defines normalization forms (UAX #15) for checking whether two
strings are equivalent regardless of how combining characters and other
sequences within them are encoded.  But thinking about it, even if
normalization says they're the same, it's still a potential problem for
the user, so we can skip normalization and always mark those.
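
To make "normalized form" concrete, here is a small sketch in Python
using the standard unicodedata module.  The helper name is made up for
illustration; only unicodedata.normalize is a real library call.

```python
import unicodedata


def same_after_normalization(a, b, form="NFC"):
    """Return True if A and B are equal once both are normalized to FORM."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

raw_equal = composed == decomposed
normalized_equal = same_after_normalization(composed, decomposed)
# Latin "c" and Cyrillic "es" look alike but are unrelated to normalization.
lookalike_equal = same_after_normalization("c", "\u0441")
```

The two spellings of the accented letter differ as raw strings but
compare equal after normalization, while the Latin/Cyrillic lookalike
pair stays different, so normalization alone would not flag it.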

>> Based on all this, I think it's best to make the confusables char-table
>> values atoms or sequences (strings or lists) but split them into two
>> char-tables for the single-script and multi-script mappings.
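
A minimal sketch of that split, in Python for illustration.  It assumes
each entry carries a type tag, with SL/SA marking single-script
mappings and ML/MA marking mixed-script ones; the ML/MA tags, the
function name, and the sample entries are assumptions, not taken from
this thread.

```python
SINGLE_SCRIPT_TYPES = {"SL", "SA"}
MIXED_SCRIPT_TYPES = {"ML", "MA"}  # assumed tag names for mixed-script entries


def split_by_script_type(entries):
    """Split (source, target, type) triples into two lookup tables."""
    single, mixed = {}, {}
    for source, target, kind in entries:
        if kind in SINGLE_SCRIPT_TYPES:
            single[source] = target
        elif kind in MIXED_SCRIPT_TYPES:
            mixed[source] = target
        # Entries with an unknown tag are ignored in this sketch.
    return single, mixed


# Hand-made entries (hypothetical, for illustration only).
SAMPLE_ENTRIES = [
    ("\u0441", "c", "ML"),   # Cyrillic es -> Latin c, crosses scripts
    ("\u0271", "rn", "SA"),  # made-up single-script entry, string target
    ("x", "y", "??"),        # unknown tag, dropped
]

single, mixed = split_by_script_type(SAMPLE_ENTRIES)
```

Keeping two tables lets a caller treat a hit in the single-script table
as a likely true confusable and handle mixed-script hits with a
different policy.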

EZ> If we were to implement the full IDNA protocol, would the above be
EZ> enough?  Or will we need additional information?

Oh, all this has been for confusables (TR39) only.  IDNA and uni-idn.el
will have their own needs!  IIUC, Lennart used IDNA only as a character
set in markchars.el (I didn't write that functionality and he maintains
idn.el), but there are more security issues with it we may need to

IDNA is better described in http://unicode.org/reports/tr46/ and the
links at the end of that document (a whole bunch of RFCs).  I'm not
interested in implementing the IDNA code beyond supporting the current
character set detection because I don't think IDNA is popular enough,
but maybe Lennart and others want to do it.

For further possible markchars.el functionality, take a look at
http://www.unicode.org/reports/tr36/ (Unicode Security Considerations).
It talks about the confusables issues, IDNA issues, and bidi issues
among others.  It's a really good explanation of what security-related
functionality is needed from the confusables char-table and potentially
other places in Emacs.

