bug#51733: 27.1; Detect impossible email addresses better

bug-gnu-emacs

From:	Lars Ingebrigtsen
Subject:	bug#51733: 27.1; Detect impossible email addresses better
Date:	Wed, 19 Jan 2022 14:55:35 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

> I think we should first determine what kinds of applications may need
> this, and take it from there.  The initial number of "confusability
> with" classes can be very small, and we can add more as we discover
> interesting use cases.  The full number is pretty much infinite, I
> think, but I'm not sure Emacs needs to support all of them OOTB.  We
> could support some of the popular ones, and provide infrastructure for
> developing more.

Yes.

I was thinking about this bit, which isn't implemented yet (although the
utility functions for it basically are).

----
The process of determining suspect usage of whole-script confusables is more 
complicated than simply looking at the scripts of the labels in a domain name. 
For example, it can be perfectly legitimate to have scripts in a SLD (second 
level domain) not be the same as scripts in a TLD (top-level domain), such as:

    Cyrillic labels in a domain name with a TLD of .ru or .рф
    Chinese labels in a domain name with a TLD of .com.au or .com
    Cyrillic labels that aren’t confusable with Latin with a TLD of .com.au
      or .com

The following high-level algorithm can be used to determine all scripts that 
contain a whole-script confusable with a string X:

    Consider Q, the set of all strings confusable with X.
    Remove all strings from Q whose resolved script set is ∅ or ALL (that
      is, keep only single-script strings plus those with characters only
      in Common).
    Take the union of the resolved script sets of all strings remaining in Q.

As usual, this algorithm is intended only as a definition;
implementations should use an optimized routine that produces the same
result.
----
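
To make the three steps concrete, here is a minimal brute-force sketch in
Python.  The tables are tiny hand-made assumptions (the real data would come
from Unicode's Scripts.txt and confusables.txt), all the names are
hypothetical, and none of this is how textsec does it.  It follows the
"remove ∅ or ALL" wording literally.

```python
from itertools import product

# Hypothetical toy tables, NOT the real Unicode data.
SCRIPTS = {
    "C": "Latin", "r": "Latin", "c": "Latin", "e": "Latin",
    "С": "Cyrillic", "г": "Cyrillic", "с": "Cyrillic", "е": "Cyrillic",
    "-": "Common",
}
# Each group lists characters assumed to look alike.
CONFUSABLE_GROUPS = [{"C", "С"}, {"r", "г"}, {"c", "с"}, {"e", "е"}]

ALL = frozenset({"ALL"})  # sentinel meaning "every script"


def lookalikes(ch):
    """Return the characters assumed confusable with CH, including CH."""
    for group in CONFUSABLE_GROUPS:
        if ch in group:
            return group
    return {ch}


def resolved_script_set(s):
    """Intersect the scripts of all characters; Common matches any script."""
    result = ALL
    for ch in s:
        script = SCRIPTS[ch]  # KeyError for characters outside the toy table
        if script == "Common":
            continue
        if result is ALL:
            result = frozenset({script})
        else:
            result = result & {script}
    return result


def whole_script_confusable_scripts(x):
    # Step 1: Q, every string confusable with x (brute force).
    q = {"".join(chars) for chars in product(*(lookalikes(ch) for ch in x))}
    # Step 2: drop strings whose resolved script set is empty or ALL.
    kept = [r for r in map(resolved_script_set, q) if r and r is not ALL]
    # Step 3: union of what remains.
    return set().union(*kept)
```

With these toy tables, the Cyrillic string "Сгсе" yields both Latin and
Cyrillic, since the set Q contains the all-Latin "Crce" as well as the
original string.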

I'm not sure I understand the algorithm they're proposing.  I think this
shouldn't be suspicious?  But I may be wrong:

(textsec-domain-suspicious-p "Сгсе.рф")
=> nil

This one should be flagged, but currently isn't:

(textsec-domain-suspicious-p "Сгсе.ru")
=> nil

Now, 

(textsec-ascii-confusable-p "Сгсе.ru")
=> t

and

(textsec-ascii-confusable-p "Сгсе.рф")
=> nil

Is that what they mean here?  I'm not finding the logic overly clear here.
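
One possible reading, sketched in Python with a toy look-alike table: flag a
domain when its TLD is ASCII and some other label is non-ASCII but made up
entirely of characters that pass for ASCII.  This is only a guess at the
intent, not what the spec says and not what textsec does; the function names
and the table are hypothetical.

```python
# Hypothetical toy mapping from Cyrillic look-alikes to ASCII; the real data
# would come from Unicode's confusables.txt.
TO_ASCII = {"С": "C", "г": "r", "с": "c", "е": "e"}


def ascii_confusable(label):
    """True if LABEL is non-ASCII but every character is ASCII or has an
    assumed ASCII look-alike."""
    if label.isascii():
        return False
    return all(ch.isascii() or ch in TO_ASCII for ch in label)


def domain_suspicious(domain):
    """Flag DOMAIN if its TLD is ASCII and any other label is
    ASCII-confusable under the toy table."""
    *labels, tld = domain.split(".")
    if not tld.isascii():
        # A non-ASCII TLD such as .рф makes Cyrillic labels expected.
        return False
    return any(ascii_confusable(label) for label in labels)
```

Under this reading "Сгсе.ru" is flagged and "Сгсе.рф" is not, which matches
the two results I'd expect above.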

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
