bug-gnu-emacs

bug#51733: 27.1; Detect impossible email addresses better


From: Eli Zaretskii
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Sun, 16 Jan 2022 20:14:08 +0200

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: 51733@debbugs.gnu.org,  jidanni@jidanni.org
> Date: Sun, 16 Jan 2022 18:03:23 +0100
> 
> https://www.unicode.org/reports/tr24/tr24-32.html#Scripts_and_Blocks
> 
>    As a result, using the block names as simplistic substitute for
>    script identity generally leads to poor results.
> 
> It looks like we're doing that, though?

No, not really.  We collect the various blocks that belong to the same
script together.

> And indeed:
> 
> (elt char-script-table #xAB65)
> => latin
> 
> which is wrong, because that's
> 
> GREEK LETTER SMALL CAPITAL OMEGA
> 
> So we should be populating char-script-table from
> http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt instead of
> Blocks.txt.  So I'll be doing that, too.

Beware: the Unicode Script property is not identical to ours!  Before
throwing away what we have, please consider how many deviations we
have in practice, and if there are just a few, let's fix them
individually.  It's easy.  You will have to add some manual heuristics
even if you do use Unicode's Scripts.txt as the basis.
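A minimal sketch of the "Scripts.txt plus manual heuristics" approach, written in Python rather than Emacs Lisp and assuming input in the Scripts.txt line format (`START..END ; Script # comment`).  It parses the ranges into a sorted table, lowercases script names to resemble symbols like `latin`, and lets a table of hand-written overrides take precedence over the Unicode data.  The sample lines are an illustrative excerpt, not the real file, and the override for U+00D7 is hypothetical, not something Emacs actually does.

```python
# Sketch: code point -> script lookup built from Scripts.txt-format data,
# with hand-written overrides layered on top (the "manual heuristics").
import bisect

# Illustrative excerpt in the Scripts.txt line format; NOT the real file.
SAMPLE_SCRIPTS_TXT = """\
# comment lines and blank lines are ignored

0041..005A    ; Latin # uppercase A..Z
0061..007A    ; Latin # lowercase a..z
AB65          ; Greek # GREEK LETTER SMALL CAPITAL OMEGA
"""


def parse_scripts(text):
    """Return a sorted list of (start, end, script) tuples."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        cps, script = (field.strip() for field in data.split(";"))
        lo, _, hi = cps.partition("..")
        hi = hi or lo  # a single code point has no ".." range
        # Simplification: lowercase names so they resemble `latin', `greek'.
        ranges.append((int(lo, 16), int(hi, 16), script.lower()))
    ranges.sort()
    return ranges


def make_lookup(ranges, overrides=None):
    """Return a function mapping a code point to a script name or None."""
    starts = [r[0] for r in ranges]
    overrides = overrides or {}

    def lookup(cp):
        if cp in overrides:  # manual heuristics win over Unicode data
            return overrides[cp]
        i = bisect.bisect_right(starts, cp) - 1
        if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
            return ranges[i][2]
        return None  # not covered by the data or the overrides

    return lookup


# Hypothetical override, for illustration only.
HYPOTHETICAL_OVERRIDES = {0x00D7: "symbol"}  # U+00D7 MULTIPLICATION SIGN

lookup = make_lookup(parse_scripts(SAMPLE_SCRIPTS_TXT), HYPOTHETICAL_OVERRIDES)
```

With this table, `lookup(0xAB65)` gives `greek` rather than `latin`, and a code point covered by neither the data nor the overrides gives `None`, which is where further heuristics would have to decide.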
