Re: Unicode confusables and reordering characters considered harmful

From: Eli Zaretskii
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Thu, 04 Nov 2021 10:21:12 +0200

> From: Reini Urban <reini.urban@gmail.com>
> Date: Thu, 4 Nov 2021 08:50:14 +0100
> Cc: emacs-devel@gnu.org
>      int hi = 5;
>      int שָׁלוֹם = hi;
>      int hello = 10;
>      int السّلامعليك = hello;
>      myfun(שָׁלוֹם ,السّلامعليكم)
>  IMO this code is fundamentally valid: we should allow
>  programmers to write identifiers in their native tongue.
> Sure, nobody wants to forbid Unicode identifiers. The rules only ensure that
> identifiers remain identifiable.
> I converted it to Perl (because I dislike Java and Rust), and ran it through
> cperl.
> The problem is that from an innocent look or a code review you won't see any
> problem, hence the security risk.
> You need to adjust your tools.
> But the very first RTL identifier שָׁלוֹם already contains non-identifier
> characters.

Which of its characters are non-identifier, and why?  That identifier
uses characters of a single script, AFAICT.
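
For reference, here is a minimal sketch in Python of how one could check
this. It assumes that the identifier consists of the seven code points
listed in IDENT (my reading of the fragment above, written as escapes so
that display reordering cannot change them). The helper names describe
and single_script_by_name are made up for illustration. The script test
is only a heuristic based on character names, because Python's
unicodedata module does not expose the Script property.

```python
import unicodedata

# Assumed code point sequence of the first RTL identifier in the quoted
# fragment (letters plus vowel points). Escapes are used so that bidi
# reordering in an editor cannot alter the string.
IDENT = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"


def describe(ident):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in ident
    ]


def single_script_by_name(ident, prefix):
    """Heuristic: every character name starts with the given script word.

    This approximates the Script property; it is not a TR39 check.
    """
    return all(unicodedata.name(ch).startswith(prefix + " ") for ch in ident)


if __name__ == "__main__":
    for row in describe(IDENT):
        print(*row)
    print("valid Python identifier:", IDENT.isidentifier())
    print("all names start with HEBREW:", single_script_by_name(IDENT, "HEBREW"))
```

Under that assumption, the characters are either letters (category Lo)
or nonspacing marks (category Mn), and all of them have HEBREW names.
Python's own identifier rule follows the XID_Start/XID_Continue
properties, which admit nonspacing marks after the first character. The
sketch does not consult the TR39 identifier status or the mixed-script
profiles.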

> So I cannot tell you whether this code violates any of the 4 Unicode
> mixed-script profiles
> (http://www.unicode.org/reports/tr39/#Mixed_Script_Detection 2-5),
> or whether any of the unreadable characters are of the recommended scripts:

Which characters in that fragment are "unreadable" for this purpose?
