From: Gregory Heytings
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 17:04:36 +0000

>>> But they don't. Not more than just using RTL characters within LTR text would. Just revisit the example posted by Stefan (which I slightly modified to be more realistic):
>>>
>>> myfun("שָׁלוֹם" ,"السّلامعليكم");
>>>
>>> Which string does this function call pass as the first argument, and which as the second one?
>>
>> There is no danger in that example, and in particular nothing invisible.
>
> Ha-ha, very funny.

It wasn't supposed to be funny.

>> The programmer must just be aware that compilers read source code files in byte order, which might be different from the order in which the string is displayed on screen, but is identical to the order in which one forward-char's through the string.
>
> If we are going to assume users forward-char through every piece of code they look at, then the examples we were discussing present no problem, either.

I'm not assuming any of this. There are programmers who read Hebrew and Arabic, and those who don't. Those who do know them know that they are entered and read RTL, and don't even need to check the argument order. Those who don't may not know this, and can easily check if they have some doubt about what string is passed in which argument.
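
For anyone who wants to check the argument order without relying on how the line is rendered, a minimal Python sketch follows. It is only an illustration under stated assumptions: the two literals are shortened, unpointed stand-ins for the strings in the example, written as escapes so that their storage order is unambiguous, and `string_literals` and `script_of` are hypothetical helpers, not anything from Emacs or the paper.

```python
import unicodedata

# Stand-ins for the two literals, written as escapes so the logical
# (storage) order does not depend on how a viewer renders the line.
HEBREW = "\u05e9\u05dc\u05d5\u05dd"  # shalom, unpointed (assumption)
ARABIC = "\u0633\u0644\u0627\u0645"  # salam (assumption)
source_line = f'myfun("{HEBREW}" ,"{ARABIC}");'


def string_literals(line):
    """Return the double-quoted literals in logical order.

    Deliberately naive: it does not handle escaped quotes.
    """
    return line.split('"')[1::2]


def script_of(text):
    """Guess the script from the Unicode name of the first character."""
    return unicodedata.name(text[0]).split()[0]


args = string_literals(source_line)
for position, literal in enumerate(args, start=1):
    # Bidi classes: "R" for Hebrew letters, "AL" for Arabic letters.
    classes = sorted({unicodedata.bidirectional(ch) for ch in literal})
    print(position, script_of(literal), classes)
```

Walking the line in storage order like this is the programmatic equivalent of moving through it with forward-char: the Hebrew literal comes first and the Arabic one second, whatever the display shows.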

>> There is a danger when, because the source code contains invisible control characters, the programmer sees something on their screen, and the compiler sees something completely different.
>
> That's exactly what happens in the above example. Except that reordering happens automatically without any invisible characters, i.e. also "invisibly".

There are no invisible characters doing weird things with the text, no. And it's those invisible characters that the "Trojan Source" paper is about. Not potential interpretation problems for those who are only just discovering RTL languages.
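
To make the distinction concrete, here is a rough Python sketch that reports only explicit bidirectional control characters. The set it checks (the embeddings and overrides U+202A..U+202E and the isolates U+2066..U+2069) is my assumption about which invisible characters matter here, and `find_bidi_controls` is a hypothetical helper, not an existing tool.

```python
import unicodedata

# Explicit bidi formatting characters: embeddings and overrides
# (U+202A..U+202E) and isolates (U+2066..U+2069). Assumed set.
BIDI_CONTROLS = {
    chr(c) for c in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))
}


def find_bidi_controls(text):
    """Return (index, code point, name) for each bidi control in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


# Plain RTL letters inside LTR code: nothing invisible, nothing reported.
plain = 'myfun("\u05e9\u05dc\u05d5\u05dd" ,"\u0633\u0644\u0627\u0645");'
print(find_bidi_controls(plain))

# A string containing an override and its terminator: both are reported.
tricky = 'x = "a\u202eb\u202c"'
print(find_bidi_controls(tricky))
```

The first line contains RTL letters but no control characters, so the scan comes back empty; only the second, which carries an override, is flagged.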