Re: Bidirectional text and URLs

From: Stephen J. Turnbull
Subject: Re: Bidirectional text and URLs
Date: Sat, 29 Nov 2014 15:09:02 +0900

Eli Zaretskii writes:

 > Not really, not in this particular field.
 > > but I would say that given that the UAX#9 bidi algorithm does what's
 > > wanted 99.44% of the time, it makes sense to mark text reordered by
 > > RTL markers with a warning face
 > That might be considered an annoyance by users of bidi scripts.
 > There's any number of perfectly valid URLs that use the same
 > formatting control characters.

Why?  Because many displays don't implement UAX#9?  Or is it because
UAX#9 defines segments in a way that would reorder the components of a
domain name or path?  That is, the logical URL


is expected by a bidi reader to appear as


but UAX#9 would display it as


(the natural direction of lowercase characters is LTR, the natural
direction of uppercase characters is RTL)?  (Or perhaps the reverse?)

Whatever the reason, I'd have to say that's too bad for users of bidi
languages, because that means *any* bidi URL is ambiguous, and
therefore subject to being deliberately obfuscated by reflection
and/or jumbling, regardless of the presence of directional controls.

 > What you suggest might be TRT when left-to-right text is enclosed
 > within directional override controls (which is what Lars did in his
 > example).  These controls assign right-to-left directionality to all
 > the enclosed characters, which is indeed highly suspicious in URLs.

This isn't hard to detect.  But there is also the case where you have
a word which is a different word when reflected.  I assume that this
is the case in bidi languages as well, and of course any jumble is
possible as a domain or path component which is an abbreviation.  And
any useful jumble can probably be registered as a domain, and
certainly incorporated in a path.
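
To make the "not hard to detect" part concrete, here is a minimal sketch
in Python of scanning a URL string for the explicit directional
formatting characters of UAX#9.  The function names and the example.com
URL are hypothetical, chosen only for illustration; this is not what
Emacs does.  It covers only the easy case of explicit controls, not the
reflected-word or jumble cases described above.

```python
# Explicit directional formatting characters from UAX#9, by code point.
BIDI_CONTROLS = {
    "\u200e": "LRM", "\u200f": "RLM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

# The two override controls, which force a direction on enclosed text.
OVERRIDES = {"\u202d", "\u202e"}


def find_bidi_controls(url):
    """Return (index, name) for each directional control in url."""
    return [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(url)
            if ch in BIDI_CONTROLS]


def has_override(url):
    """True if url contains an LRO or RLO override control."""
    return any(ch in OVERRIDES for ch in url)


# Hypothetical example: LTR text wrapped in RLO ... PDF.
suspicious = "http://example.com/\u202egro.elpmaxe\u202c"
plain = "http://example.com/path"
```

A caller could use has_override() to decide whether to apply a warning
face, and find_bidi_controls() to decide where to apply it.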

 > In addition to using a special face, another possibility is to present
 > the directional overrides in these cases in percent-hex notation,
 > which will disable their effect on the enclosed text.  Of course, this
 > should be only done when the enclosed text is entirely made of LTR
 > characters and neutrals.

Well, no.  I assume that bidi readers are as vulnerable to phishing
and other frauds as non-bidi readers (hard as that may be to believe
for you bidi readers).  That is not yet clear.
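
For reference, the percent-hex presentation itself could be sketched as
follows in Python.  This is an illustration under assumptions, not a
proposal for the actual implementation: escape_overrides and its
only_if_ltr flag are hypothetical names, "LTR characters and neutrals"
is approximated as "no character of bidi class R or AL", and nested
overrides are not handled.  With only_if_ltr=True it follows the
restriction in the quoted text; with only_if_ltr=False it escapes every
override span regardless of content.

```python
import unicodedata
from urllib.parse import quote

LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"
STRONG_RTL = {"R", "AL"}  # bidi classes of strong right-to-left characters


def escape_overrides(url, only_if_ltr=True):
    """Show LRO/RLO ... PDF controls as percent-hex (UTF-8) text."""
    out = []
    i = 0
    while i < len(url):
        ch = url[i]
        if ch in (LRO, RLO):
            end = url.find(PDF, i + 1)
            stop = end if end != -1 else len(url)
            enclosed = url[i + 1:stop]
            # Approximation: the span is "LTR and neutrals" if it holds
            # no strong RTL character.
            ltr_only = not any(
                unicodedata.bidirectional(c) in STRONG_RTL for c in enclosed
            )
            if ltr_only or not only_if_ltr:
                out.append(quote(ch))
                out.append(enclosed)
                if end != -1:
                    out.append(quote(PDF))
                    i = end + 1
                else:
                    i = stop
                continue
        # Ordinary character, or an override span left untouched.
        out.append(ch)
        i += 1
    return "".join(out)
```

Once the controls are displayed as %E2%80%AE and %E2%80%AC they are
ordinary ASCII, so the enclosed text appears in its logical order.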

 > > You do need a way to turn it off, or to make it reasonably smart, in
 > > the case of ASCII which is often mixed with other charsets.
 > Not sure what you mean here.

As above, where the domain name is ASCII and the path is RTL.  Or the
path (or the domain) might be mixed.

 > "Turn off" how?

"We need to decide what we want to do, and then look for a mechanism."

 > And how do you do that without unduly punishing perfectly valid
 > URLs that need these controls to avoid visual "jumbles"?

I hate to tell you, but the phishers have *already* started punishing
those perfectly valid URLs.  You have a choice of punishment, that's
all: "jumbled display" vs. "defrauded users".

Except that, as I said above, apparently all bidi URLs must now be
considered to offer suspicious display under some circumstances, so
maybe you have no choice about the defrauded users.  In that case I
suppose avoiding jumbles does take precedence.
