

Re: Bidirectional text and URLs

From: Eli Zaretskii
Subject: Re: Bidirectional text and URLs
Date: Mon, 08 Dec 2014 17:46:49 +0200

> Date: Sun, 07 Dec 2014 19:26:33 -0500
> From: Richard Stallman <address@hidden>
> CC: address@hidden, address@hidden
>  > > If so, the question is: once you detect the strangeness, what then?
>   > It's up to the application.
> Alas, that's ducking the issue.  We need to confront this issue.

We _are_ confronting it.  We are methodically analyzing the issue
piecemeal, identifying the separate parts of it, and providing
solutions to each part as soon as it is well-defined and understood.

The problem we are dealing with is a very complex one.  It involves
multiple disciplines: bidi reordering, URL construction and display,
Internet security, cultural differences, human perception of visual
cues, etc.  Part of the solution should be in the infrastructure and
primitives, part on the application and UI level.  Moreover, we are in
uncharted territory, with no prior art or standards to guide us.
Plus, we don't have any single individual on board who'd have a good
understanding of all the aspects of the problem.

When dealing with such hard issues, it is IME methodologically wrong
to charge ahead without a sufficiently clear definition and
understanding of each part of the problem and the alternatives for
their solutions.

We have now identified the first part: how to find the potentially
fraudulent URL, and we have a clear understanding of it.  We have a
solution for that part of the problem that seems to satisfy the
requirements of the application programmer who brought up this issue.

The next step should be for the application to try using this
infrastructure to address the issue on the application and UI levels.
It is possible that such an attempt will result in feedback that
will require changes in the infrastructure, or some additional
functionality there.  Or the application developers will decide that
this part of the problem is successfully solved, and will request
assistance in solving the next part, which will need to be defined in
clear terms.

And so on and so forth -- we will break this complex issue into
individual parts and solve them one by one on the level each part
belongs to.  That's not "ducking the issue" in my book.

What you seem to expect is that we start coding solutions to problems
that are at best very vaguely defined, without any practical
experience to back that up, guided only by some intuition.  IME, this
is a recipe for wrong solutions and for waste of time and energy.  I
submit that there's no one around here, including myself, whose
intuition in this matter I would trust, because intuition is only
reliable when it is based on knowledge and experience in the subject
matter, and we don't have such individuals at our disposal.

So I don't see any reasons to rush into coding under the

>   > That's easy: copy the text without the directional override and
>   > display it in some other buffer.  The position returned by
>   > bidi-find-overridden-directionality is of the 1st character following
>   > the override control, so copying the text starting at that position
>   > will exclude the override and avoid its effects.
> That is the first magic bidi char, but there could be more.

Inside the URL?  Extremely unlikely, see below.  In any case, the
presented use case didn't have them.  I'd like to see a complete
solution for this simple use case, before we move to more complex ones
(if they exist).

> It would be necessary to remove them all.

I don't think it's a problem, not a likely one anyway.  But if it is,
it should be almost trivial to use that primitive iteratively to
reconstruct the string with all the overrides removed.
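As a rough illustration of using such a primitive iteratively, here is a minimal Python sketch. It does not call Emacs: `next_after_control` is a hypothetical stand-in that, like the behavior described above for bidi-find-overridden-directionality, returns the position of the first character following a directional control. To cover the "remove them all" point, it treats every explicit directional formatting character (embeddings, overrides, isolates and their terminators, as classified by the standard `unicodedata.bidirectional`) as a control, not only the overrides.

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters
# (embeddings, overrides, isolates, and their terminators), per UAX #9.
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO",
                         "LRI", "RLI", "FSI", "PDI"}


def next_after_control(text, start=0):
    """Hypothetical helper, loosely modeled on the described behavior of
    bidi-find-overridden-directionality (NOT the Emacs primitive itself):
    return the index of the first character following a directional
    control at or after START, or None if no control remains."""
    for i in range(start, len(text)):
        if unicodedata.bidirectional(text[i]) in EXPLICIT_BIDI_CLASSES:
            return i + 1
    return None


def strip_bidi_controls(text):
    """Rebuild TEXT with every directional control removed, by calling
    next_after_control repeatedly and copying the text between hits."""
    pieces = []
    pos = 0
    while (after := next_after_control(text, pos)) is not None:
        pieces.append(text[pos:after - 1])  # text up to, not incl., the control
        pos = after                         # resume just past the control
    pieces.append(text[pos:])               # tail after the last control
    return "".join(pieces)
```

For example, `strip_bidi_controls("ab\u202ecd\u202cef")` yields `"abcdef"`: each call to the helper advances past one control, and the loop ends when none are left.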

> However, is simply removing them correct?

Yes, I think so.

> In general, do magic bidi characters get included in the URL that is
> passed to the browser?  I would expect so.

Using the directional control characters as part of the URL is
forbidden by the relevant standards.  The authorities that approve
domain names will reject them if they include such characters.  So I
think URLs which include them will be non-existent, or at least very
rare.  The use case which started this thread of discussion had the
control characters outside the URL itself, even outside the protocol
part of it.
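Whether the controls sit inside the URL or, as in the use case above, in front of it can be checked mechanically. The following is a minimal Python sketch that assumes the URL's start and end offsets in the text are already known, since locating the URL is outside its scope. The function names are hypothetical; `unicodedata.bidirectional` is the standard-library call that reports a character's bidi class.

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters (UAX #9).
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO",
                         "LRI", "RLI", "FSI", "PDI"}


def bidi_controls(s):
    """Return (index, bidi class) for every directional control in S."""
    return [(i, unicodedata.bidirectional(ch))
            for i, ch in enumerate(s)
            if unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES]


def locate_controls(text, url_start, url_end):
    """Tag each control in TEXT as 'before', 'inside', or 'after' the URL,
    whose bounds [URL_START, URL_END) are assumed to be known already."""
    result = []
    for i, cls in bidi_controls(text):
        if i < url_start:
            where = "before"
        elif i < url_end:
            where = "inside"
        else:
            where = "after"
        result.append((i, cls, where))
    return result
```

With `text = "\u202ehttp://example.com/"` and the URL starting at index 1, `locate_controls(text, 1, len(text))` reports a single RLO at index 0, located before the protocol part.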

> If so, a string which does not include them is inaccurate, and the
> accurate thing to do is to include them and display them (perhaps in
> hex) while suppressing their bidi effect.

Removing them and suppressing their effect give rise to the same
visual appearance, since these controls display as very thin spaces,
and thus are almost invisible on the screen.  That's why this type of
fraud came into existence in the first place.

As for using hex, that was one alternative I suggested earlier in this
thread.  It is still on the table, and doesn't require any
infrastructure changes to do its job.  But people liked this proposal
less, so eventually I coded the primitive to find the spoofed
characters as a means for supporting other solutions.
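The hex alternative needs no special infrastructure, as a minimal Python sketch shows. It replaces each directional control with a visible `<U+XXXX>` marker; the marker format and the function name are illustrative assumptions, not something proposed in the thread. Because the marker is ordinary ASCII, it does not override the direction of the surrounding text, yet the control's presence remains visible to the reader.

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters (UAX #9).
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO",
                         "LRI", "RLI", "FSI", "PDI"}


def show_controls_as_hex(s):
    """Replace each directional control in S with a visible <U+XXXX>
    marker (hypothetical format); all other characters pass through."""
    return "".join(
        f"<U+{ord(ch):04X}>"
        if unicodedata.bidirectional(ch) in EXPLICIT_BIDI_CLASSES
        else ch
        for ch in s)
```

For instance, `show_controls_as_hex("\u202ehttp://x")` gives `"<U+202E>http://x"`.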

> Also, don't some RTL characters cause some normally LTR characters to
> display RTL?

No.  LTR characters always display left to right, unless overridden by
the RLO control (which simply makes every character act as an RTL

