Re: Bidirectional text and URLs

From: Eli Zaretskii
Subject: Re: Bidirectional text and URLs
Date: Mon, 01 Dec 2014 19:39:35 +0200

> From: Lars Magne Ingebrigtsen <address@hidden>
> Cc: address@hidden
> Date: Mon, 01 Dec 2014 17:19:30 +0100
> Eli Zaretskii <address@hidden> writes:
> > Anyway, if you want this, please show the API of the function -- what
> > it should return and how.
> Actually, I'm not sure.  :-) Would it make any sense to have a function
> like `(displayed-directionality POSITION)' that returns either
> `right-to-left' or `left-to-right'?  If so, the URL-finding function
> would query about the start of the URL (which would normally be the HTTP
> part), and if that's `right-to-left', Here There Be Shenanigans.

How is this different from the previous suggestion?
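To make Lars's proposal concrete, here is a rough Python sketch of what a `displayed-directionality'-style query could compute. It is not Emacs code and not an existing API. The name displayed_directionality is hypothetical, and the model is deliberately simplified: it only tracks the explicit embeddings and overrides (LRE, RLE, LRO, RLO, PDF) from the start of the line, and ignores isolates, the paragraph direction, and the implicit resolution of neutral characters.

```python
import unicodedata

LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"


def displayed_directionality(text: str, pos: int) -> str:
    """Hypothetical analogue of the proposed `displayed-directionality'.

    Requires 0 <= pos < len(text).  Simplified model: scans the explicit
    embedding/override controls from the start of POS's line up to POS.
    """
    line_start = text.rfind("\n", 0, pos) + 1
    # Each stack entry is "L" (LRO), "R" (RLO) or None (LRE/RLE: an
    # embedding, which resets the override status to neutral).
    stack = []
    for ch in text[line_start:pos]:
        if ch == LRO:
            stack.append("L")
        elif ch == RLO:
            stack.append("R")
        elif ch in (LRE, RLE):
            stack.append(None)
        elif ch == PDF and stack:
            stack.pop()
    override = stack[-1] if stack else None
    if override == "R":
        return "right-to-left"
    if override == "L":
        return "left-to-right"
    # No active override: use the character's own bidi class.  Neutrals and
    # weak types are simply reported as left-to-right in this sketch.
    cls = unicodedata.bidirectional(text[pos])
    return "right-to-left" if cls in ("R", "AL") else "left-to-right"
```

Under this model, the `h' at the start of an RLO-prefixed "http://..." reports right-to-left, which is exactly the condition Lars wants the URL finder to react to.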

> >> Yes, I want to unspoof the URL.  Adding some markings to notify that
> >> this has been done would also be nice, perhaps by adding a 'warning face
> >> to the text or the like.
> >
> > Then putting a display property on the offending RLO might be the best
> > solution.
> On the RLO character itself or the URL affected by the RLO?

On the RLO.  The URL will be left intact, and will display correctly
once the RLO is covered by the display property.

> >> And displaying ‮http://myspace.com/#/segami/moc.koobecaf//:sptth‬ with a
> >> couple of visible control characters doesn't really solve the problem,
> >> because most people will still assume that that's a link to Facebook,
> >> not to Myspace.  Most people are not even aware that this bidi stuff
> >> exists.
> >
> > Under my suggestion to cover the overrides with a display property,
> > the URL will not be reversed on display.  Did you try that?
> Oh, they won't?  I thought you meant adding a display property to the
> RLO in addition to having it do what it normally does.

Any character covered by a display property effectively loses its bidi
properties, as described by this paragraph in the ELisp manual:

     Text covered by `display' text properties, by overlays with
  `display' properties whose value is a string, and by any other
  properties that replace buffer text, is treated as a single unit when
  it is reordered for display.  That is, the entire chunk of text covered
  by these properties is reordered together.  Moreover, the bidirectional
  properties of the characters in such a chunk of text are ignored, and
  Emacs reorders them as if they were replaced with a single character
  `U+FFFC', known as the "Object Replacement Character".  This means that
  placing a display property over a portion of text may change the way
  that the surrounding text is reordered for display.  To prevent this
  unexpected effect, always place such properties on text whose
  directionality is identical with text that surrounds it.
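The following toy model illustrates both halves of the argument, assuming an LTR paragraph and no nested embeddings or overrides. An RLO...PDF run is displayed reversed, so the Myspace URL from Lars's example reads as a Facebook link; replacing the RLO with a display string such as " " (approximating the effect described above, where the covered character no longer acts as an override) leaves the URL in its logical order. The functions visual_order and cover_rlo are hypothetical helpers for illustration, not Emacs primitives.

```python
RLO, PDF = "\u202e", "\u202c"


def visual_order(line: str) -> str:
    """Toy model of how an LTR line containing RLO ... PDF runs is displayed.

    Assumptions: LTR paragraph, no nested embeddings or overrides; each RLO
    run (up to the next PDF or the end of the line) is shown reversed; the
    bidi control characters themselves are invisible.
    """
    out, i = [], 0
    while i < len(line):
        if line[i] == RLO:
            end = line.find(PDF, i + 1)
            if end == -1:
                end = len(line)
            out.append(line[i + 1:end][::-1])  # overridden run shown right-to-left
            i = end + 1
        elif line[i] == PDF:
            i += 1  # a PDF with no pending override has no visible effect
        else:
            out.append(line[i])
            i += 1
    return "".join(out)


def cover_rlo(line: str, display: str = " ") -> str:
    """Approximate covering each RLO with a `display' string: the RLO is shown
    as DISPLAY and no longer overrides the text after it."""
    return line.replace(RLO, display)


spoofed = RLO + "http://myspace.com/#/segami/moc.koobecaf//:sptth" + PDF
print(visual_order(spoofed))             # https://facebook.com/images/#/moc.ecapsym//:ptth
print(visual_order(cover_rlo(spoofed)))  # the URL in logical order, after a leading space
```

In real Emacs the RLO stays in the buffer; the display property only changes how it is shown and makes the reordering treat it as U+FFFC, which is what the string replacement stands in for here.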

> So is your suggestion here to disable all RLO (etc.) characters in mail
> buffers?

No, only RLOs that affect URLs.

Specifically, I suggest looking for an RLO before a URL on the same
physical line, and a PDF or hard newline after it, and if found,
covering the RLO with a display property whose value is, e.g., the
string " ".  Since the mere presence of an RLO before a URL doesn't yet
mean that it's malicious (other bidirectional controls, which you don't
want to know about, can countermand the RLO before it affects the URL
display), I suggest augmenting that by checking that the URL's host and
domain parts consist of LTR characters whose directionality was
overridden.
The latter part is to be done by calling a new primitive mentioned

Given all this evidence, I think it's pretty much certain that we
found our offending RLO.
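A rough Python sketch of the detection heuristic described above, under several assumptions. URL_RE is a hypothetical, deliberately loose URL pattern that stops at whitespace and bidi control characters. "The RLO governs the URL" is approximated as "the last bidi control before the URL on its line is an RLO". "LTR characters" is approximated by Unicode bidi class L, plus digits, dots and hyphens, which occur in host names. The function returns the offsets of RLOs that would be candidates for covering with a display property; find_spoofing_rlos is a hypothetical name, not the new primitive mentioned in the message.

```python
import re
import unicodedata
from urllib.parse import urlsplit

RLO, PDF = "\u202e", "\u202c"
# LRE, RLE, PDF, LRO, RLO and the isolate controls LRI, RLI, FSI, PDI.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
# Hypothetical, deliberately loose URL pattern: it stops at whitespace and at
# bidi controls, so a matched URL never contains the PDF that ends an override.
URL_RE = re.compile(r"https?://[^\s\u202a-\u202e\u2066-\u2069]+")


def find_spoofing_rlos(text: str) -> list[int]:
    """Return offsets of RLO characters that appear to reverse a URL's display."""
    suspects = []
    for m in URL_RE.finditer(text):
        line_start = text.rfind("\n", 0, m.start()) + 1
        line_end = text.find("\n", m.end())
        if line_end == -1:
            line_end = len(text)
        before = text[line_start:m.start()]
        after = text[m.end():line_end]

        # 1. The last bidi control before the URL on the same physical line
        #    must be an RLO; otherwise another control has countermanded it.
        controls = [i for i, ch in enumerate(before) if ch in BIDI_CONTROLS]
        if not controls or before[controls[-1]] != RLO:
            continue

        # 2. After the URL, the override ends with a PDF or runs to the newline.
        first_after = next((ch for ch in after if ch in BIDI_CONTROLS), None)
        if first_after not in (None, PDF):
            continue

        # 3. The host part must consist of LTR characters, so the RLO is
        #    really overriding their directionality.
        try:
            host = urlsplit(m.group()).hostname or ""
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        if host and all(unicodedata.bidirectional(c) == "L"
                        or c.isdigit() or c in ".-" for c in host):
            suspects.append(line_start + controls[-1])
    return suspects
```

For the spoofed link from this thread, preceded by "Click: ", the sketch flags the RLO at offset 7; an RLO that is immediately countermanded by an LRO is not flagged.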
