Re: Bidirectional text and URLs

From: Ted Zlatanov
Subject: Re: Bidirectional text and URLs
Date: Sat, 29 Nov 2014 12:14:38 -0500
User-agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/25.0.50 (gnu/linux)

On Sat, 29 Nov 2014 10:22:45 +0200 Eli Zaretskii <address@hidden> wrote: 

EZ> Once we decide which cases we want to avoid or flag, we could be smart
EZ> there, by comparing the original and reordered strings, perhaps aided
EZ> by some dictionary lookup.  The infrastructure is either already there
EZ> or easy to add.  It's "just" a matter of deciding what to do and when.

EZ> Someone(TM) should present a list of well-thought requirements, and we
EZ> can take it from there.

Well, here are the pieces I think will be useful for SHR and EWW. I
don't claim they are well-thought :)

Items 1-3 could be applied through font-lock, which would just set some
special text properties in the buffer in text modes that request it (so
this would be an optional piece that is always available). Themes and
packages could then add special highlighting or handling for those
properties.

1) bring uni-confusables into the core. In regular expressions, support
either a new syntax class \s~ to mean "confusable" or a new
character class [:confusable:] (or some other way to easily search for
such characters, especially if they are used outside of their
native script).  Possible text property: 'uni-confusable

2) in regular expressions, support a new character class [:unicodemeta:]
for any characters that have meta meaning in Unicode and no printable
representation, from bidi markers to composition. I'm not sure if that's
already possible. That will allow packages to detect these characters in
places where they are not expected, e.g. inside URL buttons. Possible
text property: 'uni-meta

3) make it easy in the core to scan the buffer for places where scripts
are mixed in a single sentence, string, word, symbol, or other syntactic
unit. markchars.el does that, but only inside words. Possible text
property: 'uni-mixedscripts

4) modify `browse-url' to intercept suspicious URLs where any of the
above happened in the source buffer. I think the calling package will
have to help set the context. I don't know if it can be automated...
maybe the function could look for those special text properties around
point in the buffer where it was invoked?

5) modify SHR/EWW to highlight these text properties and interrupt the
user when the text or content of the URL button has them.
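
To make the checks in items 2-4 a bit more concrete, here is a rough
sketch. It is in Python rather than Emacs Lisp, purely for illustration,
and none of these function names exist anywhere; they are made up. It
assumes "meta" characters can be approximated by Unicode general
category Cf (which covers the bidi controls and zero-width joiners but
not combining marks), and it guesses a character's script from the first
word of its Unicode name, because the Python standard library does not
expose the Script property. A real implementation would want proper
script data.

```python
import unicodedata


def invisible_meta_chars(text):
    """Return (index, name) pairs for format characters in TEXT.

    Assumption: general category "Cf" is a good-enough stand-in for
    "has meta meaning and no printable representation".
    """
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, unicodedata.name(ch, "UNKNOWN")))
    return hits


def rough_script(ch):
    """Guess the script of CH from the first word of its Unicode name.

    This is a heuristic, not the real Script property. Non-letters
    (digits, punctuation) return None so they do not count as a script.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def scripts_in(text):
    """Return the set of (guessed) scripts used by the letters in TEXT."""
    return {s for s in map(rough_script, text) if s is not None}


def looks_suspicious(url):
    """Flag URL if it has invisible format characters or mixes scripts."""
    return bool(invisible_meta_chars(url)) or len(scripts_in(url)) > 1
```

With this, a URL containing U+202E (RIGHT-TO-LEFT OVERRIDE) or a Cyrillic
letter inside an otherwise Latin host name gets flagged. Note that it
would also flag a legitimate non-Latin host name simply because the
"http" scheme is Latin, so the mixed-script test should probably be
applied per syntactic unit, as item 3 suggests, rather than to the whole
URL.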

Does that seem useful?

