Re: improving bidi documents display

From: Michael Welsh Duggan
Subject: Re: improving bidi documents display
Date: Sun, 27 Feb 2011 05:01:25 -0500
User-agent: Gnus/5.110014 (No Gnus v0.14) Emacs/24.0.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> Date: Thu, 24 Feb 2011 14:32:35 +0200
>> From: Eli Osherovich <address@hidden>
>> At the moment (using rev. 103371)  I can edit Hebrew/English LaTeX
>> documents, however, the way they are displayed in Emacs is not perfect.
>> Please look at the file attached as you can see any English text that
>> appears inside a Hebrew paragraph requires certain decorations around it
>> (e.g., \L{some English text}) these decorations are displayed in an ugly
>> fashion.
> Yes, it's a known problem.  The Unicode UAX#9 Bidirectional algorithm
> (which is what Emacs implements for bidirectional display) does not
> produce good results with LaTeX (and with other kinds of markup).
>> Is there anything that can be done about it?
> Something _should_ be done, for sure.  But for that, Someone™ should
> figure out how this kind of problems could be solved using Emacs
> display features.  Any solution will probably involve reordering only
> parts of text, but a more detailed design suggestion is needed before
> it can be implemented.  People are welcome to try to tackle this,
> because I'm still busy with low-level bidi support of plain text.

I'd like to talk about this problem a little, just to get a little
understanding of the problem space.  Please be warned that although I
have read through UAX#9 a few times, and have been following (as best I
can) Eli's bidi work, I am still very much a novice, and am apt to make
improper assumptions, or misunderstand how things are supposed to work.

In the examples below, I will use the convention from the UAX#9
document: a capital letter represents an R type character, and a
lower-case letter represents an L type character.  Formatting codes will
be typed as <RLE>, <PDF>, etc.

So, the example being used was:

Memory:  HEBREW \foo{english}
Levels:  11111111222222222221
Display: {foo{english\ WERBEH

Here the paragraph embedding level is 1 (odd, RtL) since the first
character is an R character.  The backslash, braces, and spaces are N
characters.  The N character sequence " \" takes on the current
embedding direction (1) based on rule N2.  The open brace gets level 2
based on rule N1, and the close brace gets level 1 again based on rule
N2.  Note that the close brace appears as its mirrored glyph, since its
resolved direction is R.

(Rule N1 states that runs of neutral characters between strong
characters of the same direction take on that direction.  Rule N2 states
that otherwise, they get the embedding direction.)
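To make the level assignment concrete, here is a rough Python sketch of
rules N1 and N2 as paraphrased above.  It is a toy model, not what bidi.c
does: it assumes only the upper/lower-case convention of the examples (no
numbers, no explicit formatting codes), takes the paragraph direction
from the first strong character, and treats the start and end of the
text as having the paragraph direction.  The names char_type and
resolve_levels are made up for illustration.

```python
def char_type(ch):
    """UAX#9 example convention: upper case = R, lower case = L, else neutral."""
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return "N"


def resolve_levels(text):
    """Return one embedding level per character of `text` (simplified)."""
    types = [char_type(c) for c in text]

    # Paragraph direction comes from the first strong character (default L).
    base_dir = next((t for t in types if t != "N"), "L")
    base = 1 if base_dir == "R" else 0

    resolved = list(types)
    i, n = 0, len(types)
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        # Text boundaries count as having the paragraph direction.
        before = types[i - 1] if i > 0 else base_dir
        after = types[j] if j < n else base_dir
        # N1: same strong direction on both sides; N2: embedding direction.
        direction = before if before == after else base_dir
        for k in range(i, j):
            resolved[k] = direction
        i = j

    # Characters running against the paragraph direction go up one level.
    return [base + (1 if t != base_dir else 0) for t in resolved]


if __name__ == "__main__":
    levels = resolve_levels("HEBREW \\foo{english}")
    print("".join(str(l) for l in levels))  # 11111111222222222221
```

Under these assumptions the function reproduces the "Levels:" line of
the example above.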

Here is another example:

Memory:  HEBREW \foo{HEBREW}
Levels:  1111111122211111111
Display: {WERBEH}foo\ WERBEH

In this case, note that both of the braces are mirrored in the display.
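The "Display:" lines can be derived mechanically from the "Levels:"
lines.  The sketch below assumes the usual reordering step (reverse every
run of characters at or above a given level, from the highest level down
to the lowest odd one) and mirrors braces that sit at an odd level.  The
function name display_order and the small MIRROR table are illustrative
only.

```python
# Only the mirrored pairs needed for these examples.
MIRROR = {"{": "}", "}": "{", "(": ")", ")": "(", "[": "]", "]": "["}


def display_order(text, levels):
    """Reorder `text` (logical order) for display.

    `levels` is a string with one digit per character, in the same
    form as the "Levels:" lines in the examples.
    """
    lv = [int(d) for d in levels]
    if len(lv) != len(text):
        raise ValueError("need exactly one level per character")

    odd = [l for l in lv if l % 2 == 1]
    if not odd:
        return text  # purely left-to-right: nothing to reverse or mirror

    # Mirror characters whose level is odd (right-to-left).
    cells = [(MIRROR.get(c, c) if l % 2 == 1 else c, l) for c, l in zip(text, lv)]

    # Reverse runs at each level, highest first, down to the lowest odd level.
    for level in range(max(lv), min(odd) - 1, -1):
        i = 0
        while i < len(cells):
            if cells[i][1] < level:
                i += 1
                continue
            j = i
            while j < len(cells) and cells[j][1] >= level:
                j += 1
            cells[i:j] = cells[i:j][::-1]
            i = j

    return "".join(c for c, _ in cells)


if __name__ == "__main__":
    print(display_order("HEBREW \\foo{HEBREW}", "1111111122211111111"))
    # {WERBEH}foo\ WERBEH
```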

One simple, naive way of handling this for the various TeXs is to
consider all backslashes and brace characters as L characters.  This can
be simulated by surrounding each run of these characters by LRE PDF
pairs.  However, unless TeX ignores these characters completely, these
formatting characters would have to be removed before being processed by
TeX.

Another way of handling this would be to redefine the backslash and
brace characters as L characters, for purposes of the display engine.
Currently, I don't know if there is a way to do this in elisp.  bidi.c
seems to use a character table named bidi_type_table to hold this
information.  Currently this table is not exposed at the elisp layer, to
the best of my knowledge.  Maybe it would be possible to modify this
table in elisp, and possibly make it buffer local?
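
To see what redefining the type of backslash and braces would do to the
first example, here is a toy Python model.  The `overrides` dictionary
stands in for a hypothetical per-buffer type table; nothing like it
exists in Emacs as far as this message knows, and the functions classify
and render are invented for the sketch.  It uses the same simplified
rules as the examples (strong L/R plus neutrals only), so it is an
illustration of the idea rather than a design.

```python
MIRROR = {"{": "}", "}": "{"}


def classify(ch, overrides):
    """Return 'L', 'R' or 'N'; `overrides` wins over the default type."""
    if ch in overrides:
        return overrides[ch]
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return "N"


def render(text, overrides):
    """Return the display order of `text` under the given type overrides."""
    if not text:
        return ""
    types = [classify(c, overrides) for c in text]
    base_dir = next((t for t in types if t != "N"), "L")
    base = 1 if base_dir == "R" else 0

    # Resolve neutral runs (N1, else N2).
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else base_dir
        after = types[j] if j < len(types) else base_dir
        types[i:j] = [before if before == after else base_dir] * (j - i)
        i = j

    # Levels are either `base` or `base + 1` in this simplified model.
    levels = [base + (1 if t != base_dir else 0) for t in types]

    # Mirror at odd levels, then reverse runs from the highest level down.
    cells = [(MIRROR.get(c, c) if l % 2 else c, l) for c, l in zip(text, levels)]
    for level in range(max(levels), 0, -1):
        i = 0
        while i < len(cells):
            if cells[i][1] < level:
                i += 1
                continue
            j = i
            while j < len(cells) and cells[j][1] >= level:
                j += 1
            cells[i:j] = cells[i:j][::-1]
            i = j
    return "".join(c for c, _ in cells)


if __name__ == "__main__":
    sample = "HEBREW \\foo{english}"
    print(render(sample, {}))                               # {foo{english\ WERBEH
    print(render(sample, {"\\": "L", "{": "L", "}": "L"}))  # \foo{english} WERBEH
    print(render(sample, {"\\": "R", "{": "R", "}": "R"}))  # {english}foo\ WERBEH
```

With an L override the whole TeX command stays together as one
left-to-right run; with an R override the command name and its argument
are still displayed in swapped order.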

Another idea would be to allow a text property to override the character
type.  This feels like a very elegant, Emacs-ish way to do things, but
an uneducated glance at the bidi code makes me feel like it would be
difficult to get information about text properties into this layer.
Another idea would be to use display strings including the LRE and PDF
characters to replace existing backslashes and braces.  However, display
strings do not affect the bidi algorithm at this point.

I'm really starting to ramble at this point, so I think I will send
these musings to see what Eli and others think.

Michael Welsh Duggan
