Re: [emacs-bidi] Bidirectional editing in Emacs -- main design decisions

From: Ehud Karni
Subject: Re: [emacs-bidi] Bidirectional editing in Emacs -- main design decisions
Date: Sat, 10 Oct 2009 16:57:59 +0200

On Fri, 09 Oct 2009 23:18:00 Eli Zaretskii wrote:
> Here's what I can tell about the subject (bidi display) at this point

In general I agree with your decisions.

> 1. Text storage
>    Bidirectional text in Emacs buffers and strings is stored in strict
>    logical order (a.k.a. "reading order").  This is how most (if not
>    all) other implementations handle bidirectional text.  The
>    advantage of this is that file and process I/O is trivial, as well
>    as text search.  [snip]

Search has many problems, but they should not influence your bidi
reordering. The changes to the various search functions can be done later.

The user ALWAYS searches for the visual text s/he sees (s/he never knows
the logical order unless she visits the file literally).

The problems have several causes:
  1. Different logical inputs, even without formatting characters, can
     result in the same visual output.
     e.g. with Hebrew text and a number in an LTR paragraph, the number
     may come before or after the Hebrew text in logical order, but in
     the visual output it always appears after (to the left of) the
     text. Both logical "123 HEBREW 456" and logical "123 456 HEBREW"
     appear as "123 456 WERBEH".
  2. Formatting characters are invisible and should not be matched by
     a search.
  3. The visual appearance of the search string may differ from what it
     matches. e.g. a search for logical "HEBREW 3." in an RTL paragraph
     appears as ".3 WERBEH", but it also matches something like logical
     "HEBREW 3.14159", whose visual appearance is "3.14159 WERBEH". This
     may be what the user wants, but it may also disturb her if she
     really wants to find only (visual) ".3 WERBEH".
     There is also a technical question of how Emacs should show a
     match that is not visually contiguous, such as the matched part of
     "3.14159 WERBEH" above.

At a minimum, I think search must ignore the formatting characters. I
also recommend an option to show (or, in search & replace, operate on)
only those matches that are also identical visually.
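A minimal sketch of the first suggestion, written in Python rather than Emacs Lisp and assuming search runs over the logical-order text: formatting characters are dropped from both the text and the search string before comparing, and an index map translates the match back to positions in the original text. The function names are hypothetical, and the character set covers only the formatting characters discussed in this thread. It does not attempt the second (visually identical matches) option, which needs the full reordering algorithm.

```python
# Bidi formatting characters discussed here: the implicit marks LRM/RLM
# and the explicit codes LRE, RLE, PDF, LRO, RLO.
BIDI_FORMATTING = frozenset("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")


def strip_with_map(text):
    """Remove formatting characters; also return, for each kept
    character, its index in the original text."""
    kept, positions = [], []
    for i, ch in enumerate(text):
        if ch not in BIDI_FORMATTING:
            kept.append(ch)
            positions.append(i)
    return "".join(kept), positions


def search_ignoring_formatting(haystack, needle):
    """Return (start, end) of the first match in the ORIGINAL haystack,
    comparing with formatting characters removed, or None."""
    clean_hay, positions = strip_with_map(haystack)
    clean_needle, _ = strip_with_map(needle)
    if not clean_needle:
        return None
    k = clean_hay.find(clean_needle)
    if k < 0:
        return None
    # Map the first and last matched characters back to the original.
    start = positions[k]
    end = positions[k + len(clean_needle) - 1] + 1
    return start, end
```

For example, searching "abc" + RLM + "def" for "cd" returns (2, 5): the reported span covers the invisible RLM, so highlighting or replacing it keeps the mark inside the match.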

> 3. Bidi formatting codes are retained

Agreed, but see my comment on search.

> 7. Paragraph base direction
>    There is a buffer-specific variable `paragraph-direction' that
>    allows to override this dynamic detection of the direction of each
>    paragraph, and force a certain base direction on all paragraphs in
>    the buffer.  I expect, for example, each major mode for a
>    programming language to force the left-to-right paragraph
>    direction, because programming languages are written left to right,
>    and right-to-left scripts appear in such buffers only in strings
>    embedded in the program or in comments.

I think a better name is `bidi-paragraphs-direction' or even
`bidi-paragraphs-reading-direction'. Note the `s' in paragraphs:
the variable influences all the paragraphs in the buffer.

There should be a key to toggle this variable. It will be very
useful in the minibuffer.
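To make the override and the toggle concrete, here is a rough Python sketch, not Emacs code. It assumes the simplified form of rules P2-P3 of the Unicode Bidirectional Algorithm (the first character of bidi class L, R or AL decides, defaulting to left-to-right). The parameter `forced_direction` and the "L2R"/"R2L" values are hypothetical stand-ins for the buffer-wide variable under discussion.

```python
import unicodedata


def detect_paragraph_direction(paragraph):
    """Simplified UAX #9 rules P2-P3: the first strong character
    (bidi class L, R or AL) decides; with none, default to L2R."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L2R"
        if bidi_class in ("R", "AL"):
            return "R2L"
    return "L2R"


def paragraph_direction(paragraph, forced_direction=None):
    """forced_direction stands in for the proposed buffer-wide variable
    (hypothetical): None means detect each paragraph dynamically."""
    if forced_direction is not None:
        return forced_direction
    return detect_paragraph_direction(paragraph)


def toggle_direction(forced_direction):
    """What a toggle key might do. A dynamic setting (None) is pinned
    to L2R first; after that the key flips between the two."""
    return "R2L" if forced_direction == "L2R" else "L2R"
```

Since LRM has bidi class L (and RLM class R), a paragraph that begins with one of these marks gets that direction from the dynamic detection alone.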

> 8. User control of visual order

Do you intend to support all the explicit formatting characters (LRO
is especially important, as it allows visual strings to be stored as
is) or just the implicit (and more commonly used) LRM and RLM?
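For reference, a small Python sketch that separates the two groups. It relies on Python's `unicodedata.bidirectional`: LRM and RLM carry the ordinary strong classes L and R, so they can only be recognized by code point, while the explicit embedding and override codes have bidi classes of their own. The function name `formatting_kind` is hypothetical.

```python
import unicodedata

# LRM and RLM have the ordinary strong classes L and R, so they can only
# be recognized by code point.
IMPLICIT_MARKS = {"\u200e": "LRM", "\u200f": "RLM"}

# The explicit embedding/override codes have bidi classes of their own.
EXPLICIT_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO"}


def formatting_kind(ch):
    """Return 'implicit', 'explicit', or None for ordinary text."""
    if ch in IMPLICIT_MARKS:
        return "implicit"
    if unicodedata.bidirectional(ch) in EXPLICIT_CLASSES:
        return "explicit"
    return None


if __name__ == "__main__":
    for cp in (0x200E, 0x200F, 0x202A, 0x202B, 0x202C, 0x202D, 0x202E):
        ch = chr(cp)
        print(f"U+{cp:04X}", unicodedata.name(ch), formatting_kind(ch))
```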

>    This design kills two birds: (a) it produces text that is compliant
>    with other applications, and will display the same as in Emacs, and
>    (b) it avoids the need to invent yet another Emacs infrastructure
>    feature to keep information such as paragraph direction outside of
>    the text itself.

While you can store LRM and RLM in the ISO-8859-8 encoding, there is no
way to store the other formatting characters in it.
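This is easy to check with Python's built-in `iso8859_8` codec, which follows the Unicode mapping table for ISO-8859-8: LRM and RLM map to bytes 0xFD and 0xFE, and the explicit codes have no byte at all. The helper name below is hypothetical.

```python
def storable_in_iso8859_8(ch):
    """True if the character has a byte in ISO-8859-8."""
    try:
        ch.encode("iso8859_8")
    except UnicodeEncodeError:
        return False
    return True


FORMATTING = {"LRM": "\u200e", "RLM": "\u200f", "LRE": "\u202a",
              "RLE": "\u202b", "PDF": "\u202c", "LRO": "\u202d",
              "RLO": "\u202e"}

if __name__ == "__main__":
    for name, ch in FORMATTING.items():
        print(name, storable_in_iso8859_8(ch))
```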

> That is all for now.  If you have comments or questions, you are
> welcome to voice them.

I found an editor that supports all the formatting characters, Yudit
(http://www.yudit.org/). It is GPLed, so maybe you can use it.

The W3C recommends not using the explicit formatting characters (i.e.
RLO/LRO/RLE/LRE/PDF) and using markup instead (see
http://www.w3.org/International/questions/qa-bidi-controls ,
especially the "reasons" section).


 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry
