emacs-devel

Re: bidi-display-reordering is now non-nil by default


From: Eli Zaretskii
Subject: Re: bidi-display-reordering is now non-nil by default
Date: Thu, 04 Aug 2011 01:16:15 -0400

> From: "Stephen J. Turnbull" <address@hidden>
> Cc: Lars Magne Ingebrigtsen <address@hidden>,
>     address@hidden,
>     address@hidden
> Date: Thu, 04 Aug 2011 12:23:28 +0900
> 
> Eli, you seem to forget that *Unicode is a wire protocol*, an inter-
> application communication tool.  It is not intended to be a
> specification, or even recommendation, of how applications handle text
> internally.

Yes, but we are not talking about the internal handling.  We are
talking about display, which is an external, user-visible part of the
issue.  The Unicode Bidirectional Algorithm is a specification for
converting a stream of text into an array of character glyphs on the
screen.  It is not a wire protocol.  Nor is it specific to text
external to Emacs: after all, the internal storage of text in Emacs,
as in many other applications, is just a linear byte stream.

Let's go back to the issue at hand: the directional control
characters.  A quote from UAX#9:

  [...] there are circumstances where an implicit bidirectional
  ordering is not sufficient to produce comprehensible text. To deal
  with these cases, a minimal set of directional formatting codes is
  defined to control the ordering of characters when rendered. This
  allows exact control of the display ordering for legible interchange
  and ensures that plain text used for simple items like filenames or
  labels can always be correctly ordered for display.

  The directional formatting codes are used only to influence the
  display ordering of text. [...]
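
For reference, the formatting codes in question can be listed together with their bidi classes.  This is only a small sketch using Python's standard `unicodedata` module; the table name `DIRECTIONAL_CODES` and the helper `bidi_classes` are illustrative labels, not anything defined by UAX#9 or by Emacs.

```python
import unicodedata

# Directional formatting characters: embeddings, overrides, and the
# two implicit marks.  The dictionary name is illustrative only.
DIRECTIONAL_CODES = {
    "LRE": "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "RLE": "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "PDF": "\u202c",  # POP DIRECTIONAL FORMATTING
    "LRO": "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "RLO": "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "LRM": "\u200e",  # LEFT-TO-RIGHT MARK
    "RLM": "\u200f",  # RIGHT-TO-LEFT MARK
}


def bidi_classes():
    """Map each abbreviation to the character's bidi class."""
    return {abbr: unicodedata.bidirectional(ch)
            for abbr, ch in DIRECTIONAL_CODES.items()}


if __name__ == "__main__":
    for abbr, ch in DIRECTIONAL_CODES.items():
        print(f"U+{ord(ch):04X}  {abbr}  {unicodedata.name(ch)}  "
              f"class={unicodedata.bidirectional(ch)}")
```

The embedding and override codes carry their own bidi classes, while the two marks simply behave like an invisible strong L or R character.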

> Of course on writing a stream to the outside world, Emacs will need to
> use directional marks.  Surely Lars does not deny that!  However,
> internally, text properties could in theory suffice, just as they do
> for ANSI coloring.

This option (converting directional marks in an external stream to some
Emacs feature on I/O) was also discussed at the time (nearly 10 years
ago).  It is possible to implement it, but it is unnecessarily
complicated, and it even has some hard-to-resolve issues.  For
example, what if the user inserts these characters manually?  We would
then face a very real risk of introducing subtle bugs whereby saving the
text to a disk file and then visiting that file could produce a buffer
whose contents are different.  Such unexpected conversions behind
the user's back proved to be an annoyance, as the experience of MULE
shows.
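
To make the round-trip risk concrete, here is a toy model of such a conversion.  It is an assumption-laden sketch, not a description of any design that was actually proposed: `decode` and `encode` are hypothetical names, only LRM and RLM are handled, and a "property" is modeled as a (position, direction) pair.

```python
# Hypothetical conversion between mark characters and properties.
MARKS = {"\u200e": "L2R", "\u200f": "R2L"}
CHARS = {direction: ch for ch, direction in MARKS.items()}


def decode(stream):
    """Read conversion: drop mark characters, keep them as properties."""
    text, props = [], []
    for ch in stream:
        if ch in MARKS:
            props.append((len(text), MARKS[ch]))
        else:
            text.append(ch)
    return "".join(text), props


def encode(text, props):
    """Write conversion: turn properties back into mark characters."""
    out = []
    pending = sorted(props)
    for i, ch in enumerate(text):
        while pending and pending[0][0] == i:
            out.append(CHARS[pending.pop(0)[1]])
        out.append(ch)
    # Properties recorded at the very end of the text.
    out.extend(CHARS[direction] for _, direction in pending)
    return "".join(out)


# The user types a literal RLM into the buffer; no property is attached.
buffer_text, buffer_props = "abc\u200fdef", []
saved = encode(buffer_text, buffer_props)
revisited_text, revisited_props = decode(saved)

# The revisited buffer no longer contains the character the user typed.
print(repr(buffer_text), "->", repr(revisited_text), revisited_props)
```

In this model the file on disk is unchanged, but the buffer obtained by visiting it differs from the buffer that was saved, which is the kind of silent conversion described above.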

>  > Because (a) text properties are specific to Emacs, and (b) they cannot
>  > overlap (for the same property).  By contrast, to force certain visual
>  > order, one must sometimes force some direction on a portion of text
>  > and then the opposite direction on an inner substring of that very
>  > text.  Text properties won't grok that.
> 
> Huh?  Of course text properties nest.

A single character can have only one property of each type.  Let's say
we call this property `direction' and give it 2 values: L2R and R2L.
Then imagine the following scenario:

  . you mark a portion of text with L2R direction property

  . you then want to mark part of that portion with R2L direction
    property (there are situations where this is necessary, I can show
    examples if this matters, but for now please believe me)

  . since each character can have only one value of the direction
    property, you cannot do this in any simple way; you'd need to
    split the original region in 3 parts, which is ugly and
    complicates what needs to be done when text in this region is
    deleted (keep in mind that the UBA mandates support of up to 60
    levels of such embedded direction reversals, don't ask me why, and
    Emacs is in full compliance)
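
The scenario above can be mimicked with a toy model.  This is a sketch in Python, not Emacs Lisp, and it assumes the simplest possible representation: one `direction` value per character.  The helpers `put_property`, `runs` and `explicit_depth` are hypothetical names, and `explicit_depth` only counts how deeply LRE/RLE codes nest; it does not compute UBA embedding levels.

```python
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def put_property(props, start, end, value):
    """Set the single direction value on props[start:end], overwriting."""
    props[start:end] = [value] * (end - start)


def runs(props):
    """Collapse per-character values into (start, end, value) runs."""
    out = []
    for i, value in enumerate(props):
        if out and out[-1][2] == value:
            out[-1] = (out[-1][0], i + 1, value)
        else:
            out.append((i, i + 1, value))
    return out


def explicit_depth(text):
    """Deepest nesting of LRE/RLE codes, each closed by a PDF."""
    depth = deepest = 0
    for ch in text:
        if ch in (LRE, RLE):
            depth += 1
            deepest = max(deepest, depth)
        elif ch == PDF and depth > 0:
            depth -= 1
    return deepest


# One value per character: the inner R2L span overwrites the outer L2R,
# so the original region ends up split in three parts.
props = [None] * 12
put_property(props, 2, 10, "L2R")
put_property(props, 4, 7, "R2L")
print(runs(props))

# With control characters, the inner span simply nests inside the outer.
nested = LRE + "abc" + RLE + "def" + PDF + "ghi" + PDF
print(explicit_depth(nested))
```

After the second `put_property` call nothing records that the R2L span sits inside a larger L2R span; that information survives only in the version that uses the control characters.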

> If for some subtle reason, they don't quite nest correctly for this
> purpose, overlays most likely will.

Overlays don't get copied with the text, so if you copy/paste text
into another area of the same buffer or into another buffer, the nice
display will be lost.  We could complicate the heck out of yanking so
it reinserts the overlays, of course, but why complicate things when an
easier, straightforward way is available?

>  > What is the difference between aligning HELLO and aligning a summary
>  > buffer?  They are both plain text, and they both are arranged to align
>  > nicely.
> 
> HELLO arrives as an external plain text stream, and therefore is
> governed by the Unicode standard.  The summary buffer is constructed
> by Emacs and it is not plain text

But it should be possible to copy portions of that buffer elsewhere,
and such a copy should keep its visual appearance on the screen with
minimal fuss, or else users will be annoyed.  Right?  The question
that bugged us during the early stages of the design was how to
ensure this without asking Lisp application programmers to jump
through hoops every time text is copied, saved, or read.  It
turns out that using the directional control characters is the easiest
way.

> (it has a *lot* of structure, being mousable etc), and therefore is
> not governed by the standard for its *implementation*.

It's not governed by the standard, but following the standard is the
easiest way of achieving the goal with minimal implications.

> How many directional marks are needed in the Hebrew TUTORIAL, given
> the full BIDI algorithm implementation

Not many, but some.  About 120, if my count is correct.

> and how many are redundant?

None.  I used them only where the normal implicit reordering didn't
yield the correct display.

> Have you copied portions of the TUTORIAL with embedded marks into
> email headers and gotten appropriate results?

Yes.  It works, and works seamlessly.  That's the whole point of using
these control characters.

> I bet that, as Lars implies, Emacs is going to need
> `yank-dropping-directional-marks' in some applications.

If we drop the marks on yanking, text will look different when
yanked, sometimes completely different, to the point of being
incomprehensible.  I think that way lies madness if we want decent
support for bidi scripts.  So such a feature would be ill-advised, and
I will do my best to talk people out of it.


